Data Frames

You may be familiar with a program like Microsoft Excel or Google Sheets that displays information as a collection of labeled rows and columns. R has a similar object called a data frame. We can either create data frames ourselves or load files containing the data we would like to have as a data frame.
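As a minimal sketch of the file-loading route, the base R function read.csv reads a comma-separated file into a data frame. The file contents and the column names item and count are hypothetical placeholders, written to a temporary file only so that the example runs on its own.

```r
# Hypothetical toy file, written to a temporary location so the sketch is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("item,count", "a,1", "b,2"), csv_path)

# read.csv() returns a data frame; the first line of the file supplies the column names
loaded <- read.csv(csv_path)
loaded
```

In practice, csv_path would be the path to an existing data file rather than a temporary one.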

Let’s consider the example of the top five languages for mathematics and data according to Stack Overflow. First, we create two vectors containing the language names and the percentage popularity values.

languages  <- c("Python", "SQL", "Java", "C++", "JavaScript")
popularity <- c(56, 39, 34, 34, 29)

Now we can create and display the data frame. We pass the two vectors to the data.frame function and assign the result to df. The column names are taken from the names of the vectors.

df <- data.frame(languages, popularity)
df
##    languages popularity
## 1     Python         56
## 2        SQL         39
## 3       Java         34
## 4        C++         34
## 5 JavaScript         29
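Beyond printing the whole table, a few base R functions give a quick check that the data frame has the shape we expect. The sketch below rebuilds df from the same two vectors so that it runs on its own.

```r
# Rebuild the data frame from the two vectors used above
languages  <- c("Python", "SQL", "Java", "C++", "JavaScript")
popularity <- c(56, 39, 34, 34, 29)
df <- data.frame(languages, popularity)

dim(df)    # number of rows, then number of columns
names(df)  # column names, taken from the names of the vectors
str(df)    # compact summary of each column's type and first few values
```

These checks are especially useful for larger data frames, where printing every row is impractical.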

We can use the kable function from the knitr package to display the table in a nicer format.

library(knitr)
kable(df)
|languages  | popularity|
|:----------|----------:|
|Python     |         56|
|SQL        |         39|
|Java       |         34|
|C++        |         34|
|JavaScript |         29|

Finally, we can refer to an individual column of the data frame by name using the $ operator. For example, the popularity column is df$popularity. Below, these references are used to create a bar plot with a reversed Blues palette from RColorBrewer and an appropriate title. The code is suppressed in the actual document so that all we see is the plot. We do this by setting echo = FALSE in the chunk header, {r, echo = FALSE}, at the top of the code block.
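Since the plotting code is hidden, here is one way such a bar plot could be written. This is a sketch rather than the original code: it assumes the RColorBrewer package is installed, and the title and axis label are hypothetical choices.

```r
library(RColorBrewer)  # assumed to be installed; provides brewer.pal()

# Rebuild the data frame so the sketch runs on its own
languages  <- c("Python", "SQL", "Java", "C++", "JavaScript")
popularity <- c(56, 39, 34, 34, 29)
df <- data.frame(languages, popularity)

# brewer.pal() returns the Blues palette from light to dark;
# rev() reverses it so the first bar gets the darkest shade
bar_colors <- rev(brewer.pal(nrow(df), "Blues"))

# barplot() draws the chart and returns the horizontal midpoint of each bar
bar_positions <- barplot(
  df$popularity,
  names.arg = as.character(df$languages),
  col = bar_colors,
  main = "Top Five Languages for Mathematics and Data",  # hypothetical title
  ylab = "Popularity (%)"                                # hypothetical axis label
)
```

Because the rows are already sorted from most to least popular, the reversed palette assigns the darkest blue to Python and the lightest to JavaScript.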

Further Considerations with Visualizations and Color