Data Frames
You may be familiar with a program like Microsoft Excel or Google Sheets that displays information as a collection of labeled rows and columns. R has a similar object called a data frame. We can either create data frames ourselves or load them from files that contain the data.
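Loading a file is not shown in this section, so here is a minimal sketch of that route. It assumes the data lives in a comma-separated file; the temporary file and the name `df_from_file` are made up for illustration, and the three rows reuse values from the example that follows.

```r
# Write a small CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("languages,popularity", "Python,56", "SQL,39", "Java,34"),
  csv_path
)

# read.csv() returns a data frame; the first line of the file
# supplies the column names.
df_from_file <- read.csv(csv_path)
df_from_file
```

In practice `csv_path` would simply be the path to an existing file.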
Let’s consider the example of the top five languages for mathematics and data according to Stack Overflow. First, we create two vectors containing the language names and the percentage popularity values.
languages <- c("Python", "SQL", "Java", "C++", "JavaScript")
popularity <- c(56, 39, 34, 34, 29)
Now we can create and display the data frame. We pass the two vectors to the data.frame function and assign the result to df.
df <- data.frame(languages, popularity)
df
## languages popularity
## 1 Python 56
## 2 SQL 39
## 3 Java 34
## 4 C++ 34
## 5 JavaScript 29
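By default, data.frame takes the column names from the names of the vectors passed in. As a small sketch of an alternative, the columns can be named explicitly with named arguments; the names `df_named`, `language`, and `popularity_pct` are hypothetical and chosen only for this illustration.

```r
languages <- c("Python", "SQL", "Java", "C++", "JavaScript")
popularity <- c(56, 39, 34, 34, 29)

# Named arguments set the column names explicitly instead of
# reusing the variable names.
df_named <- data.frame(language = languages, popularity_pct = popularity)

# Quick checks on the result: column names, row count, and structure.
names(df_named)
nrow(df_named)
str(df_named)
```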
We can use the kable function from the knitr package to display the table so it looks a little nicer.
library(knitr)
kable(df)
| languages | popularity |
|:---|---:|
| Python | 56 |
| SQL | 39 |
| Java | 34 |
| C++ | 34 |
| JavaScript | 29 |
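The table can be adjusted further through kable's arguments. The sketch below uses `col.names`, `align`, and `caption`; the header labels and caption text are illustrative choices, not part of the original document.

```r
library(knitr)

languages <- c("Python", "SQL", "Java", "C++", "JavaScript")
popularity <- c(56, 39, 34, 34, 29)
df <- data.frame(languages, popularity)

# col.names replaces the displayed headers without changing df itself;
# align = "lr" left-aligns the first column and right-aligns the second.
pretty_table <- kable(
  df,
  col.names = c("Language", "Popularity (%)"),
  align = "lr",
  caption = "Language popularity (illustrative caption)"
)
pretty_table
```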
Finally, we can refer to the individual columns of the data frame by name. For example, the popularity column is referred to as df$popularity. Below, this is used to create a barplot that combines these column references, a reversed Blues palette from RColorBrewer, and an appropriate title. The code is suppressed in the actual document so that all we see is the plot. We do this by writing {r, echo = FALSE} in the header line of the code chunk.
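Because the plotting code is hidden, here is a sketch of what such a chunk might contain. It is a reconstruction under assumptions, not the original code: the title, the axis label, and the names `bar_colors` and `mean_popularity` are invented for illustration. In an R Markdown file this code would sit in a chunk whose header reads {r, echo = FALSE}.

```r
library(RColorBrewer)

languages <- c("Python", "SQL", "Java", "C++", "JavaScript")
popularity <- c(56, 39, 34, 34, 29)
df <- data.frame(languages, popularity)

# df$popularity extracts a single column as a plain numeric vector,
# so it can be passed to ordinary functions such as mean().
mean_popularity <- mean(df$popularity)

# brewer.pal() returns the Blues shades from light to dark;
# rev() reverses them so the first bar gets the darkest shade.
bar_colors <- rev(brewer.pal(nrow(df), "Blues"))

barplot(
  df$popularity,
  names.arg = as.character(df$languages),
  col = bar_colors,
  main = "Top Five Languages (illustrative title)",
  ylab = "Popularity (%)"
)
```

Reversing the palette pairs the darkest blue with the first, most popular language, since the data are already sorted in descending order.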