Chapter 6 Working with Data Frames
Most of the datasets we will encounter are objects called data frames in R.
6.1 Creating a Data Frame
Suppose we have the following dataset.
Student Name | Age | Gender | GPA
---|---|---|---
Amy | 27 | F | 3.26
Bob | 55 | M | 3.75
Chuck | 34 | M | 2.98
Daisy | 42 | F | 3.40
Ellie | 20 | F | 2.75
Frank | 27 | M | 3.32
George | 34 | M | 3.68
Helen | 42 | F | 3.97
Let us create a data frame from this dataset.
Student_Name <- c("Amy", "Bob", "Chuck", "Daisy", "Ellie", "Frank",
"George", "Helen")
Age <- c(27, 55, 34, 42, 20, 27, 34, 42)
Gender <- c("F", "M", "M", "F", "F", "M", "M", "F")
GPA <- c(3.26, 3.75, 2.98, 3.40, 2.75, 3.32, 3.68, 3.97)
nsc <- data.frame(Student_Name, Age, Gender, GPA) # Create the data frame and name it nsc
nsc # Print the data frame
## Student_Name Age Gender GPA
## 1 Amy 27 F 3.26
## 2 Bob 55 M 3.75
## 3 Chuck 34 M 2.98
## 4 Daisy 42 F 3.40
## 5 Ellie 20 F 2.75
## 6 Frank 27 M 3.32
## 7 George 34 M 3.68
## 8 Helen 42 F 3.97
Our data frame is called nsc; in effect, we have given the dataset above the name nsc. Notice that each column of the data frame has a single mode, but a data frame can combine columns of different modes, just as a dataset can. Here, the variables Student_Name and Gender are categorical, whereas Age and GPA are quantitative.
Take a look at the Environment panel. Under the heading “Data”, you should see your data frame, nsc. To its right, it should say 8 obs. of 4 variables, meaning that there are 8 rows (observations) and 4 columns (variables). Scrolling down the Environment panel, you will see the vectors we used to build the columns (Student_Name, Age, Gender, and GPA) under “Values.” To the right of each name, you will see the mode of the variable, its length, and its elements.
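The information in the Environment panel can also be requested from the console. The following is a minimal sketch using the base R functions dim(), nrow(), ncol(), sapply(), and str(); the data frame is rebuilt here only so that the block runs on its own, and the helper names dims and col_classes are illustrative.

```r
# Rebuild the chapter's data frame so this block is self-contained.
Student_Name <- c("Amy", "Bob", "Chuck", "Daisy", "Ellie", "Frank",
                  "George", "Helen")
Age <- c(27, 55, 34, 42, 20, 27, 34, 42)
Gender <- c("F", "M", "M", "F", "F", "M", "M", "F")
GPA <- c(3.26, 3.75, 2.98, 3.40, 2.75, 3.32, 3.68, 3.97)
nsc <- data.frame(Student_Name, Age, Gender, GPA)

dims <- dim(nsc)   # number of rows (observations), then number of columns (variables)
nrow(nsc)          # rows only
ncol(nsc)          # columns only

# Class of each column. Whether the text columns are reported as
# "character" or "factor" depends on the R version and its
# stringsAsFactors default.
col_classes <- sapply(nsc, class)
col_classes

str(nsc)           # compact summary, similar to what the Environment panel shows
```

The first line printed by str(nsc) repeats the "8 obs. of 4 variables" summary, so it is a quick way to confirm that the data frame was built as intended.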
If you click on the line called nsc, a tab called nsc containing a table of your dataset will appear in the Source panel. The function View(data_frame) gives the same result: here, View(nsc) opens a new tab called nsc showing your dataset. This is a convenient feature of RStudio, as you can easily toggle between the script tab and the dataset tab.
6.2 Intro to R Scripts
Let’s take a look at some R commands applied to nsc. You will notice that, in some instances, different commands produce the same output.
## [1] "Student_Name" "Age" "Gender" "GPA"
## Age
## 1 27
## 2 55
## 3 34
## 4 42
## 5 20
## 6 27
## 7 34
## 8 42
## Age
## 1 27
## 2 55
## 3 34
## 4 42
## 5 20
## 6 27
## 7 34
## 8 42
## Age Gender
## 1 27 F
## 2 55 M
## 3 34 M
## 4 42 F
## 5 20 F
## 6 27 M
## 7 34 M
## 8 42 F
## Age Gender
## 1 27 F
## 2 55 M
## 3 34 M
## 4 42 F
## 5 20 F
## 6 27 M
## 7 34 M
## 8 42 F
## Student_Name Age Gender GPA
## 2 Bob 55 M 3.75
## [1] 55 34
## Age Gender
## 2 55 M
## 3 34 M
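The commands that produced the output above are not shown. As a sketch only, the following indexing commands are one set that would print output of the same form and in the same order; the specific commands are an assumption, not necessarily the ones originally used, and the data frame is rebuilt so that the block runs on its own.

```r
# Rebuild the chapter's data frame so this block is self-contained.
Student_Name <- c("Amy", "Bob", "Chuck", "Daisy", "Ellie", "Frank",
                  "George", "Helen")
Age <- c(27, 55, 34, 42, 20, 27, 34, 42)
Gender <- c("F", "M", "M", "F", "F", "M", "M", "F")
GPA <- c(3.26, 3.75, 2.98, 3.40, 2.75, 3.32, 3.68, 3.97)
nsc <- data.frame(Student_Name, Age, Gender, GPA)

names(nsc)                # column names

nsc[2]                    # second column, selected by position
nsc["Age"]                # same column, selected by name

nsc[2:3]                  # second and third columns, by position
nsc[c("Age", "Gender")]   # same columns, by name

nsc[2, ]                  # second row, all columns

nsc[2:3, 2]               # rows 2 to 3 of column 2; a single column is returned as a vector
nsc[2:3, 2:3]             # rows 2 to 3 of columns 2 to 3; still a data frame
```

Selecting by name is usually the safer choice, since it keeps working if the columns are later reordered.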
6.3 Extracting Entries
In general, to access the entries of a variable in a data frame, use the dollar symbol, $, as follows: data_frame$variable.
Let us look at some examples. Notice that each result is printed as a vector, running across the line, rather than as a column.
## [1] 27 55 34 42 20 27 34 42
## [1] F M M F F M M F
## Levels: F M
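The commands behind the output above are not shown; a plausible reconstruction is nsc$Age followed by nsc$Gender. The sketch below makes one assumption explicit: the "Levels: F M" line indicates that Gender is stored as a factor, which is the default only in versions of R before 4.0.0, so stringsAsFactors = TRUE is set here to reproduce that behaviour in any version.

```r
# Rebuild the chapter's data frame so this block is self-contained.
Student_Name <- c("Amy", "Bob", "Chuck", "Daisy", "Ellie", "Frank",
                  "George", "Helen")
Age <- c(27, 55, 34, 42, 20, 27, 34, 42)
Gender <- c("F", "M", "M", "F", "F", "M", "M", "F")
GPA <- c(3.26, 3.75, 2.98, 3.40, 2.75, 3.32, 3.68, 3.97)

# Assumption: text columns are converted to factors, matching the
# "Levels: F M" line in the printed output.
nsc <- data.frame(Student_Name, Age, Gender, GPA, stringsAsFactors = TRUE)

nsc$Age      # numeric vector of ages
nsc$Gender   # factor with levels F and M
```

Without stringsAsFactors = TRUE in R 4.0.0 or later, nsc$Gender would instead print as a character vector of quoted strings, with no Levels line.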
6.4 Generating a Count
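The function table() generates a count. Given one variable, it counts how many times each value occurs; given two variables, it counts how many times each combination of values occurs.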
##
## 20 27 34 42 55
## 1 2 2 2 1
##
## F M
## 20 1 0
## 27 1 1
## 34 0 2
## 42 2 0
## 55 0 1
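The two tables above are consistent with calling table() first on Age alone and then on Age together with Gender. The following sketch shows those calls; the commands and the names age_counts and age_by_gender are assumptions for illustration, and the data frame is rebuilt so that the block runs on its own.

```r
# Rebuild the chapter's data frame so this block is self-contained.
Student_Name <- c("Amy", "Bob", "Chuck", "Daisy", "Ellie", "Frank",
                  "George", "Helen")
Age <- c(27, 55, 34, 42, 20, 27, 34, 42)
Gender <- c("F", "M", "M", "F", "F", "M", "M", "F")
GPA <- c(3.26, 3.75, 2.98, 3.40, 2.75, 3.32, 3.68, 3.97)
nsc <- data.frame(Student_Name, Age, Gender, GPA)

# One-way table: how many students have each age.
age_counts <- table(nsc$Age)
age_counts

# Two-way table: ages in the rows, genders in the columns.
age_by_gender <- table(nsc$Age, nsc$Gender)
age_by_gender
```

Individual counts can be read from the table by name; for example, age_by_gender["42", "F"] gives the number of 42-year-old female students.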