# Chapter 6 Working with Data Frames

Most of the datasets we will encounter are objects called data frames in R.

## 6.1 Creating a Data Frame

Suppose we have the following dataset.

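| Student Name | Age | Gender | GPA  |
|--------------|-----|--------|------|
| Amy          | 27  | F      | 3.26 |
| Bob          | 55  | M      | 3.75 |
| Chuck        | 34  | M      | 2.98 |
| Daisy        | 42  | F      | 3.40 |
| Ellie        | 20  | F      | 2.75 |
| Frank        | 27  | M      | 3.32 |
| George       | 34  | M      | 3.68 |
| Helen        | 42  | F      | 3.97 |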

Let us create a data frame from this dataset.

Student_Name <- c("Amy", "Bob", "Chuck", "Daisy", "Ellie", "Frank",
"George", "Helen")
Age <- c(27, 55, 34, 42, 20, 27, 34, 42)
Gender <- c("F", "M", "M", "F", "F", "M", "M", "F")
GPA <- c(3.26, 3.75, 2.98, 3.40, 2.75, 3.32, 3.68, 3.97)
nsc <- data.frame(Student_Name, Age, Gender, GPA)   # Combines the vectors into a data frame named nsc
nsc # Prints the data frame
##   Student_Name Age Gender  GPA
## 1          Amy  27      F 3.26
## 2          Bob  55      M 3.75
## 3        Chuck  34      M 2.98
## 4        Daisy  42      F 3.40
## 5        Ellie  20      F 2.75
## 6        Frank  27      M 3.32
## 7       George  34      M 3.68
## 8        Helen  42      F 3.97

Our data frame is called nsc. This is similar to renaming the given dataset as nsc. Notice that each column of our data frame has a single mode, but a data frame can combine columns of different modes, just like a dataset. Here, the variables Student_Name and Gender are both categorical, whereas the variables Age and GPA are both quantitative.
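One way to confirm how R stores each column is to ask for its class. The following is a minimal sketch that rebuilds nsc so it can be run on its own; the helper names col_classes and is_quantitative are illustrative only and are not part of the chapter.

```r
# Rebuild the chapter's data frame so this block runs on its own.
nsc <- data.frame(
  Student_Name = c("Amy", "Bob", "Chuck", "Daisy", "Ellie", "Frank",
                   "George", "Helen"),
  Age = c(27, 55, 34, 42, 20, 27, 34, 42),
  Gender = c("F", "M", "M", "F", "F", "M", "M", "F"),
  GPA = c(3.26, 3.75, 2.98, 3.40, 2.75, 3.32, 3.68, 3.97)
)

# One class per column. Assumption: with default settings, the text columns
# are "character" in R 4.0.0 or later and "factor" in older versions.
col_classes <- sapply(nsc, class)
print(col_classes)

# TRUE for the quantitative columns, FALSE for the categorical ones.
is_quantitative <- sapply(nsc, is.numeric)
print(is_quantitative)
```

The second check does not depend on the R version, because is.numeric() returns FALSE for both character and factor columns.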

Take a look at the Environment panel. Under the heading “Data”, you should see your data frame, nsc. To its right, it should say 8 obs. of 4 variables. That means that there are 4 columns and each column has 8 entries. Scrolling down the Environment panel, you will see the column names under “Values.” To the right of each column name, you will see the mode of the variable, its length, and its elements.
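The same information shown in the Environment panel can also be requested from the console. This is a small sketch that rebuilds nsc so it is self-contained; the names n_obs and n_vars are illustrative only.

```r
# Rebuild the chapter's data frame so this block runs on its own.
nsc <- data.frame(
  Student_Name = c("Amy", "Bob", "Chuck", "Daisy", "Ellie", "Frank",
                   "George", "Helen"),
  Age = c(27, 55, 34, 42, 20, 27, 34, 42),
  Gender = c("F", "M", "M", "F", "F", "M", "M", "F"),
  GPA = c(3.26, 3.75, 2.98, 3.40, 2.75, 3.32, 3.68, 3.97)
)

# Number of rows (observations) and columns (variables).
n_obs <- nrow(nsc)
n_vars <- ncol(nsc)
dim(nsc)   # rows first, then columns

# Compact summary: one line per column with its type and first values.
str(nsc)
```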

If you click on the line called nsc, a tab called nsc containing a table of your dataset will appear in the Source panel. You get the same result with the function View(data_frame). In this case, View(nsc) opens a new tab called nsc showing your dataset. This is a convenient feature of RStudio, as you can easily toggle between the script tab and the dataset tab to look at your data.

## 6.2 Intro to R Scripts

Let’s take a look at some R scripts. You will notice that, in some instances, there are different ways of writing R scripts with the same output.

# Lists variables
names(nsc)    
## [1] "Student_Name" "Age"          "Gender"       "GPA"
# Generates all entries under variable, Age
nsc[c("Age")]    
##   Age
## 1  27
## 2  55
## 3  34
## 4  42
## 5  20
## 6  27
## 7  34
## 8  42
# Generates all entries in the 2nd column, which is Age
nsc[2]
##   Age
## 1  27
## 2  55
## 3  34
## 4  42
## 5  20
## 6  27
## 7  34
## 8  42
# Generates all entries in the 2nd and 3rd columns of the data frame
nsc[2:3]
##   Age Gender
## 1  27      F
## 2  55      M
## 3  34      M
## 4  42      F
## 5  20      F
## 6  27      M
## 7  34      M
## 8  42      F
# Generates the same output as nsc[2:3]
nsc[c("Age", "Gender")]     
##   Age Gender
## 1  27      F
## 2  55      M
## 3  34      M
## 4  42      F
## 5  20      F
## 6  27      M
## 7  34      M
## 8  42      F
# Generates all data in 2nd row
nsc[2, ]    
##   Student_Name Age Gender  GPA
## 2          Bob  55      M 3.75
# Generates the Age entries in the 2nd and 3rd rows (returned as a vector)
nsc[2:3, c("Age")]
## [1] 55 34
# Generates the entries in the 2nd and 3rd rows of the 2nd and 3rd columns
nsc[2:3, 2:3]
##   Age Gender
## 2  55      M
## 3  34      M
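The examples above return a data frame in most cases, but nsc[2:3, c("Age")] returns a plain vector because a single selected column is simplified by default. The sketch below, which rebuilds nsc so it runs on its own, shows how the drop argument of the bracket operator keeps the result as a data frame; the variable names are illustrative only.

```r
# Rebuild the chapter's data frame so this block runs on its own.
nsc <- data.frame(
  Student_Name = c("Amy", "Bob", "Chuck", "Daisy", "Ellie", "Frank",
                   "George", "Helen"),
  Age = c(27, 55, 34, 42, 20, 27, 34, 42),
  Gender = c("F", "M", "M", "F", "F", "M", "M", "F"),
  GPA = c(3.26, 3.75, 2.98, 3.40, 2.75, 3.32, 3.68, 3.97)
)

# Row-and-column indexing with one column simplifies to a vector.
age_vector <- nsc[2:3, "Age"]

# drop = FALSE keeps the result as a one-column data frame.
age_frame <- nsc[2:3, "Age", drop = FALSE]

# Double brackets extract a whole column as a vector.
age_column <- nsc[["Age"]]

print(age_vector)
print(age_frame)
print(age_column)
```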

## 6.3 Extracting Entries

In general, you access the entries of a variable in a data frame with the dollar symbol, $, as follows: data_frame$variable.

Let us look at some examples. Notice that the results are printed as a vector.

# Lists all entries under variable, Age
nsc$Age
## [1] 27 55 34 42 20 27 34 42
# Lists all entries under variable, Gender
nsc$Gender
## [1] F M M F F M M F
## Levels: F M
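The Levels: F M line in the output above indicates that Gender is stored as a factor. In R 4.0.0 and later, data.frame() keeps text columns as character by default, so the same command may print quoted strings instead. The sketch below rebuilds nsc so it runs on its own, converts Gender explicitly, and uses an extracted column in a calculation; the names gender and mean_gpa are illustrative only.

```r
# Rebuild the chapter's data frame so this block runs on its own.
nsc <- data.frame(
  Student_Name = c("Amy", "Bob", "Chuck", "Daisy", "Ellie", "Frank",
                   "George", "Helen"),
  Age = c(27, 55, 34, 42, 20, 27, 34, 42),
  Gender = c("F", "M", "M", "F", "F", "M", "M", "F"),
  GPA = c(3.26, 3.75, 2.98, 3.40, 2.75, 3.32, 3.68, 3.97)
)

# Convert Gender to a factor explicitly, whatever the R version.
gender <- factor(nsc$Gender)
print(gender)
levels(gender)

# A column extracted with $ is an ordinary vector and can be passed
# to other functions, for example to compute the average GPA.
mean_gpa <- mean(nsc$GPA)
print(mean_gpa)
```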

## 6.4 Generating a Count

The function table() counts how many times each value of a variable occurs.

# Generates a count by Age
table(nsc$Age)
##
## 20 27 34 42 55
##  1  2  2  2  1
# Generates a cross-tab count by Age and Gender
table(nsc$Age, nsc$Gender)
##
##      F M
##   20 1 0
##   27 1 1
##   34 0 2
##   42 2 0
##   55 0 1
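The cross-tab above has no labels on its two dimensions. As a minimal sketch, naming the arguments to table() labels the rows and columns, and addmargins() appends row and column totals. The block rebuilds nsc so it runs on its own; the names age_by_gender, with_totals, and n_female_27 are illustrative only.

```r
# Rebuild the chapter's data frame so this block runs on its own.
nsc <- data.frame(
  Student_Name = c("Amy", "Bob", "Chuck", "Daisy", "Ellie", "Frank",
                   "George", "Helen"),
  Age = c(27, 55, 34, 42, 20, 27, 34, 42),
  Gender = c("F", "M", "M", "F", "F", "M", "M", "F"),
  GPA = c(3.26, 3.75, 2.98, 3.40, 2.75, 3.32, 3.68, 3.97)
)

# Named arguments become the dimension labels of the table.
age_by_gender <- table(Age = nsc$Age, Gender = nsc$Gender)
print(age_by_gender)

# Add a "Sum" row and a "Sum" column with the totals.
with_totals <- addmargins(age_by_gender)
print(with_totals)

# Individual counts can be looked up by their row and column labels.
n_female_27 <- age_by_gender["27", "F"]
print(n_female_27)
```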