Chapter 3 Play with R
3.1 Read the data from an excel/SPSS file
In R, the fundamental unit of shareable code is the package. A package bundles together code, data, documentation, and tests, and is easy to share with others. Thanks to packages such as readxl and haven, R can import Excel or SPSS files into an R session. Let’s install and load the package first. Then we can use functions inside the package to read the SPSS/Excel file.
library(readxl) # The R package that you need to import the Excel data file
# Use the read_excel() function to open the Excel file
asgusam5 <- read_excel("asgusam5_excel.xlsx")
# Use the base R function str() to get an overview of the type of every variable
# We can also view the first 6 rows of the data by using the head() function
str(asgusam5)
## tibble [12,569 × 50] (S3: tbl_df/tbl/data.frame)
## $ id : num [1:12569] 1 2 3 4 5 6 7 8 9 10 ...
## $ gender : num [1:12569] 1 2 2 1 2 2 1 1 2 1 ...
## $ month : num [1:12569] 1 9 10 8 8 11 1 11 8 6 ...
## $ year : num [1:12569] 5 4 4 4 4 4 4 4 4 4 ...
## $ language : num [1:12569] 1 1 1 1 1 1 1 1 1 1 ...
## $ book : num [1:12569] 4 3 4 3 5 3 2 3 4 2 ...
## $ home_computer : num [1:12569] 1 1 1 1 1 1 1 1 1 2 ...
## $ home_desk : num [1:12569] 2 1 1 1 1 1 1 1 1 1 ...
## $ home_book : num [1:12569] 1 1 1 1 1 1 1 1 1 1 ...
## $ home_room : num [1:12569] 1 2 1 1 1 2 2 2 1 2 ...
## $ home_internet : num [1:12569] 1 1 1 1 1 1 1 1 1 2 ...
## $ computer_home : num [1:12569] 1 1 1 2 1 2 1 2 4 4 ...
## $ computer_school: num [1:12569] 4 4 2 2 3 3 3 2 4 4 ...
## $ computer_some : num [1:12569] 4 4 3 1 4 4 1 4 4 2 ...
## $ parentsupport1 : num [1:12569] 1 1 4 1 2 2 1 1 4 4 ...
## $ parentsupport2 : num [1:12569] 2 1 1 1 1 2 4 1 3 4 ...
## $ parentsupport3 : num [1:12569] 1 1 2 1 4 1 1 1 1 3 ...
## $ parentsupport4 : num [1:12569] 1 1 1 1 2 1 2 1 1 2 ...
## $ school1 : num [1:12569] 3 1 2 1 3 2 4 1 3 4 ...
## $ school2 : num [1:12569] 3 4 1 1 2 1 2 2 1 3 ...
## $ school3 : num [1:12569] 2 4 2 1 2 1 2 2 1 2 ...
## $ studentbullied1: num [1:12569] 3 1 2 1 1 2 4 2 2 4 ...
## $ studentbullied2: num [1:12569] 3 1 2 2 1 3 4 3 4 4 ...
## $ studentbullied3: num [1:12569] 3 1 4 3 2 3 4 2 4 4 ...
## $ studentbullied4: num [1:12569] 4 1 4 4 1 2 4 2 4 4 ...
## $ studentbullied5: num [1:12569] 3 1 2 3 1 3 4 1 3 4 ...
## $ studentbullied6: num [1:12569] 4 3 4 4 2 4 4 1 4 4 ...
## $ learning1 : num [1:12569] 3 1 1 1 1 2 4 1 1 4 ...
## $ learning2 : num [1:12569] 4 4 4 4 3 3 1 4 4 1 ...
## $ learning3 : num [1:12569] 3 1 1 2 2 2 4 2 4 4 ...
## $ learning4 : num [1:12569] 4 4 4 4 4 3 2 4 4 1 ...
## $ learning5 : num [1:12569] 1 1 1 1 2 2 2 1 1 3 ...
## $ learning6 : num [1:12569] 1 1 1 1 1 2 4 1 1 4 ...
## $ learning7 : num [1:12569] 1 1 1 1 1 2 1 1 1 3 ...
## $ engagement1 : num [1:12569] 1 1 1 1 2 1 1 1 1 4 ...
## $ engagement2 : num [1:12569] 1 4 2 3 3 2 1 2 1 1 ...
## $ engagement3 : num [1:12569] 3 2 1 1 3 2 1 2 1 4 ...
## $ engagement4 : num [1:12569] 2 1 1 1 2 2 4 1 1 4 ...
## $ engagement5 : num [1:12569] 1 1 1 1 2 1 4 1 1 4 ...
## $ confidence1 : num [1:12569] 3 1 1 1 1 2 2 2 1 4 ...
## $ confidence2 : num [1:12569] 1 4 4 4 3 3 4 4 4 1 ...
## $ confidence3 : num [1:12569] 1 4 4 4 3 4 4 4 4 1 ...
## $ confidence4 : num [1:12569] 4 1 1 1 3 2 2 2 1 3 ...
## $ confidence5 : num [1:12569] 3 1 1 1 4 2 3 1 1 4 ...
## $ confidence6 : num [1:12569] 1 4 4 4 4 4 4 4 4 1 ...
## $ score1 : num [1:12569] 492 517 656 550 642 ...
## $ score2 : num [1:12569] 487 576 603 567 644 ...
## $ score3 : num [1:12569] 463 536 627 575 673 ...
## $ score4 : num [1:12569] 455 537 574 544 645 ...
## $ score5 : num [1:12569] 476 513 633 609 637 ...
Now, let’s use the head() function to preview the first 6 rows of the data.
head(asgusam5)
## # A tibble: 6 x 50
## id gender month year language book home_computer home_desk home_book
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 1 5 1 4 1 2 1
## 2 2 2 9 4 1 3 1 1 1
## 3 3 2 10 4 1 4 1 1 1
## 4 4 1 8 4 1 3 1 1 1
## 5 5 2 8 4 1 5 1 1 1
## 6 6 2 11 4 1 3 1 1 1
## # … with 41 more variables: home_room <dbl>, home_internet <dbl>,
## # computer_home <dbl>, computer_school <dbl>, computer_some <dbl>,
## # parentsupport1 <dbl>, parentsupport2 <dbl>, parentsupport3 <dbl>,
## # parentsupport4 <dbl>, school1 <dbl>, school2 <dbl>, school3 <dbl>,
## # studentbullied1 <dbl>, studentbullied2 <dbl>, studentbullied3 <dbl>,
## # studentbullied4 <dbl>, studentbullied5 <dbl>, studentbullied6 <dbl>,
## # learning1 <dbl>, learning2 <dbl>, learning3 <dbl>, learning4 <dbl>,
## # learning5 <dbl>, learning6 <dbl>, learning7 <dbl>, engagement1 <dbl>,
## # engagement2 <dbl>, engagement3 <dbl>, engagement4 <dbl>, engagement5 <dbl>,
## # confidence1 <dbl>, confidence2 <dbl>, confidence3 <dbl>, confidence4 <dbl>,
## # confidence5 <dbl>, confidence6 <dbl>, score1 <dbl>, score2 <dbl>,
## # score3 <dbl>, score4 <dbl>, score5 <dbl>
# For future operations, we make a copy of the original dataset so we can get back to the original data whenever we want.
test_data <- asgusam5
3.2 Transform the data
As you can see, the default data type for all the variables is numeric. The numeric data type is suited to interval and ratio scales. However, some of the variables, such as gender, month, language, and book, are supposed to be categorical. We need to transform them to the categorical data type by using the factor() function.
# First, we change the data type of the gender variable
test_data$gender <- factor(test_data$gender,levels=c(1,2),labels=c("male","female"))
# Check the data by using the class() function to see the new data type for gender
class(test_data$gender)
## [1] "factor"
# Then, let's change the language variable as well
test_data$language <- factor(test_data$language,levels=c(1,2,3),labels=c("Always Speak","Sometimes Speak","Never Speak"))
# Check the data by using the class() function to see the new data type for language
class(test_data$language)
## [1] "factor"
Can you do the same for the variable month?
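One possible answer for month, sketched on a small stand-in data frame. It assumes the codes 1 to 12 stand for January to December, which the text above does not confirm, so check the codebook before applying the same line to test_data. The name toy is hypothetical.

```r
# Hypothetical stand-in for test_data with only the month column
toy <- data.frame(month = c(1, 9, 10, 8, 8, 11))

# Assumption: 1 = January, ..., 12 = December (verify against the codebook)
# month.name is a built-in base R constant holding the twelve month names
toy$month <- factor(toy$month, levels = 1:12, labels = month.name)

class(toy$month)   # the data type is now "factor"
table(toy$month)   # frequency per month, including months with no cases
```

Listing all twelve levels keeps months with zero cases visible in the frequency table.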
3.3 Compute Variable
3.4 Class Activity 1: Calculate the aggregated data
3.4.1 Create new variable, ‘ScienceTotal’, using the average of (score1-score5).
# In this semester, we will use the "dplyr" package for most of the data manipulation.
# The individual functions inside this package will be described case by case
library(dplyr)
# Calculate the Science total score by using the rowMeans() function
test_data$ScienceTotal <- rowMeans(subset(test_data,select=c(score1,score2,score3,score4,score5)),na.rm=TRUE)
# You can check the students' average total score by using the mean() function
mean(test_data$ScienceTotal) # It's 542.17
## [1] 542.1741
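To see what na.rm=TRUE does inside rowMeans(), here is a minimal sketch on a made-up three-row data frame (the values and the name toy are hypothetical). Each row is averaged over the scores that are actually present.

```r
# Hypothetical scores; NA marks a missing score
toy <- data.frame(
  score1 = c(500, 520, NA),
  score2 = c(510, NA, NA),
  score3 = c(490, 540, 600)
)

# Row-wise mean over the available scores only
toy$ScienceTotal <- rowMeans(toy[, c("score1", "score2", "score3")], na.rm = TRUE)
toy$ScienceTotal   # 500 530 600
```

The second row is averaged over two scores and the third over one, so students with missing scores still receive a value.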
3.4.2 Create new variable, ‘ParentSupport’, using the mean of 4 variables (parentsupport1-parentsupport4)
# This uses pretty much the same code as before
test_data$ParentSupport <- rowMeans(subset(test_data,select=c(parentsupport1,parentsupport2,parentsupport3,parentsupport4)), na.rm=TRUE)
3.4.3 Create new variable, ‘StudentsBullied’, using the sum of 6 variables (studentbullied1-studentbullied6)
# Since we want the sum instead of the mean, this time we use the rowSums() function.
test_data$StudentsBullied <- rowSums(test_data[,c("studentbullied1","studentbullied2","studentbullied3","studentbullied4","studentbullied5","studentbullied6")], na.rm=TRUE)
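One behavior worth knowing: with na.rm=TRUE, rowSums() returns 0 for a row in which every item is missing, whereas rowMeans() returns NaN. This may be why a summed variable can show a minimum of 0 and no NA's. The sketch below uses made-up data and hypothetical names (toy, b1, b2) and shows one way to keep such rows as NA.

```r
# Hypothetical items; the third row has no answers at all
toy <- data.frame(b1 = c(1, 4, NA), b2 = c(2, NA, NA))

sums_default <- rowSums(toy, na.rm = TRUE)   # the all-missing row becomes 0

# Flag rows with no observed values and set their sum to NA instead
all_missing <- rowSums(!is.na(toy)) == 0
sums_checked <- ifelse(all_missing, NA, sums_default)

sums_default
sums_checked
```

Whether an all-missing row should count as 0 or as NA depends on the analysis; the point is only to make the choice deliberately.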
3.4.4 Perform descriptive analysis (mean, median, mode, and S.D.) on ScienceTotal, ParentSupport, and StudentsBullied.
# Pull out the new variables we just made, and store them in a new data frame
new_data <- subset(test_data,select=c("ScienceTotal","ParentSupport","StudentsBullied"))
# We can get the mean and median for these variables by using the summary() function
summary(new_data)
## ScienceTotal ParentSupport StudentsBullied
## Min. :276.7 Min. :1.000 Min. : 0.0
## 1st Qu.:493.8 1st Qu.:1.000 1st Qu.:17.0
## Median :547.5 Median :1.500 Median :21.0
## Mean :542.2 Mean :1.808 Mean :20.2
## 3rd Qu.:594.5 3rd Qu.:2.000 3rd Qu.:23.0
## Max. :774.0 Max. :9.000 Max. :54.0
## NA's :23
# Then, we can use the SD() function in the "psych" package to obtain the SD.
library(psych)
SD(new_data,na.rm=TRUE)
## ScienceTotal ParentSupport StudentsBullied
## 74.435145 1.230632 6.208656
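If the psych package is not available, a base R alternative is to apply sd() to each column with sapply(). This is a sketch on made-up data (the name toy and its values are hypothetical), not the approach used in the text above.

```r
# Hypothetical data frame with one missing value
toy <- data.frame(a = c(1, 2, 3, 4), b = c(2, 4, NA, 8))

# sd() is applied column by column; na.rm = TRUE is passed on to sd()
sds <- sapply(toy, sd, na.rm = TRUE)
sds
```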
# To calculate the mode, we need to build our own function named "Mode"
Mode <- function(x) {
  uni <- unique(x)
  uni[which.max(tabulate(match(x, uni)))]
}
# Calculate the required mode for specific data
Mode(new_data$ScienceTotal) # Mode is 474.58 for ScienceTotal
## [1] 474.5791
Mode(new_data$ParentSupport) # Mode is 1 for ParentSupport
## [1] 1
Mode(new_data$StudentsBullied) # Mode is 24 for StudentsBullied
## [1] 24
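Two properties of this Mode() function are easy to miss, and the sketch below illustrates them on made-up vectors: when several values tie, the one that appears first in the data is returned, and NA is counted like any other value. For a nearly continuous variable such as ScienceTotal, where most values occur only once or twice, the mode is therefore of limited use.

```r
# Same definition as in the text, repeated so this block runs on its own
Mode <- function(x) {
  uni <- unique(x)
  uni[which.max(tabulate(match(x, uni)))]
}

Mode(c(1, 2, 2, 3, 3))       # 2 and 3 tie; 2 appears first, so 2 is returned

x <- c(NA, NA, 4, 4, NA)
Mode(x)                      # NA is the most frequent "value" here
Mode(x[!is.na(x)])           # dropping NA first gives 4
```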
3.5 Recoding Variables
3.5.1 Load the car package for reverse coding
# For reverse coding the variables in R, we need the recode() function in the car package
item_need_reverse <- test_data$engagement2
library(car)
3.5.2 Reverse code the test item
item_reversed <- recode(item_need_reverse, "1=4; 2=3; 3=2; 4=1") ## Reverse code the engagement2 item
# Use the table() function to double check, and also review the frequencies for each category
table(item_need_reverse, useNA = "ifany")
## item_need_reverse
## 1 2 3 4 9 <NA>
## 1798 2765 2105 5561 317 23
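A cross-tabulation of the original item against the reversed item is a quick way to confirm that every code landed where intended. The sketch below uses a made-up vector and a base R rule (5 minus the value, for codes 1 to 4) as a stand-in for car's recode(); like the recode specification above, it leaves the value 9 and NA untouched. What 9 means is not stated in the text, so treating it as "leave as is" is an assumption.

```r
# Hypothetical responses, including a 9 and a missing value
item <- c(1, 2, 3, 4, 9, NA)

# Reverse only the valid codes 1-4; everything else is passed through
reversed <- ifelse(item %in% 1:4, 5 - item, item)

# Each original code should fall in exactly one reversed code
table(item, reversed, useNA = "ifany")
```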
3.6 Class Activity 2: Recode Variables
3.6.1 Recode into same variables for ‘learning2, learning4’
test_data$learning2 <- recode(test_data$learning2, "1=4; 2=3; 3=2; 4=1") ## Reverse code the item learning2
test_data$learning4 <- recode(test_data$learning4, "1=4; 2=3; 3=2; 4=1") ## Reverse code the item learning4
3.6.2 Recode ‘confidence2, confidence3, confidence6’ variables only for students who are born in 1999 (‘year’ variable) and save the recoded variables into ‘confidence2_re, confidence3_re, confidence6_re’ variables.
## Subset the data
new_test_data <- test_data %>%
  filter(year==2)
## Recode ‘confidence2, confidence3, confidence6’
new_test_data$confidence2_re <- recode(new_test_data$confidence2, "1=4; 2=3; 3=2; 4=1")
new_test_data$confidence3_re <- recode(new_test_data$confidence3, "1=4; 2=3; 3=2; 4=1")
new_test_data$confidence6_re <- recode(new_test_data$confidence6, "1=4; 2=3; 3=2; 4=1")
3.6.3 Perform frequency analysis for ‘learning2, learning4, confidence2_re, confidence3_re, confidence6_re’.
table(test_data$learning2,useNA = "ifany")
##
## 1 2 3 4 9 <NA>
## 6765 2157 1804 1528 292 23
table(test_data$learning4,useNA = "ifany")
##
## 1 2 3 4 9 <NA>
## 7304 1960 1684 1183 415 23
table(new_test_data$confidence2_re,useNA = "ifany")
##
## 1 2 3 4 9 <NA>
## 101 39 31 31 7 1
table(new_test_data$confidence3_re,useNA = "ifany")
##
## 1 2 3 4 9 <NA>
## 94 32 45 27 11 1
table(new_test_data$confidence6_re,useNA = "ifany")
##
## 1 2 3 4 9 <NA>
## 109 36 26 29 9 1
3.7 Class Activity 3: Select Cases
3.7.1 Select ‘gender = girl’ and ‘year = 2000’ and create a new dataset named girl_2000. What is the mean of studentbullied1?
# Use the filter() function in the dplyr package to filter the data
girl_2000 <- test_data %>%
  filter(year==3,gender=="female")
# Look for the mean of studentbullied1 in the girl_2000 dataset
describe(girl_2000$studentbullied1)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 2284 3.08 1.48 3 3.07 1.48 1 9 8 1 3.85 0.03
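The same kind of selection can also be written in base R with subset(), and the mean alone can be obtained with mean() rather than the full describe() table. This is a sketch on made-up data; the names toy and girl_toy and all values are hypothetical.

```r
# Hypothetical stand-in for test_data
toy <- data.frame(
  gender = factor(c("male", "female", "female", "female")),
  year = c(3, 3, 2, 3),
  studentbullied1 = c(4, 2, 1, NA)
)

# Keep only the rows where both conditions hold
girl_toy <- subset(toy, gender == "female" & year == 3)

nrow(girl_toy)                                   # 2 rows are selected
mean(girl_toy$studentbullied1, na.rm = TRUE)     # mean of the observed values: 2
```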
3.7.2 Select students who have id=3001 to id=4000 and filter out unselected cases. What is the variance of parentsupport1?
# Use the filter() function in the dplyr package to filter the data
id3000_4000student <- test_data %>%
  filter(id>3000,id<=4000)
# Look for the variance of parentsupport1 in the id3000_4000student dataset
var(id3000_4000student$parentsupport1,na.rm=TRUE)
## [1] 2.530103
3.7.3 Select 10% of the students at random and delete unselected cases. What is the frequency of learning1?
# We need the sample_n() function in the dplyr package
sample_data <- sample_n(test_data,1257)
# Check the frequency of learning1
table(sample_data$learning1,useNA = "ifany")
##
## 1 2 3 4 9 <NA>
## 786 295 80 80 15 1
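Because the sample is random, the frequencies change on every run unless a seed is set first. The sketch below shows the idea in base R on made-up data: it fixes a seed, computes the 10% sample size from the number of rows instead of typing it in, and draws rows without replacement. The seed value and the names toy, n_keep, and sample_toy are hypothetical. In recent dplyr versions, sample_n() is marked as superseded by slice_sample().

```r
set.seed(123)   # hypothetical seed; any fixed number makes the draw repeatable

# Hypothetical stand-in for test_data with 100 students
toy <- data.frame(id = 1:100, learning1 = rep(1:4, 25))

n_keep <- round(0.10 * nrow(toy))                 # 10% of the rows
sample_toy <- toy[sample(nrow(toy), n_keep), ]    # draw rows without replacement

nrow(sample_toy)
table(sample_toy$learning1, useNA = "ifany")
```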
3.8 Sorting Cases & Merging files
3.8.1 Sorting Cases
# Example: sort the cases by year and month
new_ordered_data <- test_data[order(test_data$year,test_data$month),]
head(new_ordered_data,50)
## # A tibble: 50 x 53
## id gender month year language book home_computer home_desk home_book
## <dbl> <fct> <dbl> <dbl> <fct> <dbl> <dbl> <dbl> <dbl>
## 1 6529 female 4 1 Sometim… 2 1 1 1
## 2 2397 female 10 1 Always … 2 1 1 1
## 3 3116 male 10 1 Sometim… 3 1 1 1
## 4 867 female 11 1 Never S… 2 1 1 1
## 5 910 female 11 1 Always … 2 1 1 1
## 6 2071 female 11 1 Always … 3 1 1 2
## 7 9152 male 11 1 Sometim… 1 2 2 2
## 8 907 male 12 1 Sometim… 2 1 1 1
## 9 1345 female 12 1 Always … 2 2 2 2
## 10 2222 female 12 1 Always … 1 1 1 1
## # … with 40 more rows, and 44 more variables: home_room <dbl>,
## # home_internet <dbl>, computer_home <dbl>, computer_school <dbl>,
## # computer_some <dbl>, parentsupport1 <dbl>, parentsupport2 <dbl>,
## # parentsupport3 <dbl>, parentsupport4 <dbl>, school1 <dbl>, school2 <dbl>,
## # school3 <dbl>, studentbullied1 <dbl>, studentbullied2 <dbl>,
## # studentbullied3 <dbl>, studentbullied4 <dbl>, studentbullied5 <dbl>,
## # studentbullied6 <dbl>, learning1 <dbl>, learning2 <dbl>, learning3 <dbl>,
## # learning4 <dbl>, learning5 <dbl>, learning6 <dbl>, learning7 <dbl>,
## # engagement1 <dbl>, engagement2 <dbl>, engagement3 <dbl>, engagement4 <dbl>,
## # engagement5 <dbl>, confidence1 <dbl>, confidence2 <dbl>, confidence3 <dbl>,
## # confidence4 <dbl>, confidence5 <dbl>, confidence6 <dbl>, score1 <dbl>,
## # score2 <dbl>, score3 <dbl>, score4 <dbl>, score5 <dbl>, ScienceTotal <dbl>,
## # ParentSupport <dbl>, StudentsBullied <dbl>
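The order() call sorts by its first argument and uses the later arguments to break ties. A small sketch on made-up data (hypothetical names and values) shows the ascending sort used above and a variant that sorts year in descending order by negating it.

```r
# Hypothetical stand-in with five cases
toy <- data.frame(
  id = 1:5,
  year = c(4, 1, 4, 1, 3),
  month = c(2, 12, 1, 4, 7)
)

# Ascending year, then ascending month within each year
asc <- toy[order(toy$year, toy$month), ]
asc$id        # 4 2 5 3 1

# Descending year (note the minus sign), ascending month within each year
desc_year <- toy[order(-toy$year, toy$month), ]
desc_year$id  # 3 1 5 4 2
```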
3.8.2 Merging file: Add Cases
# Import the cases first
library(haven) # For importing SPSS files
Data_add_cases <- read_sav("Data_add cases.sav")
# In R, each observation is a row, and each variable is a column.
# The previous operations have already changed the number of variables in our test data, which is why we made a copy of the original data. Now we make a fresh copy for this operation.
test_data2 <- asgusam5
# We can use the rbind() function to add the cases
added_test_data <- rbind(test_data2,Data_add_cases)
describe(added_test_data) # Now we have 13069 observations instead of 12569 observations, right?
## vars n mean sd median trimmed mad min
## id 1 13069 6628.01 3952.32 6535.00 6535.00 4843.65 1.00
## gender 2 13052 1.50 0.50 1.00 1.50 0.00 1.00
## month 3 13052 6.57 3.42 7.00 6.58 4.45 1.00
## year 4 13052 3.62 0.53 4.00 3.67 0.00 1.00
## language 5 13033 1.36 1.26 1.00 1.09 0.00 1.00
## book 6 13033 2.93 1.33 3.00 2.85 1.48 1.00
## home_computer 7 13040 1.13 0.75 1.00 1.00 0.00 1.00
## home_desk 8 13038 1.32 0.89 1.00 1.18 0.00 1.00
## home_book 9 13035 1.17 0.85 1.00 1.00 0.00 1.00
## home_room 10 13037 1.35 0.87 1.00 1.24 0.00 1.00
## home_internet 11 13036 1.25 0.97 1.00 1.08 0.00 1.00
## computer_home 12 13031 1.99 1.46 2.00 1.72 1.48 1.00
## computer_school 13 13008 2.58 1.83 2.00 2.22 1.48 1.00
## computer_some 14 13007 3.25 1.82 3.00 3.05 1.48 1.00
## parentsupport1 15 13027 1.79 1.38 1.00 1.50 0.00 1.00
## parentsupport2 16 13024 2.12 1.57 2.00 1.86 1.48 1.00
## parentsupport3 17 13022 1.65 1.55 1.00 1.27 0.00 1.00
## parentsupport4 18 13026 1.64 1.47 1.00 1.28 0.00 1.00
## school1 19 13034 2.00 1.25 2.00 1.80 1.48 1.00
## school2 20 13032 1.74 1.34 1.00 1.45 0.00 1.00
## school3 21 13027 1.85 1.45 1.00 1.56 0.00 1.00
## studentbullied1 22 13034 3.08 1.38 3.00 3.13 1.48 1.00
## studentbullied2 23 13032 3.28 1.37 4.00 3.35 0.00 1.00
## studentbullied3 24 13023 3.32 1.42 4.00 3.36 0.00 1.00
## studentbullied4 25 13027 3.47 1.40 4.00 3.52 0.00 1.00
## studentbullied5 26 13022 3.41 1.36 4.00 3.48 0.00 1.00
## studentbullied6 27 13029 3.66 1.17 4.00 3.80 0.00 1.00
## learning1 28 13028 1.72 1.28 1.00 1.45 0.00 1.00
## learning2 29 13022 3.29 1.38 4.00 3.34 0.00 1.00
## learning3 30 13017 2.99 1.53 3.00 2.93 1.48 1.00
## learning4 31 13016 3.45 1.43 4.00 3.49 0.00 1.00
## learning5 32 13019 1.61 1.46 1.00 1.25 0.00 1.00
## learning6 33 13019 1.85 1.54 1.00 1.51 0.00 1.00
## learning7 34 13022 1.47 1.31 1.00 1.15 0.00 1.00
## engagement1 35 13026 1.45 1.21 1.00 1.18 0.00 1.00
## engagement2 36 13019 3.08 1.45 3.00 3.08 1.48 1.00
## engagement3 37 13018 1.69 1.48 1.00 1.34 0.00 1.00
## engagement4 38 13014 1.72 1.46 1.00 1.40 0.00 1.00
## engagement5 39 13021 1.66 1.39 1.00 1.34 0.00 1.00
## confidence1 40 13025 1.73 1.28 1.00 1.47 0.00 1.00
## confidence2 41 13020 3.30 1.38 4.00 3.34 0.00 1.00
## confidence3 42 13015 3.49 1.45 4.00 3.50 0.00 1.00
## confidence4 43 13011 2.08 1.63 2.00 1.74 1.48 1.00
## confidence5 44 13015 2.31 1.59 2.00 2.06 1.48 1.00
## confidence6 45 13019 3.39 1.38 4.00 3.45 0.00 1.00
## score1 46 13069 542.79 78.84 547.35 545.10 77.96 241.63
## score2 47 13069 540.85 79.09 546.05 543.28 78.23 210.85
## score3 48 13069 540.86 79.51 545.86 543.39 78.17 184.31
## score4 49 13069 540.30 79.60 545.11 542.43 78.76 214.59
## score5 50 13069 542.05 78.61 546.65 544.32 78.48 228.65
## max range skew kurtosis se
## id 15500.00 15499.00 0.19 -0.87 34.57
## gender 2.00 1.00 0.01 -2.00 0.00
## month 12.00 11.00 -0.04 -1.20 0.03
## year 5.00 4.00 -0.88 -0.07 0.00
## language 9.00 8.00 5.30 28.93 0.01
## book 9.00 8.00 1.02 3.00 0.01
## home_computer 9.00 8.00 9.36 94.44 0.01
## home_desk 9.00 8.00 6.68 54.03 0.01
## home_book 9.00 8.00 8.18 71.79 0.01
## home_room 9.00 8.00 6.48 53.74 0.01
## home_internet 9.00 8.00 6.88 51.57 0.01
## computer_home 9.00 8.00 2.79 10.19 0.01
## computer_school 9.00 8.00 2.49 6.21 0.02
## computer_some 9.00 8.00 1.63 3.54 0.02
## parentsupport1 9.00 8.00 2.90 11.09 0.01
## parentsupport2 9.00 8.00 2.36 7.28 0.01
## parentsupport3 9.00 8.00 3.23 11.47 0.01
## parentsupport4 9.00 8.00 3.22 11.97 0.01
## school1 9.00 8.00 2.60 10.98 0.01
## school2 9.00 8.00 3.38 14.57 0.01
## school3 9.00 8.00 2.85 10.47 0.01
## studentbullied1 9.00 8.00 0.67 3.52 0.01
## studentbullied2 9.00 8.00 0.84 4.47 0.01
## studentbullied3 9.00 8.00 1.02 4.70 0.01
## studentbullied4 9.00 8.00 1.22 5.54 0.01
## studentbullied5 9.00 8.00 0.97 5.20 0.01
## studentbullied6 9.00 8.00 0.87 7.81 0.01
## learning1 9.00 8.00 3.15 13.61 0.01
## learning2 9.00 8.00 1.00 4.71 0.01
## learning3 9.00 8.00 1.42 4.50 0.01
## learning4 9.00 8.00 1.28 5.24 0.01
## learning5 9.00 8.00 3.71 15.19 0.01
## learning6 9.00 8.00 2.97 10.42 0.01
## learning7 9.00 8.00 4.43 21.62 0.01
## engagement1 9.00 8.00 4.70 25.28 0.01
## engagement2 9.00 8.00 1.24 4.46 0.01
## engagement3 9.00 8.00 3.61 14.47 0.01
## engagement4 9.00 8.00 3.50 14.06 0.01
## engagement5 9.00 8.00 3.60 15.29 0.01
## confidence1 9.00 8.00 3.64 17.03 0.01
## confidence2 9.00 8.00 1.27 5.40 0.01
## confidence3 9.00 8.00 1.40 5.35 0.01
## confidence4 9.00 8.00 2.79 9.04 0.01
## confidence5 9.00 8.00 2.37 7.42 0.01
## confidence6 9.00 8.00 1.05 5.06 0.01
## score1 806.35 564.72 -0.29 0.02 0.69
## score2 780.93 570.08 -0.31 0.05 0.69
## score3 779.10 594.79 -0.31 0.03 0.70
## score4 796.09 581.50 -0.26 0.04 0.70
## score5 795.12 566.47 -0.28 0.01 0.69
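For data frames, rbind() matches columns by name, so both pieces need the same set of column names, though not necessarily in the same order. This is why a fresh copy with the original 50 variables is used rather than test_data, which has gained extra columns. A minimal sketch on made-up data (hypothetical names and values):

```r
# Hypothetical existing cases and new cases with the same columns in a different order
old_cases <- data.frame(id = 1:3, score1 = c(492, 517, 656))
new_cases <- data.frame(score1 = c(550, 642), id = 4:5)

# Columns are matched by name, so score1 and id end up in the right places
stacked <- rbind(old_cases, new_cases)

# Quick check: the row counts should add up
nrow(stacked) == nrow(old_cases) + nrow(new_cases)
stacked
```

Comparing nrow() before and after is a lighter check than running describe() on the whole dataset.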
3.8.3 Merging file: Add variable
# We can use the cbind() function to add the variables
# Import the variables first
Data_add_variables <- read_sav("Data_add_variables.sav")
variable_added_data <- cbind(test_data,Data_add_variables)
# Check the new dataset
str(variable_added_data) # Now we have 60 variables instead of 53, right? Good job!
## 'data.frame': 12569 obs. of 60 variables:
## $ id : num 1 2 3 4 5 6 7 8 9 10 ...
## $ gender : Factor w/ 2 levels "male","female": 1 2 2 1 2 2 1 1 2 1 ...
## $ month : num 1 9 10 8 8 11 1 11 8 6 ...
## $ year : num 5 4 4 4 4 4 4 4 4 4 ...
## $ language : Factor w/ 3 levels "Always Speak",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ book : num 4 3 4 3 5 3 2 3 4 2 ...
## $ home_computer : num 1 1 1 1 1 1 1 1 1 2 ...
## $ home_desk : num 2 1 1 1 1 1 1 1 1 1 ...
## $ home_book : num 1 1 1 1 1 1 1 1 1 1 ...
## $ home_room : num 1 2 1 1 1 2 2 2 1 2 ...
## $ home_internet : num 1 1 1 1 1 1 1 1 1 2 ...
## $ computer_home : num 1 1 1 2 1 2 1 2 4 4 ...
## $ computer_school: num 4 4 2 2 3 3 3 2 4 4 ...
## $ computer_some : num 4 4 3 1 4 4 1 4 4 2 ...
## $ parentsupport1 : num 1 1 4 1 2 2 1 1 4 4 ...
## $ parentsupport2 : num 2 1 1 1 1 2 4 1 3 4 ...
## $ parentsupport3 : num 1 1 2 1 4 1 1 1 1 3 ...
## $ parentsupport4 : num 1 1 1 1 2 1 2 1 1 2 ...
## $ school1 : num 3 1 2 1 3 2 4 1 3 4 ...
## $ school2 : num 3 4 1 1 2 1 2 2 1 3 ...
## $ school3 : num 2 4 2 1 2 1 2 2 1 2 ...
## $ studentbullied1: num 3 1 2 1 1 2 4 2 2 4 ...
## $ studentbullied2: num 3 1 2 2 1 3 4 3 4 4 ...
## $ studentbullied3: num 3 1 4 3 2 3 4 2 4 4 ...
## $ studentbullied4: num 4 1 4 4 1 2 4 2 4 4 ...
## $ studentbullied5: num 3 1 2 3 1 3 4 1 3 4 ...
## $ studentbullied6: num 4 3 4 4 2 4 4 1 4 4 ...
## $ learning1 : num 3 1 1 1 1 2 4 1 1 4 ...
## $ learning2 : num 1 1 1 1 2 2 4 1 1 4 ...
## $ learning3 : num 3 1 1 2 2 2 4 2 4 4 ...
## $ learning4 : num 1 1 1 1 1 2 3 1 1 4 ...
## $ learning5 : num 1 1 1 1 2 2 2 1 1 3 ...
## $ learning6 : num 1 1 1 1 1 2 4 1 1 4 ...
## $ learning7 : num 1 1 1 1 1 2 1 1 1 3 ...
## $ engagement1 : num 1 1 1 1 2 1 1 1 1 4 ...
## $ engagement2 : num 1 4 2 3 3 2 1 2 1 1 ...
## $ engagement3 : num 3 2 1 1 3 2 1 2 1 4 ...
## $ engagement4 : num 2 1 1 1 2 2 4 1 1 4 ...
## $ engagement5 : num 1 1 1 1 2 1 4 1 1 4 ...
## $ confidence1 : num 3 1 1 1 1 2 2 2 1 4 ...
## $ confidence2 : num 1 4 4 4 3 3 4 4 4 1 ...
## $ confidence3 : num 1 4 4 4 3 4 4 4 4 1 ...
## $ confidence4 : num 4 1 1 1 3 2 2 2 1 3 ...
## $ confidence5 : num 3 1 1 1 4 2 3 1 1 4 ...
## $ confidence6 : num 1 4 4 4 4 4 4 4 4 1 ...
## $ score1 : num 492 517 656 550 642 ...
## $ score2 : num 487 576 603 567 644 ...
## $ score3 : num 463 536 627 575 673 ...
## $ score4 : num 455 537 574 544 645 ...
## $ score5 : num 476 513 633 609 637 ...
## $ ScienceTotal : num 475 536 619 569 648 ...
## $ ParentSupport : num 1.25 1 2 1 2.25 1.5 2 1 2.25 3.25 ...
## $ StudentsBullied: num 20 8 18 17 8 17 24 11 21 24 ...
## $ id : num 1 2 3 4 5 6 7 8 9 10 ...
## ..- attr(*, "format.spss")= chr "F12.0"
## ..- attr(*, "display_width")= int 12
## $ IDCNTRY : num 840 840 840 840 840 840 840 840 840 840 ...
## ..- attr(*, "label")= chr "*COUNTRY ID*"
## ..- attr(*, "format.spss")= chr "F5.0"
## $ IDBOOK : num 2 3 4 5 6 7 9 10 11 12 ...
## ..- attr(*, "label")= chr "*ACHIEVEMENT TEST BOOKLET*"
## ..- attr(*, "format.spss")= chr "F2.0"
## $ IDSCHOOL : num 1 1 1 1 1 1 1 1 1 1 ...
## ..- attr(*, "label")= chr "*SCHOOL ID*"
## ..- attr(*, "format.spss")= chr "F4.0"
## $ IDCLASS : num 102 102 102 102 102 102 102 102 102 102 ...
## ..- attr(*, "label")= chr "*CLASS ID*"
## ..- attr(*, "format.spss")= chr "F6.0"
## $ IDSTUD : num 10201 10202 10203 10204 10205 ...
## ..- attr(*, "label")= chr "*STUDENT ID*"
## ..- attr(*, "format.spss")= chr "F8.0"
## $ IDGRADE : dbl+lbl [1:12569] 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4...
## ..@ label : chr "*GRADE ID*"
## ..@ format.spss: chr "F2.0"
## ..@ labels : Named num 3 4 5 6 99
## .. ..- attr(*, "names")= chr "GRADE 3" "GRADE 4" "GRADE 5" "GRADE 6" ...
3.9 Class Exercise
1. Add all cases from the ‘Data_add cases(exercise).sav’ file into the ‘asgusam5.sav’ file.
2. Add all variables from the ‘Data_add variables(exercise).sav’ file into the ‘asgusam5.sav’ file.
Note: Sort by the key variable (e.g. the ‘id’ variable in both datasets) before merging the datasets.
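The note about sorting matters because cbind() glues rows together purely by position. The sketch below, on made-up data, contrasts cbind() on unsorted inputs with base R's merge(), which matches rows on the key and so does not depend on row order. The names main and extra and all values are hypothetical; IDSCHOOL is borrowed from the variable list above only as a column name.

```r
# Hypothetical datasets whose rows are in different orders
main  <- data.frame(id = c(3, 1, 2), score1 = c(656, 492, 517))
extra <- data.frame(id = c(2, 3, 1), IDSCHOOL = c(20, 30, 10))

# cbind() pairs row 1 with row 1, and so on: ids are mismatched and duplicated
glued <- cbind(main, extra)
names(glued)    # "id" appears twice

# merge() aligns rows on the key and keeps a single id column
merged <- merge(main, extra, by = "id")
merged
```

Sorting both datasets by id and then using cbind() gives the same alignment only when both contain exactly the same ids; merge() also copes when they do not.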