Chapter 3 Play with R

3.1 Read the data from an Excel/SPSS file

In R, the fundamental unit of shareable code is the package. A package bundles together code, data, documentation, and tests, and is easy to share with others. Thanks to packages such as readxl and haven, R can import Excel and SPSS files. A package has to be installed once with install.packages() and then loaded with library() in every new session. After that, we can use the functions inside the package to read the Excel or SPSS file.
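The code in this chapter loads packages with library() but does not show the installation step. A minimal sketch of a one-time check is given here; the helper name `missing_packages` is hypothetical (it is not part of any package), and the package list simply collects the packages loaded later in this chapter.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed yet
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages loaded in this chapter
needed <- c("readxl", "haven", "dplyr", "psych", "car")
to_install <- missing_packages(needed)
print(to_install)

# Uncomment to install whatever is missing (needs an internet connection)
# if (length(to_install) > 0) install.packages(to_install)
```

Installation is only needed once per computer, whereas library() has to be called again in every new R session.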

library(readxl) # The R package needed to import Excel data files
# Use the read_excel() function to open the Excel file
asgusam5 <- read_excel("asgusam5_excel.xlsx")
# Use the base R function str() to get an overview of the type of every variable
# Later, we will also view the first 6 rows of the data with the head() function
str(asgusam5)
## tibble [12,569 × 50] (S3: tbl_df/tbl/data.frame)
##  $ id             : num [1:12569] 1 2 3 4 5 6 7 8 9 10 ...
##  $ gender         : num [1:12569] 1 2 2 1 2 2 1 1 2 1 ...
##  $ month          : num [1:12569] 1 9 10 8 8 11 1 11 8 6 ...
##  $ year           : num [1:12569] 5 4 4 4 4 4 4 4 4 4 ...
##  $ language       : num [1:12569] 1 1 1 1 1 1 1 1 1 1 ...
##  $ book           : num [1:12569] 4 3 4 3 5 3 2 3 4 2 ...
##  $ home_computer  : num [1:12569] 1 1 1 1 1 1 1 1 1 2 ...
##  $ home_desk      : num [1:12569] 2 1 1 1 1 1 1 1 1 1 ...
##  $ home_book      : num [1:12569] 1 1 1 1 1 1 1 1 1 1 ...
##  $ home_room      : num [1:12569] 1 2 1 1 1 2 2 2 1 2 ...
##  $ home_internet  : num [1:12569] 1 1 1 1 1 1 1 1 1 2 ...
##  $ computer_home  : num [1:12569] 1 1 1 2 1 2 1 2 4 4 ...
##  $ computer_school: num [1:12569] 4 4 2 2 3 3 3 2 4 4 ...
##  $ computer_some  : num [1:12569] 4 4 3 1 4 4 1 4 4 2 ...
##  $ parentsupport1 : num [1:12569] 1 1 4 1 2 2 1 1 4 4 ...
##  $ parentsupport2 : num [1:12569] 2 1 1 1 1 2 4 1 3 4 ...
##  $ parentsupport3 : num [1:12569] 1 1 2 1 4 1 1 1 1 3 ...
##  $ parentsupport4 : num [1:12569] 1 1 1 1 2 1 2 1 1 2 ...
##  $ school1        : num [1:12569] 3 1 2 1 3 2 4 1 3 4 ...
##  $ school2        : num [1:12569] 3 4 1 1 2 1 2 2 1 3 ...
##  $ school3        : num [1:12569] 2 4 2 1 2 1 2 2 1 2 ...
##  $ studentbullied1: num [1:12569] 3 1 2 1 1 2 4 2 2 4 ...
##  $ studentbullied2: num [1:12569] 3 1 2 2 1 3 4 3 4 4 ...
##  $ studentbullied3: num [1:12569] 3 1 4 3 2 3 4 2 4 4 ...
##  $ studentbullied4: num [1:12569] 4 1 4 4 1 2 4 2 4 4 ...
##  $ studentbullied5: num [1:12569] 3 1 2 3 1 3 4 1 3 4 ...
##  $ studentbullied6: num [1:12569] 4 3 4 4 2 4 4 1 4 4 ...
##  $ learning1      : num [1:12569] 3 1 1 1 1 2 4 1 1 4 ...
##  $ learning2      : num [1:12569] 4 4 4 4 3 3 1 4 4 1 ...
##  $ learning3      : num [1:12569] 3 1 1 2 2 2 4 2 4 4 ...
##  $ learning4      : num [1:12569] 4 4 4 4 4 3 2 4 4 1 ...
##  $ learning5      : num [1:12569] 1 1 1 1 2 2 2 1 1 3 ...
##  $ learning6      : num [1:12569] 1 1 1 1 1 2 4 1 1 4 ...
##  $ learning7      : num [1:12569] 1 1 1 1 1 2 1 1 1 3 ...
##  $ engagement1    : num [1:12569] 1 1 1 1 2 1 1 1 1 4 ...
##  $ engagement2    : num [1:12569] 1 4 2 3 3 2 1 2 1 1 ...
##  $ engagement3    : num [1:12569] 3 2 1 1 3 2 1 2 1 4 ...
##  $ engagement4    : num [1:12569] 2 1 1 1 2 2 4 1 1 4 ...
##  $ engagement5    : num [1:12569] 1 1 1 1 2 1 4 1 1 4 ...
##  $ confidence1    : num [1:12569] 3 1 1 1 1 2 2 2 1 4 ...
##  $ confidence2    : num [1:12569] 1 4 4 4 3 3 4 4 4 1 ...
##  $ confidence3    : num [1:12569] 1 4 4 4 3 4 4 4 4 1 ...
##  $ confidence4    : num [1:12569] 4 1 1 1 3 2 2 2 1 3 ...
##  $ confidence5    : num [1:12569] 3 1 1 1 4 2 3 1 1 4 ...
##  $ confidence6    : num [1:12569] 1 4 4 4 4 4 4 4 4 1 ...
##  $ score1         : num [1:12569] 492 517 656 550 642 ...
##  $ score2         : num [1:12569] 487 576 603 567 644 ...
##  $ score3         : num [1:12569] 463 536 627 575 673 ...
##  $ score4         : num [1:12569] 455 537 574 544 645 ...
##  $ score5         : num [1:12569] 476 513 633 609 637 ...

Now, let’s use the head() function to preview the first 6 rows of the data.

head(asgusam5)
## # A tibble: 6 x 50
##      id gender month  year language  book home_computer home_desk home_book
##   <dbl>  <dbl> <dbl> <dbl>    <dbl> <dbl>         <dbl>     <dbl>     <dbl>
## 1     1      1     1     5        1     4             1         2         1
## 2     2      2     9     4        1     3             1         1         1
## 3     3      2    10     4        1     4             1         1         1
## 4     4      1     8     4        1     3             1         1         1
## 5     5      2     8     4        1     5             1         1         1
## 6     6      2    11     4        1     3             1         1         1
## # … with 41 more variables: home_room <dbl>, home_internet <dbl>,
## #   computer_home <dbl>, computer_school <dbl>, computer_some <dbl>,
## #   parentsupport1 <dbl>, parentsupport2 <dbl>, parentsupport3 <dbl>,
## #   parentsupport4 <dbl>, school1 <dbl>, school2 <dbl>, school3 <dbl>,
## #   studentbullied1 <dbl>, studentbullied2 <dbl>, studentbullied3 <dbl>,
## #   studentbullied4 <dbl>, studentbullied5 <dbl>, studentbullied6 <dbl>,
## #   learning1 <dbl>, learning2 <dbl>, learning3 <dbl>, learning4 <dbl>,
## #   learning5 <dbl>, learning6 <dbl>, learning7 <dbl>, engagement1 <dbl>,
## #   engagement2 <dbl>, engagement3 <dbl>, engagement4 <dbl>, engagement5 <dbl>,
## #   confidence1 <dbl>, confidence2 <dbl>, confidence3 <dbl>, confidence4 <dbl>,
## #   confidence5 <dbl>, confidence6 <dbl>, score1 <dbl>, score2 <dbl>,
## #   score3 <dbl>, score4 <dbl>, score5 <dbl>
# For later operations, we make a copy of the original dataset so that we can go back to the original data whenever we want.
test_data <- asgusam5

3.2 Transform the data

As you can see, R has read every variable as numeric by default. The numeric data type is appropriate for interval and ratio scales. However, some of the variables, such as gender, month, language, and book, are supposed to be categorical. We need to convert them to the categorical data type with the factor() function.

# First, we change the data type of the gender variable
test_data$gender <- factor(test_data$gender, levels = c(1, 2), labels = c("male", "female"))
# Use the class() function to check the new data type of gender
class(test_data$gender)
## [1] "factor"
# Then, let's change the language variable as well
test_data$language <- factor(test_data$language, levels = c(1, 2, 3), labels = c("Always Speak", "Sometimes Speak", "Never Speak"))
# Use the class() function to check the new data type of language
class(test_data$language)
## [1] "factor"
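One detail of factor() is worth seeing on a small example: any code that is not listed in `levels` silently becomes NA. The sketch below uses made-up codes (not the chapter's data) to illustrate this.

```r
# Toy codes: 1 and 2 are listed in `levels`, 9 is not
gender_code <- c(1, 2, 2, 9, 1)
gender <- factor(gender_code, levels = c(1, 2), labels = c("male", "female"))

# Codes that are not listed in `levels` become NA
print(gender)
print(table(gender, useNA = "ifany"))

# as.integer() returns the position of each level, not the original code
print(as.integer(gender))
```

Comparing table() before and after the conversion is a quick way to see whether any codes were dropped.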

Can you do the same for the month variable?

3.3 Compute Variable

3.4 Class Activity 1: Calculate the aggregated data

3.4.1 Create a new variable, ‘ScienceTotal’, using the average of score1 to score5.

# This semester, we will use the "dplyr" package for most of the data manipulation.
# The individual functions in this package will be described case by case
library(dplyr)
# Calculate the Science total score with the rowMeans() function
test_data$ScienceTotal <- rowMeans(subset(test_data, select = c(score1, score2, score3, score4, score5)), na.rm = TRUE)
# You can check the students' average total score with the mean() function
mean(test_data$ScienceTotal) # It's 542.17
## [1] 542.1741

3.4.2 Create a new variable, ‘ParentSupport’, using the mean of 4 variables (parentsupport1 to parentsupport4)

# This is almost the same as the previous code
test_data$ParentSupport <- rowMeans(subset(test_data, select = c(parentsupport1, parentsupport2, parentsupport3, parentsupport4)), na.rm = TRUE)

3.4.3 Create a new variable, ‘StudentsBullied’, using the sum of 6 variables (studentbullied1 to studentbullied6)

# Since we want the sum instead of the mean, this time we use the rowSums() function.
test_data$StudentsBullied <-rowSums(test_data[,c("studentbullied1","studentbullied2","studentbullied3","studentbullied4","studentbullied5","studentbullied6")], na.rm=TRUE)
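With na.rm = TRUE, rowMeans() and rowSums() behave differently for a row in which every item is missing. The sketch below uses a made-up three-item data frame to show this; the names `toy`, `n_answered`, and `row_sum_checked` are illustrative only.

```r
# Toy items: row 2 has one missing answer, row 3 has no answers at all
toy <- data.frame(
  item1 = c(1, 2, NA),
  item2 = c(3, NA, NA),
  item3 = c(2, 4, NA)
)

row_mean <- rowMeans(toy, na.rm = TRUE)
row_sum <- rowSums(toy, na.rm = TRUE)
print(row_mean)  # the all-missing row gives NaN
print(row_sum)   # the all-missing row gives 0, which looks like a real score

# One possible safeguard: set the sum to NA when nothing was answered
n_answered <- rowSums(!is.na(toy))
row_sum_checked <- ifelse(n_answered == 0, NA, row_sum)
print(row_sum_checked)
```

Note also that a sum over fewer answered items (row 2) is not directly comparable to a sum over all items, so whether na.rm = TRUE is appropriate for a sum score is a judgment call.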

3.4.4 Perform descriptive analysis (mean, median, mode, and S.D.) on ScienceTotal, ParentSupport, and StudentsBullied.

# Pull out the new variables we just made and store them in a new data frame
new_data <- subset(test_data, select = c("ScienceTotal", "ParentSupport", "StudentsBullied"))
# We can get the mean and median of these variables with the summary() function
summary(new_data)
##   ScienceTotal   ParentSupport   StudentsBullied
##  Min.   :276.7   Min.   :1.000   Min.   : 0.0   
##  1st Qu.:493.8   1st Qu.:1.000   1st Qu.:17.0   
##  Median :547.5   Median :1.500   Median :21.0   
##  Mean   :542.2   Mean   :1.808   Mean   :20.2   
##  3rd Qu.:594.5   3rd Qu.:2.000   3rd Qu.:23.0   
##  Max.   :774.0   Max.   :9.000   Max.   :54.0   
##                  NA's   :23
# Then, we can use the SD() function in the "psych" package to obtain the standard deviations.
library(psych)
SD(new_data,na.rm=TRUE)
##    ScienceTotal   ParentSupport StudentsBullied 
##       74.435145        1.230632        6.208656
# To calculate the mode, we need to write our own function, named "Mode"
Mode <- function(x) {
  uni <- unique(x)
  uni[which.max(tabulate(match(x, uni)))]
}
# Calculate the required mode for specific data
Mode(new_data$ScienceTotal) # Mode is 474.58 for ScienceTotal
## [1] 474.5791
Mode(new_data$ParentSupport) # Mode is 1 for ParentSupport
## [1] 1
Mode(new_data$StudentsBullied) # Mode is 24 for StudentsBullied
## [1] 24
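The Mode() function above returns a single value. When several values are tied it reports only the first one it meets, and it counts NA like any other value. The sketch below is one possible variant, under the assumption that you want every tied mode and want missing values ignored; the name `all_modes` is hypothetical.

```r
# The chapter's function, repeated so this block runs on its own
Mode <- function(x) {
  uni <- unique(x)
  uni[which.max(tabulate(match(x, uni)))]
}

# Hypothetical variant: drop NA first and return every value tied for most frequent
all_modes <- function(x) {
  x <- x[!is.na(x)]
  uni <- unique(x)
  counts <- tabulate(match(x, uni))
  uni[counts == max(counts)]
}

x <- c(1, 1, 2, 2, 3, NA, NA, NA)
print(Mode(x))       # NA, because NA is the most frequent "value"
print(all_modes(x))  # 1 and 2 are tied once NA is dropped
```

For a nearly continuous variable such as an averaged score, most values are likely to occur only once or twice, so the mode tends to carry little information there.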

3.5 Recoding Variables

3.5.1 Load the car package for reverse coding

# To reverse code variables in R, we need the recode() function in the car package
item_need_reverse <- test_data$engagement2
library(car) # loaded after dplyr, so recode() now refers to car::recode()

3.5.2 Reverse code the test item

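item_reversed <- recode(item_need_reverse, "1=4; 2=3; 3=2; 4=1") # Reverse code the engagement2 item
# Use the table() function to review the frequencies of each category in the original item.
# Running table(item_reversed, useNA = "ifany") afterwards lets you confirm that the counts for 1 and 4, and for 2 and 3, have swapped.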
table(item_need_reverse, useNA = "ifany")
## item_need_reverse
##    1    2    3    4    9 <NA> 
## 1798 2765 2105 5561  317   23
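The frequency table shows a value of 9 that the recode rule "1=4; 2=3; 3=2; 4=1" leaves untouched. The sketch below assumes that 9 is a missing-response code (the chapter does not say so; check the codebook before applying this) and shows an arithmetic way to reverse a 1 to 4 scale in base R on made-up values.

```r
# Toy item on a 1-4 scale, with 9 and NA mixed in
item <- c(1, 2, 3, 4, 9, NA)

# ASSUMPTION: 9 is a missing-response code, so it is turned into NA first
item_clean <- replace(item, item %in% 9, NA)

# On a scale from 1 to 4, the reversed score is (1 + 4) - x
item_reversed <- 5 - item_clean

# Cross-tabulate old against new values to check the mapping
print(table(item_clean, item_reversed, useNA = "ifany"))
```

If 9 were left in place, it would also flow into any mean or sum computed from the item, so it is worth settling how to treat it before building composite scores.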

3.6 Class Activity 2: Recode Variables

3.6.1 Recode ‘learning2’ and ‘learning4’ into the same variables

test_data$learning2 <- recode(test_data$learning2, "1=4; 2=3; 3=2; 4=1") # Reverse code item learning2
test_data$learning4 <- recode(test_data$learning4, "1=4; 2=3; 3=2; 4=1") # Reverse code item learning4

3.6.2 Recode the ‘confidence2, confidence3, confidence6’ variables only for students who were born in 1999 (‘year’ variable) and save the recoded variables as ‘confidence2_re, confidence3_re, confidence6_re’.

## Subset the data: the filter below uses year == 2 as the code for students born in 1999
new_test_data <- test_data %>%
  filter(year==2)
## Recode ‘confidence2, confidence3, confidence6’
new_test_data$confidence2_re <- recode(new_test_data$confidence2, "1=4; 2=3; 3=2; 4=1")
new_test_data$confidence3_re <- recode(new_test_data$confidence3, "1=4; 2=3; 3=2; 4=1")
new_test_data$confidence6_re <- recode(new_test_data$confidence6, "1=4; 2=3; 3=2; 4=1")
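The code above keeps only the students born in 1999, so the recoded variables live in a separate, smaller data set. A possible alternative, sketched below on made-up data, keeps all rows and fills the new variables only for the selected students. It assumes, as in the filter above, that year code 2 stands for 1999, and that the items contain only the values 1 to 4.

```r
# Toy data: two students born in 1999 (year code 2) and two born in other years
toy <- data.frame(
  year = c(2, 3, 2, 4),
  confidence2 = c(1, 2, 4, 3),
  confidence3 = c(4, 4, 1, 2)
)

items <- c("confidence2", "confidence3")
born_1999 <- toy$year == 2

# Create one "_re" column per item; students outside the group get NA
for (item in items) {
  new_name <- paste0(item, "_re")
  toy[[new_name]] <- ifelse(born_1999, 5 - toy[[item]], NA)
}
print(toy)
```

Looping over the item names avoids repeating the same recode line for every variable.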

3.6.3 Perform frequency analysis for ‘learning2, learning4, confidence2_re, confidence3_re, confidence6_re’.

table(test_data$learning2,useNA = "ifany")
## 
##    1    2    3    4    9 <NA> 
## 6765 2157 1804 1528  292   23
table(test_data$learning4,useNA = "ifany")
## 
##    1    2    3    4    9 <NA> 
## 7304 1960 1684 1183  415   23
table(new_test_data$confidence2_re,useNA = "ifany")
## 
##    1    2    3    4    9 <NA> 
##  101   39   31   31    7    1
table(new_test_data$confidence3_re,useNA = "ifany")
## 
##    1    2    3    4    9 <NA> 
##   94   32   45   27   11    1
table(new_test_data$confidence6_re,useNA = "ifany")
## 
##    1    2    3    4    9 <NA> 
##  109   36   26   29    9    1

3.7 Class Activity 3: Select Cases

3.7.1 Select ‘gender = girl’ and ‘year = 2000’ and create a new dataset named girl_2000. What is the mean of studentbullied1?

# Use the filter() function in the dplyr package to select the cases
girl_2000 <- test_data %>%
  filter(year == 3, gender == "female")
# Look for the mean of studentbullied1 in the girl_2000 dataset, using describe() from the psych package
describe(girl_2000$studentbullied1)
##    vars    n mean   sd median trimmed  mad min max range skew kurtosis   se
## X1    1 2284 3.08 1.48      3    3.07 1.48   1   9     8    1     3.85 0.03
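If only the mean is needed, base R is enough. The sketch below repeats the same selection on made-up data, using the factor label "female" and year code 3 as in the code above.

```r
# Toy data with the same variable names as the chapter's dataset
toy <- data.frame(
  gender = factor(c("female", "male", "female", "female")),
  year = c(3, 3, 3, 4),
  studentbullied1 = c(2, 4, NA, 1)
)

# which() drops rows where the condition is NA, as filter() does
keep <- toy$gender == "female" & toy$year == 3
girls <- toy[which(keep), ]

# na.rm = TRUE ignores missing answers when averaging
mean_bullied <- mean(girls$studentbullied1, na.rm = TRUE)
print(nrow(girls))
print(mean_bullied)
```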

3.7.2 Select students who have id=3001 to id=4000 and filter out unselected cases. What is the variance of parentsupport1?

# Use the filter() function in the dplyr package to select the cases
id3000_4000student <- test_data %>%
  filter(id > 3000, id <= 4000)
# Look for the variance of parentsupport1 in the id3000_4000student dataset
var(id3000_4000student$parentsupport1,na.rm=TRUE)
## [1] 2.530103

3.7.3 Select 10% of all students at random and delete the unselected cases. What is the frequency of learning1?

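# We need the sample_n() function in the dplyr package; 1257 is 10% of the 12569 students, rounded
sample_data <- sample_n(test_data, 1257)
# Check the frequencies of learning1 (your counts will differ, because the sample is random)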
table(sample_data$learning1,useNA = "ifany")
## 
##    1    2    3    4    9 <NA> 
##  786  295   80   80   15    1
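Because the sample is random, the frequency table changes every time the code is run. A sketch of a reproducible version in base R follows; the seed value is arbitrary and the data are made up.

```r
# Fixing the seed makes the random sample repeatable (2024 is an arbitrary choice)
set.seed(2024)

# Toy data standing in for the student file
toy <- data.frame(id = 1:200, learning1 = rep(1:4, times = 50))

# Compute the 10% sample size from the data instead of typing it by hand
n_sample <- round(0.10 * nrow(toy))
rows <- sample(nrow(toy), size = n_sample)
sample_toy <- toy[rows, ]

print(nrow(sample_toy))
print(table(sample_toy$learning1, useNA = "ifany"))
```

Calling set.seed() with the same value before sampling again gives the same rows, which makes results easier to compare in class.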

3.8 Sorting Cases & Merging files

3.8.1 Sorting Cases

# Example: sort the cases by year and then by month
new_ordered_data <- test_data[order(test_data$year,test_data$month),]
head(new_ordered_data,50)
## # A tibble: 50 x 53
##       id gender month  year language  book home_computer home_desk home_book
##    <dbl> <fct>  <dbl> <dbl> <fct>    <dbl>         <dbl>     <dbl>     <dbl>
##  1  6529 female     4     1 Sometim…     2             1         1         1
##  2  2397 female    10     1 Always …     2             1         1         1
##  3  3116 male      10     1 Sometim…     3             1         1         1
##  4   867 female    11     1 Never S…     2             1         1         1
##  5   910 female    11     1 Always …     2             1         1         1
##  6  2071 female    11     1 Always …     3             1         1         2
##  7  9152 male      11     1 Sometim…     1             2         2         2
##  8   907 male      12     1 Sometim…     2             1         1         1
##  9  1345 female    12     1 Always …     2             2         2         2
## 10  2222 female    12     1 Always …     1             1         1         1
## # … with 40 more rows, and 44 more variables: home_room <dbl>,
## #   home_internet <dbl>, computer_home <dbl>, computer_school <dbl>,
## #   computer_some <dbl>, parentsupport1 <dbl>, parentsupport2 <dbl>,
## #   parentsupport3 <dbl>, parentsupport4 <dbl>, school1 <dbl>, school2 <dbl>,
## #   school3 <dbl>, studentbullied1 <dbl>, studentbullied2 <dbl>,
## #   studentbullied3 <dbl>, studentbullied4 <dbl>, studentbullied5 <dbl>,
## #   studentbullied6 <dbl>, learning1 <dbl>, learning2 <dbl>, learning3 <dbl>,
## #   learning4 <dbl>, learning5 <dbl>, learning6 <dbl>, learning7 <dbl>,
## #   engagement1 <dbl>, engagement2 <dbl>, engagement3 <dbl>, engagement4 <dbl>,
## #   engagement5 <dbl>, confidence1 <dbl>, confidence2 <dbl>, confidence3 <dbl>,
## #   confidence4 <dbl>, confidence5 <dbl>, confidence6 <dbl>, score1 <dbl>,
## #   score2 <dbl>, score3 <dbl>, score4 <dbl>, score5 <dbl>, ScienceTotal <dbl>,
## #   ParentSupport <dbl>, StudentsBullied <dbl>
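order() sorts in ascending order by default. The sketch below, on made-up data, shows the same year-then-month sort and a variant where the second key is descending, which works for numeric columns by negating them.

```r
# Toy data with the same sort keys as the example above
toy <- data.frame(
  id = 1:5,
  year = c(4, 3, 4, 3, 4),
  month = c(2, 11, 7, 5, 7)
)

# Ascending by year, then ascending by month
asc <- toy[order(toy$year, toy$month), ]

# Ascending by year, then descending by month (negate the numeric column)
mixed <- toy[order(toy$year, -toy$month), ]

print(asc)
print(mixed)
```

Rows that tie on every key keep their original relative order.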

3.8.2 Merging file: Add Cases

# Import the cases first
library(haven) # For importing SPSS files
Data_add_cases <- read_sav("Data_add cases.sav")
# In R, each observation is a row, and each variable is a column.
# The previous operations have already changed the number of variables in our test data, which is why we made a copy of the original data. Now we make a fresh copy for this operation.
test_data2 <- asgusam5
# We can use the rbind() function to add the cases
added_test_data <- rbind(test_data2, Data_add_cases)
describe(added_test_data) # Now we have 13069 observations instead of 12569 (see n for id), right?
##                 vars     n    mean      sd  median trimmed     mad    min
## id                 1 13069 6628.01 3952.32 6535.00 6535.00 4843.65   1.00
## gender             2 13052    1.50    0.50    1.00    1.50    0.00   1.00
## month              3 13052    6.57    3.42    7.00    6.58    4.45   1.00
## year               4 13052    3.62    0.53    4.00    3.67    0.00   1.00
## language           5 13033    1.36    1.26    1.00    1.09    0.00   1.00
## book               6 13033    2.93    1.33    3.00    2.85    1.48   1.00
## home_computer      7 13040    1.13    0.75    1.00    1.00    0.00   1.00
## home_desk          8 13038    1.32    0.89    1.00    1.18    0.00   1.00
## home_book          9 13035    1.17    0.85    1.00    1.00    0.00   1.00
## home_room         10 13037    1.35    0.87    1.00    1.24    0.00   1.00
## home_internet     11 13036    1.25    0.97    1.00    1.08    0.00   1.00
## computer_home     12 13031    1.99    1.46    2.00    1.72    1.48   1.00
## computer_school   13 13008    2.58    1.83    2.00    2.22    1.48   1.00
## computer_some     14 13007    3.25    1.82    3.00    3.05    1.48   1.00
## parentsupport1    15 13027    1.79    1.38    1.00    1.50    0.00   1.00
## parentsupport2    16 13024    2.12    1.57    2.00    1.86    1.48   1.00
## parentsupport3    17 13022    1.65    1.55    1.00    1.27    0.00   1.00
## parentsupport4    18 13026    1.64    1.47    1.00    1.28    0.00   1.00
## school1           19 13034    2.00    1.25    2.00    1.80    1.48   1.00
## school2           20 13032    1.74    1.34    1.00    1.45    0.00   1.00
## school3           21 13027    1.85    1.45    1.00    1.56    0.00   1.00
## studentbullied1   22 13034    3.08    1.38    3.00    3.13    1.48   1.00
## studentbullied2   23 13032    3.28    1.37    4.00    3.35    0.00   1.00
## studentbullied3   24 13023    3.32    1.42    4.00    3.36    0.00   1.00
## studentbullied4   25 13027    3.47    1.40    4.00    3.52    0.00   1.00
## studentbullied5   26 13022    3.41    1.36    4.00    3.48    0.00   1.00
## studentbullied6   27 13029    3.66    1.17    4.00    3.80    0.00   1.00
## learning1         28 13028    1.72    1.28    1.00    1.45    0.00   1.00
## learning2         29 13022    3.29    1.38    4.00    3.34    0.00   1.00
## learning3         30 13017    2.99    1.53    3.00    2.93    1.48   1.00
## learning4         31 13016    3.45    1.43    4.00    3.49    0.00   1.00
## learning5         32 13019    1.61    1.46    1.00    1.25    0.00   1.00
## learning6         33 13019    1.85    1.54    1.00    1.51    0.00   1.00
## learning7         34 13022    1.47    1.31    1.00    1.15    0.00   1.00
## engagement1       35 13026    1.45    1.21    1.00    1.18    0.00   1.00
## engagement2       36 13019    3.08    1.45    3.00    3.08    1.48   1.00
## engagement3       37 13018    1.69    1.48    1.00    1.34    0.00   1.00
## engagement4       38 13014    1.72    1.46    1.00    1.40    0.00   1.00
## engagement5       39 13021    1.66    1.39    1.00    1.34    0.00   1.00
## confidence1       40 13025    1.73    1.28    1.00    1.47    0.00   1.00
## confidence2       41 13020    3.30    1.38    4.00    3.34    0.00   1.00
## confidence3       42 13015    3.49    1.45    4.00    3.50    0.00   1.00
## confidence4       43 13011    2.08    1.63    2.00    1.74    1.48   1.00
## confidence5       44 13015    2.31    1.59    2.00    2.06    1.48   1.00
## confidence6       45 13019    3.39    1.38    4.00    3.45    0.00   1.00
## score1            46 13069  542.79   78.84  547.35  545.10   77.96 241.63
## score2            47 13069  540.85   79.09  546.05  543.28   78.23 210.85
## score3            48 13069  540.86   79.51  545.86  543.39   78.17 184.31
## score4            49 13069  540.30   79.60  545.11  542.43   78.76 214.59
## score5            50 13069  542.05   78.61  546.65  544.32   78.48 228.65
##                      max    range  skew kurtosis    se
## id              15500.00 15499.00  0.19    -0.87 34.57
## gender              2.00     1.00  0.01    -2.00  0.00
## month              12.00    11.00 -0.04    -1.20  0.03
## year                5.00     4.00 -0.88    -0.07  0.00
## language            9.00     8.00  5.30    28.93  0.01
## book                9.00     8.00  1.02     3.00  0.01
## home_computer       9.00     8.00  9.36    94.44  0.01
## home_desk           9.00     8.00  6.68    54.03  0.01
## home_book           9.00     8.00  8.18    71.79  0.01
## home_room           9.00     8.00  6.48    53.74  0.01
## home_internet       9.00     8.00  6.88    51.57  0.01
## computer_home       9.00     8.00  2.79    10.19  0.01
## computer_school     9.00     8.00  2.49     6.21  0.02
## computer_some       9.00     8.00  1.63     3.54  0.02
## parentsupport1      9.00     8.00  2.90    11.09  0.01
## parentsupport2      9.00     8.00  2.36     7.28  0.01
## parentsupport3      9.00     8.00  3.23    11.47  0.01
## parentsupport4      9.00     8.00  3.22    11.97  0.01
## school1             9.00     8.00  2.60    10.98  0.01
## school2             9.00     8.00  3.38    14.57  0.01
## school3             9.00     8.00  2.85    10.47  0.01
## studentbullied1     9.00     8.00  0.67     3.52  0.01
## studentbullied2     9.00     8.00  0.84     4.47  0.01
## studentbullied3     9.00     8.00  1.02     4.70  0.01
## studentbullied4     9.00     8.00  1.22     5.54  0.01
## studentbullied5     9.00     8.00  0.97     5.20  0.01
## studentbullied6     9.00     8.00  0.87     7.81  0.01
## learning1           9.00     8.00  3.15    13.61  0.01
## learning2           9.00     8.00  1.00     4.71  0.01
## learning3           9.00     8.00  1.42     4.50  0.01
## learning4           9.00     8.00  1.28     5.24  0.01
## learning5           9.00     8.00  3.71    15.19  0.01
## learning6           9.00     8.00  2.97    10.42  0.01
## learning7           9.00     8.00  4.43    21.62  0.01
## engagement1         9.00     8.00  4.70    25.28  0.01
## engagement2         9.00     8.00  1.24     4.46  0.01
## engagement3         9.00     8.00  3.61    14.47  0.01
## engagement4         9.00     8.00  3.50    14.06  0.01
## engagement5         9.00     8.00  3.60    15.29  0.01
## confidence1         9.00     8.00  3.64    17.03  0.01
## confidence2         9.00     8.00  1.27     5.40  0.01
## confidence3         9.00     8.00  1.40     5.35  0.01
## confidence4         9.00     8.00  2.79     9.04  0.01
## confidence5         9.00     8.00  2.37     7.42  0.01
## confidence6         9.00     8.00  1.05     5.06  0.01
## score1            806.35   564.72 -0.29     0.02  0.69
## score2            780.93   570.08 -0.31     0.05  0.69
## score3            779.10   594.79 -0.31     0.03  0.70
## score4            796.09   581.50 -0.26     0.04  0.70
## score5            795.12   566.47 -0.28     0.01  0.69
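rbind() on data frames expects both data sets to have the same column names. The sketch below, on made-up data, checks this before stacking and then confirms the row count directly, which is quicker than reading it off a descriptive table.

```r
# Toy data sets with the same columns, listed in a different order
base_cases <- data.frame(id = 1:3, score1 = c(500, 520, 480))
new_cases <- data.frame(score1 = c(610, 455), id = 4:5)

# Stop early if the column names do not match
stopifnot(setequal(names(base_cases), names(new_cases)))

# rbind() matches data frame columns by name, not by position
combined <- rbind(base_cases, new_cases)

print(dim(combined))
print(nrow(combined) == nrow(base_cases) + nrow(new_cases))

# anyDuplicated() returns 0 when no id appears twice
print(anyDuplicated(combined$id))
```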

3.8.3 Merging file: Add variable

# Import the variables first
Data_add_variables <- read_sav("Data_add_variables.sav")
# We can use the cbind() function to add the variables
# cbind() pairs rows by position, so both datasets must be in the same row order
variable_added_data <- cbind(test_data, Data_add_variables)
# Check the new dataset
str(variable_added_data) # Now we have 60 variables instead of 53, right? Good job!
## 'data.frame':	12569 obs. of  60 variables:
##  $ id             : num  1 2 3 4 5 6 7 8 9 10 ...
##  $ gender         : Factor w/ 2 levels "male","female": 1 2 2 1 2 2 1 1 2 1 ...
##  $ month          : num  1 9 10 8 8 11 1 11 8 6 ...
##  $ year           : num  5 4 4 4 4 4 4 4 4 4 ...
##  $ language       : Factor w/ 3 levels "Always Speak",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ book           : num  4 3 4 3 5 3 2 3 4 2 ...
##  $ home_computer  : num  1 1 1 1 1 1 1 1 1 2 ...
##  $ home_desk      : num  2 1 1 1 1 1 1 1 1 1 ...
##  $ home_book      : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ home_room      : num  1 2 1 1 1 2 2 2 1 2 ...
##  $ home_internet  : num  1 1 1 1 1 1 1 1 1 2 ...
##  $ computer_home  : num  1 1 1 2 1 2 1 2 4 4 ...
##  $ computer_school: num  4 4 2 2 3 3 3 2 4 4 ...
##  $ computer_some  : num  4 4 3 1 4 4 1 4 4 2 ...
##  $ parentsupport1 : num  1 1 4 1 2 2 1 1 4 4 ...
##  $ parentsupport2 : num  2 1 1 1 1 2 4 1 3 4 ...
##  $ parentsupport3 : num  1 1 2 1 4 1 1 1 1 3 ...
##  $ parentsupport4 : num  1 1 1 1 2 1 2 1 1 2 ...
##  $ school1        : num  3 1 2 1 3 2 4 1 3 4 ...
##  $ school2        : num  3 4 1 1 2 1 2 2 1 3 ...
##  $ school3        : num  2 4 2 1 2 1 2 2 1 2 ...
##  $ studentbullied1: num  3 1 2 1 1 2 4 2 2 4 ...
##  $ studentbullied2: num  3 1 2 2 1 3 4 3 4 4 ...
##  $ studentbullied3: num  3 1 4 3 2 3 4 2 4 4 ...
##  $ studentbullied4: num  4 1 4 4 1 2 4 2 4 4 ...
##  $ studentbullied5: num  3 1 2 3 1 3 4 1 3 4 ...
##  $ studentbullied6: num  4 3 4 4 2 4 4 1 4 4 ...
##  $ learning1      : num  3 1 1 1 1 2 4 1 1 4 ...
##  $ learning2      : num  1 1 1 1 2 2 4 1 1 4 ...
##  $ learning3      : num  3 1 1 2 2 2 4 2 4 4 ...
##  $ learning4      : num  1 1 1 1 1 2 3 1 1 4 ...
##  $ learning5      : num  1 1 1 1 2 2 2 1 1 3 ...
##  $ learning6      : num  1 1 1 1 1 2 4 1 1 4 ...
##  $ learning7      : num  1 1 1 1 1 2 1 1 1 3 ...
##  $ engagement1    : num  1 1 1 1 2 1 1 1 1 4 ...
##  $ engagement2    : num  1 4 2 3 3 2 1 2 1 1 ...
##  $ engagement3    : num  3 2 1 1 3 2 1 2 1 4 ...
##  $ engagement4    : num  2 1 1 1 2 2 4 1 1 4 ...
##  $ engagement5    : num  1 1 1 1 2 1 4 1 1 4 ...
##  $ confidence1    : num  3 1 1 1 1 2 2 2 1 4 ...
##  $ confidence2    : num  1 4 4 4 3 3 4 4 4 1 ...
##  $ confidence3    : num  1 4 4 4 3 4 4 4 4 1 ...
##  $ confidence4    : num  4 1 1 1 3 2 2 2 1 3 ...
##  $ confidence5    : num  3 1 1 1 4 2 3 1 1 4 ...
##  $ confidence6    : num  1 4 4 4 4 4 4 4 4 1 ...
##  $ score1         : num  492 517 656 550 642 ...
##  $ score2         : num  487 576 603 567 644 ...
##  $ score3         : num  463 536 627 575 673 ...
##  $ score4         : num  455 537 574 544 645 ...
##  $ score5         : num  476 513 633 609 637 ...
##  $ ScienceTotal   : num  475 536 619 569 648 ...
##  $ ParentSupport  : num  1.25 1 2 1 2.25 1.5 2 1 2.25 3.25 ...
##  $ StudentsBullied: num  20 8 18 17 8 17 24 11 21 24 ...
##  $ id             : num  1 2 3 4 5 6 7 8 9 10 ...
##   ..- attr(*, "format.spss")= chr "F12.0"
##   ..- attr(*, "display_width")= int 12
##  $ IDCNTRY        : num  840 840 840 840 840 840 840 840 840 840 ...
##   ..- attr(*, "label")= chr "*COUNTRY ID*"
##   ..- attr(*, "format.spss")= chr "F5.0"
##  $ IDBOOK         : num  2 3 4 5 6 7 9 10 11 12 ...
##   ..- attr(*, "label")= chr "*ACHIEVEMENT TEST BOOKLET*"
##   ..- attr(*, "format.spss")= chr "F2.0"
##  $ IDSCHOOL       : num  1 1 1 1 1 1 1 1 1 1 ...
##   ..- attr(*, "label")= chr "*SCHOOL ID*"
##   ..- attr(*, "format.spss")= chr "F4.0"
##  $ IDCLASS        : num  102 102 102 102 102 102 102 102 102 102 ...
##   ..- attr(*, "label")= chr "*CLASS ID*"
##   ..- attr(*, "format.spss")= chr "F6.0"
##  $ IDSTUD         : num  10201 10202 10203 10204 10205 ...
##   ..- attr(*, "label")= chr "*STUDENT ID*"
##   ..- attr(*, "format.spss")= chr "F8.0"
##  $ IDGRADE        : dbl+lbl [1:12569] 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4...
##    ..@ label      : chr "*GRADE ID*"
##    ..@ format.spss: chr "F2.0"
##    ..@ labels     : Named num  3 4 5 6 99
##    .. ..- attr(*, "names")= chr  "GRADE 3" "GRADE 4" "GRADE 5" "GRADE 6" ...
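Note that the id column appears twice in the output above, because cbind() simply places the two data sets side by side. A possible alternative, sketched here on made-up data, is base R's merge(), which pairs rows by a key variable regardless of row order and keeps a single id column. The IDSCHOOL values below are invented for illustration.

```r
# Toy data sets whose rows are in different orders
students <- data.frame(id = c(3, 1, 2), score1 = c(656, 492, 517))
extra <- data.frame(id = c(1, 2, 3), IDSCHOOL = c(10, 10, 20))

# cbind() pairs rows by position: the rows are mismatched and id is duplicated
by_position <- cbind(students, extra)
print(by_position)

# merge() pairs rows by the key; all.x = TRUE keeps every row of `students`
by_key <- merge(students, extra, by = "id", all.x = TRUE)
print(by_key)
```

With merge(), sorting both files beforehand is not required, although sorting remains a sensible check when using cbind().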

3.9 Class Exercise

1. Add all cases from the ‘Data_add cases(exercise).sav’ file to the ‘asgusam5.sav’ file.

2. Add all variables from the ‘Data_add variables(exercise).sav’ file to the ‘asgusam5.sav’ file.

Note: Sort both datasets by the key variable (e.g. the ‘id’ variable) before merging them.