5.7 Variable Selection and Cleaning
Now we extract the following variables from the data. You can verify the code assigned to each variable in the technical report described in the previous section.
- ICT Interest (INTICT) was measured by 6 items: IC013Q01NA, IC013Q04NA, IC013Q05NA, IC013Q11NA, IC013Q12NA, IC013Q13NA
- Perceived ICT Competence (COMPICT) was measured by 5 items: IC014Q03NA, IC014Q04NA, IC014Q06NA, IC014Q08NA, IC014Q09NA
- Perceived Autonomy related to ICT Use (AUTICT; abbreviated AUICT in the code below) was measured by 5 items: IC015Q02NA, IC015Q03NA, IC015Q05NA, IC015Q07NA, IC015Q09NA
- ICT as a topic in Social Interaction (SOIAICT) was measured by 5 items: IC016Q01NA, IC016Q02NA, IC016Q04NA, IC016Q05NA, IC016Q07NA
Note: The items for all four latent variables were measured on a four-point Likert scale.
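The item lists above can also be kept as a named list in R, so the construct-to-item mapping lives in one place and the codes can be checked against the data before selecting them. This is a sketch, not part of the original workflow: the object names `ict_items`, `item_codes`, and `available` are our own, and `available` is a hypothetical stand-in for `names(data)`.

```r
# Construct-to-item mapping, copied from the item lists above
ict_items <- list(
  INTICT  = c("IC013Q01NA", "IC013Q04NA", "IC013Q05NA",
              "IC013Q11NA", "IC013Q12NA", "IC013Q13NA"),
  COMPICT = c("IC014Q03NA", "IC014Q04NA", "IC014Q06NA",
              "IC014Q08NA", "IC014Q09NA"),
  AUTICT  = c("IC015Q02NA", "IC015Q03NA", "IC015Q05NA",
              "IC015Q07NA", "IC015Q09NA"),
  SOIAICT = c("IC016Q01NA", "IC016Q02NA", "IC016Q04NA",
              "IC016Q05NA", "IC016Q07NA")
)

lengths(ict_items)                                  # items per construct: 6 5 5 5
item_codes <- unlist(ict_items, use.names = FALSE)  # flat vector of 21 codes

# Hypothetical stand-in for names(data); with the real data use
# setdiff(item_codes, names(data)) instead
available <- c("CNTRYID", item_codes)

setdiff(item_codes, available)                  # character(0): every code found
setdiff(c(item_codes, "C015Q07NA"), available)  # "C015Q07NA": a typo is caught
```

An empty result from `setdiff()` means every code exists as a column; any code it returns would otherwise make the column selection fail with an "undefined columns selected" error.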
In the code below, we keep only these variables for students in the Chile sample.
# Create a vector with the code of each variable as shown in the data set
var <- c("IC013Q01NA", "IC013Q04NA", "IC013Q05NA", "IC013Q11NA", "IC013Q12NA", "IC013Q13NA", # INTICT
"IC014Q03NA", "IC014Q04NA", "IC014Q06NA", "IC014Q08NA", "IC014Q09NA", # COMPICT
"IC015Q02NA","IC015Q03NA","IC015Q05NA" ,"IC015Q07NA" ,"IC015Q09NA", # AUICT
"IC016Q01NA","IC016Q02NA","IC016Q04NA","IC016Q05NA","IC016Q07NA" # SOIAICT
)
# Select the Chile sample (CNTRYID 152) and keep only the variables in `var`
data.us.only <- data[data$CNTRYID == 152, var]
# Preview the first six rows
head(data.us.only)
## IC013Q01NA IC013Q04NA IC013Q05NA IC013Q11NA IC013Q12NA IC013Q13NA IC014Q03NA
## 91050 NA NA NA NA NA NA NA
## 91051 3 3 4 4 3 3 NA
## 91052 3 4 4 4 4 4 3
## 91053 2 3 3 3 2 3 3
## 91054 4 4 4 4 2 4 4
## 91055 3 2 3 3 4 4 4
## IC014Q04NA IC014Q06NA IC014Q08NA IC014Q09NA IC015Q02NA IC015Q03NA IC015Q05NA
## 91050 NA NA NA NA NA NA NA
## 91051 NA NA NA NA 4 3 3
## 91052 3 3 3 3 2 2 3
## 91053 3 3 3 NA 3 3 3
## 91054 4 4 4 4 4 4 4
## 91055 3 4 3 3 2 2 3
## IC015Q07NA IC015Q09NA IC016Q01NA IC016Q02NA IC016Q04NA IC016Q05NA IC016Q07NA
## 91050 NA NA NA NA NA NA NA
## 91051 3 4 3 2 3 4 3
## 91052 2 2 3 3 2 2 2
## 91053 3 3 3 3 3 3 3
## 91054 4 4 4 4 4 4 4
## 91055 3 3 3 2 2 3 3
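The selection line `data[data$CNTRYID == 152, var]` relies on logical row indexing. One R behavior worth knowing: if the filtering variable ever contains `NA`, bracket indexing returns an extra all-NA row (with a row name such as `NA`), whereas `which()` and `subset()` drop it. The sketch below uses a hypothetical toy data frame and does not claim that `CNTRYID` itself has missing values. The all-NA row 91050 in the output above has a regular row name, so it is a student with no responses on these items; it is handled by the listwise deletion later in this section.

```r
# Hypothetical toy data: three students, one with a missing country ID
toy <- data.frame(
  CNTRYID    = c(152, 840, NA),
  IC013Q01NA = c(3, 2, 4),
  IC013Q04NA = c(4, NA, 1)
)
items <- c("IC013Q01NA", "IC013Q04NA")

# toy$CNTRYID == 152 evaluates to c(TRUE, FALSE, NA)
# Bracket indexing turns the NA into an extra all-NA row named "NA"
by_bracket <- toy[toy$CNTRYID == 152, items]

# which() keeps only the positions that are TRUE
by_which <- toy[which(toy$CNTRYID == 152), items, drop = FALSE]

# subset() treats an NA condition as FALSE
by_subset <- subset(toy, CNTRYID == 152, select = items)

nrow(by_bracket)  # 2
nrow(by_which)    # 1
nrow(by_subset)   # 1
```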
It is always a good idea to rename the columns so they are easy to identify. The code below does this.
# Create a vector with a new name for each column. The names must be in the same order as the codes in `var` defined above
cols <- c("INTICT1", "INTICT2", "INTICT3", "INTICT4", "INTICT5", "INTICT6", # INTICT
"COMPICT1", "COMPICT2", "COMPICT3", "COMPICT4", "COMPICT5", # COMPICT
"AUICT1","AUICT2","AUICT3" ,"AUICT4" ,"AUICT5", # AUICT
"SOIAICT1","SOIAICT2","SOIAICT3","SOIAICT4","SOIAICT5" # SOIAICT
)
# Rename the columns
colnames(data.us.only) <- cols
# Preview Data
head(data.us.only)
## INTICT1 INTICT2 INTICT3 INTICT4 INTICT5 INTICT6 COMPICT1 COMPICT2 COMPICT3
## 91050 NA NA NA NA NA NA NA NA NA
## 91051 3 3 4 4 3 3 NA NA NA
## 91052 3 4 4 4 4 4 3 3 3
## 91053 2 3 3 3 2 3 3 3 3
## 91054 4 4 4 4 2 4 4 4 4
## 91055 3 2 3 3 4 4 4 3 4
## COMPICT4 COMPICT5 AUICT1 AUICT2 AUICT3 AUICT4 AUICT5 SOIAICT1 SOIAICT2 SOIAICT3
## 91050 NA NA NA NA NA NA NA NA NA NA
## 91051 NA NA 4 3 3 3 4 3 2 3
## 91052 3 3 2 2 3 2 2 3 3 2
## 91053 3 NA 3 3 3 3 3 3 3 3
## 91054 4 4 4 4 4 4 4 4 4 4
## 91055 3 3 2 2 3 3 3 3 2 2
## SOIAICT4 SOIAICT5
## 91050 NA NA
## 91051 4 3
## 91052 2 2
## 91053 3 3
## 91054 4 4
## 91055 3 3
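Because the new names must line up with `var`, it can help to generate them from the number of items per construct and to check the count before renaming. A sketch under that assumption; the names `n_items`, `cols_auto`, and `rename_checked` are hypothetical, not from the original code:

```r
# Number of items per construct, in the same order as `var`
n_items <- c(INTICT = 6, COMPICT = 5, AUICT = 5, SOIAICT = 5)

# Build "INTICT1", ..., "SOIAICT5" from the counts instead of typing each name
cols_auto <- paste0(rep(names(n_items), times = n_items), sequence(n_items))

# Hypothetical helper: refuse to rename if the counts do not line up
# or if two columns would end up with the same name
rename_checked <- function(df, new_names) {
  stopifnot(length(new_names) == ncol(df), !anyDuplicated(new_names))
  colnames(df) <- new_names
  df
}

# Usage on a toy 2 x 21 data frame (hypothetical values)
toy <- as.data.frame(matrix(1:42, nrow = 2))
renamed <- rename_checked(toy, cols_auto)
names(renamed)[c(1, 7, 21)]  # "INTICT1" "COMPICT1" "SOIAICT5"
```

With the real data, `rename_checked(data.us.only, cols)` would stop with an error instead of silently producing mismatched names if `cols` had one entry too few or too many.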
# Summary statistics for each item
summary(data.us.only)
## INTICT1 INTICT2 INTICT3 INTICT4 INTICT5
## Min. :1.000 Min. :1.00 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:2.000 1st Qu.:3.00 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:2.000
## Median :3.000 Median :3.00 Median :3.000 Median :3.000 Median :2.000
## Mean :2.753 Mean :3.28 Mean :3.235 Mean :2.854 Mean :2.422
## 3rd Qu.:3.000 3rd Qu.:4.00 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :4.000 Max. :4.00 Max. :4.000 Max. :4.000 Max. :4.000
## NA's :743 NA's :783 NA's :807 NA's :820 NA's :775
## INTICT6 COMPICT1 COMPICT2 COMPICT3 COMPICT4
## Min. :1.00 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:3.00 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:3.000 1st Qu.:3.000
## Median :3.00 Median :3.000 Median :3.000 Median :3.000 Median :3.000
## Mean :3.25 Mean :2.837 Mean :2.906 Mean :3.288 Mean :2.975
## 3rd Qu.:4.00 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:3.000
## Max. :4.00 Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
## NA's :799 NA's :834 NA's :853 NA's :880 NA's :869
## COMPICT5 AUICT1 AUICT2 AUICT3 AUICT4
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.0 Min. :1.000
## 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:3.0 1st Qu.:2.000
## Median :3.000 Median :2.000 Median :3.000 Median :3.0 Median :3.000
## Mean :2.966 Mean :2.471 Mean :2.501 Mean :3.1 Mean :2.855
## 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:4.0 3rd Qu.:3.000
## Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.0 Max. :4.000
## NA's :876 NA's :853 NA's :880 NA's :890 NA's :907
## AUICT5 SOIAICT1 SOIAICT2 SOIAICT3 SOIAICT4
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000
## Median :3.000 Median :3.000 Median :3.000 Median :3.000 Median :3.000
## Mean :3.143 Mean :2.772 Mean :2.506 Mean :2.554 Mean :2.529
## 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
## NA's :915 NA's :930 NA's :967 NA's :963 NA's :977
## SOIAICT5
## Min. :1.000
## 1st Qu.:2.000
## Median :3.000
## Mean :2.612
## 3rd Qu.:3.000
## Max. :4.000
## NA's :986
The summary statistics show that the data contain many missing values, denoted `NA`. For example, INTICT1 and INTICT2 have 743 and 783 missing responses, respectively. Missing data is a complex subject that we won't go into here. Instead, we remove every participant with at least one missing response (listwise deletion).
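Before deleting, it helps to know how many participants the deletion will cost. A sketch on a hypothetical toy data frame: `colSums(is.na(x))` reproduces the per-item `NA's` counts that `summary()` reports, and `complete.cases()`, the function used in the removal code below, flags participants with at least one missing item. Note how a few missing values per item can still remove most rows.

```r
# Hypothetical responses: 4 students x 3 items (NA = missing)
toy <- data.frame(
  INTICT1 = c(3, NA, 2, 4),
  INTICT2 = c(4, NA, NA, 3),
  INTICT3 = c(2, 1, 3, NA)
)

# Missing responses per item (the "NA's" row of summary())
na_per_item <- colSums(is.na(toy))
na_per_item                     # INTICT1 1, INTICT2 2, INTICT3 1

# Share of missing responses per item
round(colMeans(is.na(toy)), 2)  # 0.25 0.50 0.25

# Participants with at least one missing item would be removed
n_incomplete <- sum(!complete.cases(toy))
n_incomplete                    # 3 of 4
```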
# Remove participants with NAs
data.us.only <- subset(data.us.only, complete.cases(data.us.only))
summary(data.us.only)
## INTICT1 INTICT2 INTICT3 INTICT4 INTICT5
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:2.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:2.000
## Median :3.000 Median :3.000 Median :3.000 Median :3.000 Median :2.000
## Mean :2.768 Mean :3.304 Mean :3.247 Mean :2.872 Mean :2.422
## 3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
## INTICT6 COMPICT1 COMPICT2 COMPICT3 COMPICT4
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:3.000
## Median :3.000 Median :3.000 Median :3.000 Median :3.000 Median :3.000
## Mean :3.266 Mean :2.848 Mean :2.922 Mean :3.309 Mean :2.987
## 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:3.000
## Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
## COMPICT5 AUICT1 AUICT2 AUICT3 AUICT4
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:3.000 1st Qu.:2.000
## Median :3.000 Median :3.000 Median :3.000 Median :3.000 Median :3.000
## Mean :2.975 Mean :2.491 Mean :2.516 Mean :3.121 Mean :2.871
## 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:4.000 3rd Qu.:3.000
## Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
## AUICT5 SOIAICT1 SOIAICT2 SOIAICT3 SOIAICT4
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:3.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:2.000
## Median :3.000 Median :3.000 Median :3.000 Median :3.000 Median :3.000
## Mean :3.161 Mean :2.786 Mean :2.507 Mean :2.555 Mean :2.527
## 3rd Qu.:4.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000 Max. :4.000
## SOIAICT5
## Min. :1.000
## 1st Qu.:2.000
## Median :3.000
## Mean :2.615
## 3rd Qu.:3.000
## Max. :4.000
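The summary no longer reports any `NA's`, but it does not say how many participants were removed. A sketch for documenting that, on hypothetical toy data: compare row counts before and after, and note that `na.omit()` (from base R's stats package) keeps the same rows as `subset(x, complete.cases(x))` while also recording the dropped rows in its `na.action` attribute.

```r
# Hypothetical toy data: 3 participants x 3 items
toy <- data.frame(INTICT1 = c(1, NA, 3),
                  INTICT2 = c(4, 5, NA),
                  INTICT3 = c(1, 2, 3))

n_before <- nrow(toy)
complete <- subset(toy, complete.cases(toy))  # same idiom as in the text
n_after <- nrow(complete)

# Report how many participants were kept
cat(sprintf("Kept %d of %d participants (%.0f%%)\n",
            n_after, n_before, 100 * n_after / n_before))
# Kept 1 of 3 participants (33%)

# na.omit() keeps the same rows and records which ones it dropped
omitted <- na.omit(toy)
as.integer(attr(omitted, "na.action"))  # 2 3
```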
Because the original data set is large and slow to load, we can export the Chile sample to a .csv file for further analysis.
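A minimal sketch of the export step with base R's `write.csv()` and `read.csv()`. The object `cleaned` is a hypothetical stand-in for `data.us.only`, and the file goes to a temporary path; replace `out_file` with a file name of your choice.

```r
# Hypothetical stand-in for the cleaned data
cleaned <- data.frame(INTICT1 = c(3, 2), INTICT2 = c(4, 3))

# Temporary path so the example does not write into your project folder
out_file <- tempfile(fileext = ".csv")

# row.names = FALSE avoids writing the row names (e.g. 91051) as an extra column
write.csv(cleaned, out_file, row.names = FALSE)

# Later sessions can load this small file instead of the full data set
reloaded <- read.csv(out_file)
str(reloaded)
```

If you want to keep the original row identifiers, store them in a regular column before exporting, since `row.names = FALSE` discards them.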