Hi Pei-Hsuan!
I’m including some notes in this document so you can keep track of what is happening in it. It might be useful to try to reproduce this in a separate file, but mostly I want you to be able to explain your analysis and answer questions about it. Look for the text beside the lines of code that explains the steps (some explanations will be in paragraph form like this).
library(tidyverse) #load the tidyverse (readr for reading data, dplyr for wrangling)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.5
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
data <- read_csv('youtube.csv') #load your data
## Rows: 149 Columns: 45
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (43): PE_1, PE_2, PE_3, PE_4, EE_1, EE_2, EE_3, EE_4, SI_1, SI_2, SI_3, ...
## lgl (2): Q13_4_TEXT, Q17_4_TEXT
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#data <- data %>% drop_na(BI_1) #we can use this line to drop the missing data cases before analysis if we need to later on, right now it's not important to do this.
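If you do end up needing drop_na(), it helps to first count how much is missing. A self-contained toy sketch (the toy data frame is hypothetical, standing in for your data):

```r
toy <- data.frame(BI_1 = c(5, NA, 4), BI_2 = c(3, 2, NA))
colSums(is.na(toy))        # missing values per column: 1 and 1
sum(!complete.cases(toy))  # rows lost to listwise deletion: 2
```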
Each line of code below creates a new data frame (spreadsheet) that includes only the items from one subscale. In the second chunk below, reliability estimates (Cronbach’s alpha) are calculated for each. Everything has acceptable reliability (roughly α ≥ .70) except TK and SI, and neither of those subscales improves if items are removed. Both approach acceptable reliability, however, so you should proceed with the analysis. Facilitation was less reliable than the others, but it may not have a place in the model given the good fit and replication you’ve achieved.
TKRel <- data.frame(data[c(grepl("TK", colnames(data)))])
TCKRel <- data.frame(data[c(grepl("TCK", colnames(data)))])
TPKRel <- data.frame(data[c(grepl("TPK", colnames(data)))])
TPCKRel <- data.frame(data[c(grepl("TPCK", colnames(data)))])
PERel <- data.frame(data[c(grepl("PE", colnames(data)))])
EERel <- data.frame(data[c(grepl("EE", colnames(data)))])
SIRel <- data.frame(data[c(grepl("SI", colnames(data)))])
HMRel <- data.frame(data[c(grepl("HM", colnames(data)))])
BIRel <- data.frame(data[c(grepl("BI", colnames(data)))])
HARel <- data.frame(data[c(grepl("Ha", colnames(data)))])
FacilitationRel <- data.frame(data[c(grepl("FC", colnames(data)))])
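A note on the grepl patterns above: unanchored matching works here because, for example, "TK" only appears in the TK_ item names, but anchoring the pattern makes the selection less fragile if column names ever change. A self-contained toy sketch (the toy data frame is hypothetical, not your survey data):

```r
toy <- data.frame(TK_1 = 1, TK_2 = 2, TCK_1 = 3, TPCK_1 = 4)
toy[grepl("^TK_", names(toy))]  # anchored: matches TK_1 and TK_2 only
```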
library(psych)
psych::alpha(TKRel) #.67
#psych::alpha(TCKRel) #TCK has only one item, so alpha cannot be computed
psych::alpha(TPKRel) #.77
psych::alpha(TPCKRel) #.79
psych::alpha(PERel) # .82
psych::alpha(EERel) # .77
psych::alpha(SIRel) # .66
psych::alpha(HMRel) # .96
psych::alpha(BIRel) # .87
psych::alpha(HARel) # .79
psych::alpha(FacilitationRel) # .62
#other outcomes are not listed as they have only one item each
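If you ever want to see what psych::alpha is estimating, here is a by-hand sketch on simulated toy data (the cronbach_alpha helper and the toy items are hypothetical, not from your survey):

```r
# Cronbach's alpha by hand: k/(k-1) * (1 - sum(item variances) / variance(total))
cronbach_alpha <- function(items) {
  items <- na.omit(items)
  k <- ncol(items)
  (k / (k - 1)) * (1 - sum(apply(items, 2, var)) / var(rowSums(items)))
}

set.seed(1)
true_score <- rnorm(100)  # shared "true" construct the items all tap
toy <- data.frame(i1 = true_score + rnorm(100),
                  i2 = true_score + rnorm(100),
                  i3 = true_score + rnorm(100))
a <- cronbach_alpha(toy)
a  # between 0 and 1; higher means more internally consistent items
```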
The code below creates a composite variable for each subscale within the three bigger constructs in the model. Summing the ordinal Likert-type items gives an approximately continuous total score, and you will use those continuous composite (total score) variables in the model.
data$TK <- data$TK_1 + data$TK_2 + data$TK_3 + data$TK_4
data$TCK <- data$TCK_1 #single item, so the composite is just that item
data$TPK <- data$TPK_1 + data$TPK_2 + data$TPK_3 + data$TPK_4
data$TPCK <- data$TPCK_1 + data$TPCK_2 + data$TPCK_3 + data$TPCK_4
data$PE <- data$PE_1 + data$PE_2 + data$PE_3 + data$PE_4
data$EE <- data$EE_1 + data$EE_2 + data$EE_3 + data$EE_4
data$SI <- data$SI_1 + data$SI_2 + data$SI_3 + data$SI_4
data$HM <- data$HM_1 + data$HM_2 + data$HM_3
data$BI <- data$BI_1 + data$BI_2 + data$BI_3
data$HA <- data$Ha_1 + data$Ha_2 + data$Ha_3
data$Facilitation <- data$FC_1 + data$FC_2 + data$FC_3 + data$FC_4
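The repeated additions above can also be written with rowSums(), which gives the same totals (and, like +, returns NA for any row with a missing item). A toy sketch with hypothetical values:

```r
toy <- data.frame(TK_1 = c(4, 5), TK_2 = c(3, 4), TK_3 = c(5, 5), TK_4 = c(2, 3))
toy$TK <- rowSums(toy[paste0("TK_", 1:4)])  # same as TK_1 + TK_2 + TK_3 + TK_4
toy$TK  # 14 17
```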
Below is the first model, which includes all hypothesized paths. The model fit is good (nonsignificant chi-square, RMSEA less than .10, and CFI greater than .90). However, in the regressions section of the summary, you can see that the path from Tech Competency to Behavior is nonsignificant (p > .05). As such, we need to “prune” this path from the model by deleting it from the model syntax, which you will see in the second chunk below.
library(lavaan)
## This is lavaan 0.6-11
## lavaan is FREE software! Please report any bugs.
library(semPlot)
model1 <- '
Beliefs =~ PE + EE + SI + HM
TechCompetency =~ TK + TCK + TPK + TPCK
Behavior =~ BI + HA
Behavior ~ Beliefs + TechCompetency
Beliefs ~ TechCompetency
'
fit <- sem(model1, missing = 'ML', data = data)
summary(fit, fit.measures = TRUE, standardized=TRUE)
## lavaan 0.6-11 ended normally after 78 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 33
##
## Used Total
## Number of observations 113 149
## Number of missing patterns 6
##
## Model Test User Model:
##
## Test statistic 31.604
## Degrees of freedom 32
## P-value (Chi-square) 0.487
##
## Model Test Baseline Model:
##
## Test statistic 414.729
## Degrees of freedom 45
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.002
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -1784.520
## Loglikelihood unrestricted model (H1) -1768.718
##
## Akaike (AIC) 3635.040
## Bayesian (BIC) 3725.044
## Sample-size adjusted Bayesian (BIC) 3620.747
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.068
## P-value RMSEA <= 0.05 0.827
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.048
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Beliefs =~
## PE 1.000 1.993 0.637
## EE 0.728 0.122 5.975 0.000 1.452 0.714
## SI 0.616 0.135 4.573 0.000 1.229 0.522
## HM 0.863 0.144 6.000 0.000 1.721 0.711
## TechCompetency =~
## TK 1.000 1.882 0.705
## TCK 0.148 0.033 4.454 0.000 0.278 0.531
## TPK 0.934 0.137 6.818 0.000 1.759 0.834
## TPCK 1.144 0.158 7.245 0.000 2.153 0.891
## Behavior =~
## BI 1.000 1.749 0.917
## HA 1.030 0.173 5.943 0.000 1.801 0.604
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Behavior ~
## Beliefs 1.028 0.335 3.065 0.002 1.171 1.171
## TechCompetency -0.211 0.306 -0.687 0.492 -0.226 -0.226
## Beliefs ~
## TechCompetency 0.911 0.174 5.224 0.000 0.860 0.860
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .PE 16.310 0.294 55.414 0.000 16.310 5.213
## .EE 17.897 0.205 87.408 0.000 17.897 8.798
## .SI 12.856 0.239 53.894 0.000 12.856 5.463
## .HM 12.880 0.244 52.689 0.000 12.880 5.318
## .TK 14.950 0.284 52.711 0.000 14.950 5.603
## .TCK 4.319 0.056 76.821 0.000 4.319 8.266
## .TPK 17.402 0.222 78.311 0.000 17.402 8.254
## .TPCK 16.947 0.253 66.954 0.000 16.947 7.011
## .BI 13.202 0.195 67.531 0.000 13.202 6.922
## .HA 9.299 0.302 30.834 0.000 9.299 3.119
## .Beliefs 0.000 0.000 0.000
## TechCompetency 0.000 0.000 0.000
## .Behavior 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .PE 5.815 0.912 6.379 0.000 5.815 0.594
## .EE 2.030 0.353 5.746 0.000 2.030 0.491
## .SI 4.027 0.608 6.626 0.000 4.027 0.727
## .HM 2.904 0.492 5.908 0.000 2.904 0.495
## .TK 3.578 0.626 5.713 0.000 3.578 0.503
## .TCK 0.196 0.032 6.129 0.000 0.196 0.718
## .TPK 1.352 0.300 4.500 0.000 1.352 0.304
## .TPCK 1.206 0.352 3.429 0.001 1.206 0.206
## .BI 0.577 0.339 1.700 0.089 0.577 0.159
## .HA 5.643 0.885 6.373 0.000 5.643 0.635
## .Beliefs 1.035 0.505 2.048 0.041 0.260 0.260
## TechCompetency 3.541 1.009 3.510 0.000 1.000 1.000
## .Behavior 0.103 0.443 0.232 0.817 0.034 0.034
Model 2 drops the path from Tech Competency to Behavior, given the nonsignificant regression I mentioned above. I was, however, concerned about one of the few assumptions of SEM, “no multicollinearity,” which essentially means that factors (like Beliefs and Behavior) should not correlate above r = .8, so that you can be sure you are actually measuring two different things (and not the same thing). To assess this, I added a covariance between the factors to the model (Beliefs ~~ Behavior). In the covariances section of the summary, this was nonsignificant, so you can move forward knowing that your model has met the assumptions, including no multicollinearity. I removed this covariance from your final model because it was nonsignificant.
model2 <- '
Beliefs =~ PE + EE + SI + HM
TechCompetency =~ TK + TCK + TPK + TPCK
Behavior =~ BI + HA
Behavior ~ Beliefs
Beliefs ~ TechCompetency
Beliefs ~~ Behavior
'
fit <- sem(model2, missing = 'ML', data = data)
summary(fit, fit.measures = TRUE, standardized=TRUE)
## lavaan 0.6-11 ended normally after 75 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 33
##
## Used Total
## Number of observations 113 149
## Number of missing patterns 6
##
## Model Test User Model:
##
## Test statistic 31.604
## Degrees of freedom 32
## P-value (Chi-square) 0.487
##
## Model Test Baseline Model:
##
## Test statistic 414.729
## Degrees of freedom 45
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.002
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -1784.520
## Loglikelihood unrestricted model (H1) -1768.718
##
## Akaike (AIC) 3635.040
## Bayesian (BIC) 3725.044
## Sample-size adjusted Bayesian (BIC) 3620.747
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.068
## P-value RMSEA <= 0.05 0.827
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.048
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Beliefs =~
## PE 1.000 1.993 0.637
## EE 0.728 0.122 5.975 0.000 1.452 0.714
## SI 0.616 0.135 4.573 0.000 1.229 0.522
## HM 0.863 0.144 6.000 0.000 1.721 0.711
## TechCompetency =~
## TK 1.000 1.882 0.705
## TCK 0.148 0.033 4.454 0.000 0.278 0.531
## TPK 0.934 0.137 6.818 0.000 1.759 0.834
## TPCK 1.144 0.158 7.245 0.000 2.153 0.891
## Behavior =~
## BI 1.000 1.749 0.917
## HA 1.030 0.173 5.943 0.000 1.801 0.604
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Behavior ~
## Beliefs 0.797 0.122 6.542 0.000 0.908 0.908
## Beliefs ~
## TechCompetency 0.911 0.174 5.224 0.000 0.860 0.860
##
## Covariances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .Beliefs ~~
## .Behavior 0.239 0.276 0.867 0.386 0.592 0.592
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .PE 16.310 0.294 55.414 0.000 16.310 5.213
## .EE 17.897 0.205 87.408 0.000 17.897 8.798
## .SI 12.856 0.239 53.894 0.000 12.856 5.463
## .HM 12.880 0.244 52.689 0.000 12.880 5.318
## .TK 14.950 0.284 52.711 0.000 14.950 5.603
## .TCK 4.319 0.056 76.821 0.000 4.319 8.266
## .TPK 17.402 0.222 78.311 0.000 17.402 8.254
## .TPCK 16.947 0.253 66.954 0.000 16.947 7.011
## .BI 13.202 0.195 67.531 0.000 13.202 6.922
## .HA 9.299 0.302 30.834 0.000 9.299 3.119
## .Beliefs 0.000 0.000 0.000
## TechCompetency 0.000 0.000 0.000
## .Behavior 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .PE 5.815 0.912 6.379 0.000 5.815 0.594
## .EE 2.030 0.353 5.746 0.000 2.030 0.491
## .SI 4.027 0.608 6.626 0.000 4.027 0.727
## .HM 2.904 0.492 5.908 0.000 2.904 0.495
## .TK 3.578 0.626 5.713 0.000 3.578 0.503
## .TCK 0.196 0.032 6.129 0.000 0.196 0.718
## .TPK 1.352 0.300 4.500 0.000 1.352 0.304
## .TPCK 1.206 0.352 3.429 0.001 1.206 0.206
## .BI 0.577 0.339 1.700 0.089 0.577 0.159
## .HA 5.643 0.885 6.373 0.000 5.643 0.635
## .Beliefs 1.035 0.505 2.048 0.041 0.260 0.260
## TechCompetency 3.541 1.009 3.510 0.000 1.000 1.000
## .Behavior 0.158 0.374 0.423 0.672 0.052 0.052
Your final model is in the next chunk, and I’ve listed some of my interpretations below it.
model3 <- '
Beliefs =~ PE + EE + SI + HM
TechCompetency =~ TK + TCK + TPK + TPCK
Behavior =~ BI + HA
Behavior ~ Beliefs
Beliefs ~ TechCompetency
'
fit <- sem(model3, missing = 'ML', data = data)
summary(fit, fit.measures = TRUE, standardized=TRUE)
## lavaan 0.6-11 ended normally after 74 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 32
##
## Used Total
## Number of observations 113 149
## Number of missing patterns 6
##
## Model Test User Model:
##
## Test statistic 32.255
## Degrees of freedom 33
## P-value (Chi-square) 0.504
##
## Model Test Baseline Model:
##
## Test statistic 414.729
## Degrees of freedom 45
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 1.000
## Tucker-Lewis Index (TLI) 1.003
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -1784.846
## Loglikelihood unrestricted model (H1) -1768.718
##
## Akaike (AIC) 3633.691
## Bayesian (BIC) 3720.968
## Sample-size adjusted Bayesian (BIC) 3619.831
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.000
## 90 Percent confidence interval - lower 0.000
## 90 Percent confidence interval - upper 0.067
## P-value RMSEA <= 0.05 0.841
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.048
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Observed
## Observed information based on Hessian
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Beliefs =~
## PE 1.000 2.029 0.648
## EE 0.724 0.120 6.030 0.000 1.469 0.722
## SI 0.608 0.133 4.573 0.000 1.234 0.525
## HM 0.859 0.142 6.052 0.000 1.742 0.719
## TechCompetency =~
## TK 1.000 1.886 0.707
## TCK 0.147 0.033 4.452 0.000 0.277 0.530
## TPK 0.930 0.136 6.837 0.000 1.755 0.832
## TPCK 1.143 0.157 7.266 0.000 2.155 0.892
## Behavior =~
## BI 1.000 1.757 0.921
## HA 1.021 0.173 5.905 0.000 1.793 0.602
##
## Regressions:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## Behavior ~
## Beliefs 0.824 0.120 6.855 0.000 0.952 0.952
## Beliefs ~
## TechCompetency 0.895 0.172 5.203 0.000 0.833 0.833
##
## Intercepts:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .PE 16.310 0.294 55.414 0.000 16.310 5.213
## .EE 17.895 0.205 87.486 0.000 17.895 8.797
## .SI 12.855 0.238 53.908 0.000 12.855 5.463
## .HM 12.878 0.244 52.722 0.000 12.878 5.316
## .TK 14.949 0.284 52.666 0.000 14.949 5.603
## .TCK 4.319 0.056 76.778 0.000 4.319 8.267
## .TPK 17.402 0.223 78.190 0.000 17.402 8.256
## .TPCK 16.947 0.253 66.856 0.000 16.947 7.012
## .BI 13.202 0.196 67.482 0.000 13.202 6.924
## .HA 9.300 0.302 30.830 0.000 9.300 3.119
## .Beliefs 0.000 0.000 0.000
## TechCompetency 0.000 0.000 0.000
## .Behavior 0.000 0.000 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|) Std.lv Std.all
## .PE 5.673 0.890 6.376 0.000 5.673 0.580
## .EE 1.981 0.349 5.675 0.000 1.981 0.479
## .SI 4.014 0.612 6.563 0.000 4.014 0.725
## .HM 2.833 0.486 5.832 0.000 2.833 0.483
## .TK 3.560 0.624 5.708 0.000 3.560 0.500
## .TCK 0.196 0.032 6.131 0.000 0.196 0.719
## .TPK 1.365 0.301 4.536 0.000 1.365 0.307
## .TPCK 1.196 0.351 3.410 0.001 1.196 0.205
## .BI 0.549 0.342 1.607 0.108 0.549 0.151
## .HA 5.672 0.889 6.377 0.000 5.672 0.638
## .Beliefs 1.263 0.473 2.671 0.008 0.307 0.307
## TechCompetency 3.558 1.011 3.520 0.000 1.000 1.000
## .Behavior 0.290 0.357 0.812 0.417 0.094 0.094
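Because model 3 is model 1 with one path removed, you could also report a chi-square difference test using the statistics printed above. The nonsignificant result confirms that pruning the path did not worsen fit:

```r
delta_chisq <- 32.255 - 31.604  # model 3 minus model 1 test statistic
delta_df    <- 33 - 32          # one path removed = one extra df
pchisq(delta_chisq, delta_df, lower.tail = FALSE)  # ~ .42, nonsignificant
```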
You can report your fit statistics as χ2(33) = 32.26, p = .504, CFI = 1.00, RMSEA = 0. This indicates that the theoretical model (the combination of arrows) fits the data you collected extremely well, which allows you to interpret the effects in the model.
When looking at effects, use the Std.all column. The latent variables section of the summary represents the confirmatory factor analysis (e.g., PE, EE, SI, and HM all load onto Beliefs at > .40, so you accept those results and include the Std.all loadings on that part of the model you draw/illustrate).
The regressions section of the summary represents the paths between the factors that remain in the model (e.g., Tech Competency has a large effect on Beliefs (.83), and Beliefs has a large effect on Behavior (.95)). These standardized coefficients are interpreted like r values in correlation, so squaring them gives r-squared values: you explained about 90% of the variance in Behavior with Beliefs (.95 × .95 ≈ .90) and about 69% of the variance in Beliefs with Tech Competency (.83 × .83 ≈ .69).
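That arithmetic is quick to check in R, and lavaan will also report R-squared for every endogenous variable directly (the commented lavInspect call assumes the fit object from the final model above):

```r
r2_behavior <- 0.95^2  # variance in Behavior explained by Beliefs
r2_beliefs  <- 0.83^2  # variance in Beliefs explained by Tech Competency
round(c(behavior = r2_behavior, beliefs = r2_beliefs), 2)  # 0.90 0.69
# lavInspect(fit, "rsquare")  # same quantities from the fitted model
```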
I’ve included the best visualization I can produce of this model below, but it does not show that you tested a relationship between Tech Competency and Behavior (which is important given your second research question). When drawing the model, you can use a dashed line to indicate the nonsignificant path between Tech Competency and Behavior that was pruned, as in Zhang et al. (2021).
#install.packages('lavaanPlot')
library(lavaanPlot)
lavaanPlot(model = fit, node_options = list(shape = "box", fontname = "Helvetica"), edge_options = list(color = "grey"), coefs = TRUE, stand = TRUE)