Lu Analysis

Hi Pei-Hsuan!

I’m including some notes in this document so you can keep track of what is happening in it. It might be useful to try to reproduce this in a separate file, but mostly I want you to be able to explain your analysis and answer questions about it. Look for the text beside the lines of code that explains the steps (some explanations will be in paragraph form like this).

library(tidyverse) # load the tidyverse (readr, dplyr, etc.) for importing and wrangling the data
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.5 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.1      ✔ stringr 1.4.0 
## ✔ readr   2.1.2      ✔ forcats 0.5.1 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
data <- read_csv('youtube.csv') #load your data
## Rows: 149 Columns: 45
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (43): PE_1, PE_2, PE_3, PE_4, EE_1, EE_2, EE_3, EE_4, SI_1, SI_2, SI_3, ...
## lgl  (2): Q13_4_TEXT, Q17_4_TEXT
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#data <- data %>% drop_na(BI_1) #we can use this line to drop the missing data cases before analysis if we need to later on, right now it's not important to do this. 

Reliability

Each line of code below creates a new data frame (spreadsheet) that includes only the items from one particular subscale. In the second chunk below, reliability estimates (Cronbach’s alpha) are calculated. Everything has acceptable reliability except TK and SI, and neither of those subscales improves if items are removed. However, both are approaching acceptable reliability, so you can proceed with the analysis. Facilitation was less reliable than the others, but it may not have a place in the model given the good fit and replication you’ve achieved.

# Subset the items for each subscale by matching the column-name prefix
TKRel <- data.frame(data[c(grepl("TK", colnames(data)))])
TCKRel <- data.frame(data[c(grepl("TCK", colnames(data)))])
TPKRel <- data.frame(data[c(grepl("TPK", colnames(data)))])
TPCKRel <- data.frame(data[c(grepl("TPCK", colnames(data)))])
PERel <- data.frame(data[c(grepl("PE", colnames(data)))])
EERel <- data.frame(data[c(grepl("EE", colnames(data)))])
SIRel <- data.frame(data[c(grepl("SI", colnames(data)))])
HMRel <- data.frame(data[c(grepl("HM", colnames(data)))])
BIRel <- data.frame(data[c(grepl("BI", colnames(data)))])
HARel <- data.frame(data[c(grepl("Ha", colnames(data)))])
FacilitationRel <- data.frame(data[c(grepl("FC", colnames(data)))])
library(psych) # for Cronbach's alpha (psych::alpha)

psych::alpha(TKRel) #.67 
#psych::alpha(TCKRel) # single item, so reliability cannot be computed
psych::alpha(TPKRel) #.77 
psych::alpha(TPCKRel) #.79
psych::alpha(PERel) # .82
psych::alpha(EERel) # .77
psych::alpha(SIRel) # .66
psych::alpha(HMRel) # .96
psych::alpha(BIRel) # .87
psych::alpha(HARel) # .79
psych::alpha(FacilitationRel) # .62

#other outcomes are not listed as they have only one item each
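If you ever need to re-run these checks, here is a more compact sketch (assuming the *Rel data frames created above are still in the workspace) that collects all of the raw alphas into one named vector:

```r
# Convenience sketch: compute every subscale's raw alpha in one pass.
# Assumes the *Rel data frames from the chunk above already exist.
subscales <- list(TK = TKRel, TPK = TPKRel, TPCK = TPCKRel, PE = PERel,
                  EE = EERel, SI = SIRel, HM = HMRel, BI = BIRel,
                  HA = HARel, FC = FacilitationRel)
round(sapply(subscales, function(d) psych::alpha(d)$total$raw_alpha), 2)
```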

The code below creates composite variables for each subscale within the three larger constructs in the model. Summing the ordinal Likert-type items yields an approximately continuous total score, and you will use those continuous composite (total score) variables in the model.

data$TK <- data$TK_1 + data$TK_2 + data$TK_3 + data$TK_4
data$TCK <- data$TCK_1
data$TPK <- data$TPK_1 + data$TPK_2 + data$TPK_3 + data$TPK_4
data$TPCK <- data$TPCK_1 + data$TPCK_2 + data$TPCK_3 + data$TPCK_4
data$PE <- data$PE_1 + data$PE_2 + data$PE_3 + data$PE_4
data$EE <- data$EE_1 + data$EE_2 + data$EE_3 + data$EE_4
data$SI <- data$SI_1 + data$SI_2 + data$SI_3 + data$SI_4
data$HM <- data$HM_1 + data$HM_2 + data$HM_3
data$BI <- data$BI_1 + data$BI_2 + data$BI_3
data$HA <- data$Ha_1 + data$Ha_2 + data$Ha_3
data$Facilitation <- data$FC_1 + data$FC_2 + data$FC_3 + data$FC_4
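As a side note, if the item counts ever change, the same composites can be built more compactly with dplyr; a sketch using the column-name prefixes in your data:

```r
# Equivalent sketch using rowSums() + across(); the starts_with() patterns
# are assumed to match only that subscale's items (e.g., "TK_" does not
# match the TPK_ or TPCK_ columns)
data <- data %>%
  mutate(TK = rowSums(across(starts_with("TK_"))),
         PE = rowSums(across(starts_with("PE_"))),
         EE = rowSums(across(starts_with("EE_"))))
```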

SEM Construction

Below is the first model, which includes all hypothesized paths. The model fit is good (nonsignificant chi-square, RMSEA less than .10, and CFI greater than .90). However, in the regression section of the summary, you can see that Tech Competency is a nonsignificant predictor of Behavior (p > .05). As such, we need to “prune” this path from the model by deleting it from the model syntax, which you will see in the second chunk below.

library(lavaan)
## This is lavaan 0.6-11
## lavaan is FREE software! Please report any bugs.
library(semPlot)

model1 <- '

    Beliefs =~ PE + EE + SI + HM
    TechCompetency =~ TK + TCK + TPK + TPCK
    Behavior =~ BI + HA
    
    Behavior ~ Beliefs + TechCompetency
    Beliefs ~ TechCompetency
'

fit <- sem(model1, missing = 'ML', data = data)
summary(fit, fit.measures = TRUE, standardized=TRUE)
## lavaan 0.6-11 ended normally after 78 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        33
##                                                       
##                                                   Used       Total
##   Number of observations                           113         149
##   Number of missing patterns                         6            
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                                31.604
##   Degrees of freedom                                32
##   P-value (Chi-square)                           0.487
## 
## Model Test Baseline Model:
## 
##   Test statistic                               414.729
##   Degrees of freedom                                45
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.002
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1784.520
##   Loglikelihood unrestricted model (H1)      -1768.718
##                                                       
##   Akaike (AIC)                                3635.040
##   Bayesian (BIC)                              3725.044
##   Sample-size adjusted Bayesian (BIC)         3620.747
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.068
##   P-value RMSEA <= 0.05                          0.827
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.048
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                     Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Beliefs =~                                                             
##     PE                 1.000                               1.993    0.637
##     EE                 0.728    0.122    5.975    0.000    1.452    0.714
##     SI                 0.616    0.135    4.573    0.000    1.229    0.522
##     HM                 0.863    0.144    6.000    0.000    1.721    0.711
##   TechCompetency =~                                                      
##     TK                 1.000                               1.882    0.705
##     TCK                0.148    0.033    4.454    0.000    0.278    0.531
##     TPK                0.934    0.137    6.818    0.000    1.759    0.834
##     TPCK               1.144    0.158    7.245    0.000    2.153    0.891
##   Behavior =~                                                            
##     BI                 1.000                               1.749    0.917
##     HA                 1.030    0.173    5.943    0.000    1.801    0.604
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Behavior ~                                                            
##     Beliefs           1.028    0.335    3.065    0.002    1.171    1.171
##     TechCompetency   -0.211    0.306   -0.687    0.492   -0.226   -0.226
##   Beliefs ~                                                             
##     TechCompetency    0.911    0.174    5.224    0.000    0.860    0.860
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .PE               16.310    0.294   55.414    0.000   16.310    5.213
##    .EE               17.897    0.205   87.408    0.000   17.897    8.798
##    .SI               12.856    0.239   53.894    0.000   12.856    5.463
##    .HM               12.880    0.244   52.689    0.000   12.880    5.318
##    .TK               14.950    0.284   52.711    0.000   14.950    5.603
##    .TCK               4.319    0.056   76.821    0.000    4.319    8.266
##    .TPK              17.402    0.222   78.311    0.000   17.402    8.254
##    .TPCK             16.947    0.253   66.954    0.000   16.947    7.011
##    .BI               13.202    0.195   67.531    0.000   13.202    6.922
##    .HA                9.299    0.302   30.834    0.000    9.299    3.119
##    .Beliefs           0.000                               0.000    0.000
##     TechCompetency    0.000                               0.000    0.000
##    .Behavior          0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .PE                5.815    0.912    6.379    0.000    5.815    0.594
##    .EE                2.030    0.353    5.746    0.000    2.030    0.491
##    .SI                4.027    0.608    6.626    0.000    4.027    0.727
##    .HM                2.904    0.492    5.908    0.000    2.904    0.495
##    .TK                3.578    0.626    5.713    0.000    3.578    0.503
##    .TCK               0.196    0.032    6.129    0.000    0.196    0.718
##    .TPK               1.352    0.300    4.500    0.000    1.352    0.304
##    .TPCK              1.206    0.352    3.429    0.001    1.206    0.206
##    .BI                0.577    0.339    1.700    0.089    0.577    0.159
##    .HA                5.643    0.885    6.373    0.000    5.643    0.635
##    .Beliefs           1.035    0.505    2.048    0.041    0.260    0.260
##     TechCompetency    3.541    1.009    3.510    0.000    1.000    1.000
##    .Behavior          0.103    0.443    0.232    0.817    0.034    0.034

Model 2 does not have tech competency predicting behavior, given the nonsignificant regression path I mentioned above. I was, however, concerned with one of the few assumptions of SEM, “no multicollinearity,” which essentially means that factors (like beliefs and behavior) should not be correlated above r = .8, so that you can be sure you are actually measuring two different things (and not the same thing twice). To assess this, I added a covariance between the factors to the model (Beliefs ~~ Behavior). In the covariance section of the summary, this was nonsignificant, so you can move forward knowing that your model has met all assumptions, including no multicollinearity. I removed this covariance from your final model because it was nonsignificant.
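If you want to check the latent correlations directly rather than through an added covariance, lavaan can report the model-implied correlations among the factors; a sketch, run after fitting any of the models:

```r
# Model-implied correlations among the latent variables; values at or
# above .8 would raise the multicollinearity concern discussed above
lavInspect(fit, what = "cor.lv")
```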

model2 <- '

    Beliefs =~ PE + EE + SI + HM
    TechCompetency =~ TK + TCK + TPK + TPCK
    Behavior =~ BI + HA
    
    Behavior ~ Beliefs
    Beliefs ~ TechCompetency
    
    Beliefs ~~ Behavior
'

fit <- sem(model2, missing = 'ML', data = data)
summary(fit, fit.measures = TRUE, standardized=TRUE)
## lavaan 0.6-11 ended normally after 75 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        33
##                                                       
##                                                   Used       Total
##   Number of observations                           113         149
##   Number of missing patterns                         6            
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                                31.604
##   Degrees of freedom                                32
##   P-value (Chi-square)                           0.487
## 
## Model Test Baseline Model:
## 
##   Test statistic                               414.729
##   Degrees of freedom                                45
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.002
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1784.520
##   Loglikelihood unrestricted model (H1)      -1768.718
##                                                       
##   Akaike (AIC)                                3635.040
##   Bayesian (BIC)                              3725.044
##   Sample-size adjusted Bayesian (BIC)         3620.747
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.068
##   P-value RMSEA <= 0.05                          0.827
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.048
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                     Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Beliefs =~                                                             
##     PE                 1.000                               1.993    0.637
##     EE                 0.728    0.122    5.975    0.000    1.452    0.714
##     SI                 0.616    0.135    4.573    0.000    1.229    0.522
##     HM                 0.863    0.144    6.000    0.000    1.721    0.711
##   TechCompetency =~                                                      
##     TK                 1.000                               1.882    0.705
##     TCK                0.148    0.033    4.454    0.000    0.278    0.531
##     TPK                0.934    0.137    6.818    0.000    1.759    0.834
##     TPCK               1.144    0.158    7.245    0.000    2.153    0.891
##   Behavior =~                                                            
##     BI                 1.000                               1.749    0.917
##     HA                 1.030    0.173    5.943    0.000    1.801    0.604
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Behavior ~                                                            
##     Beliefs           0.797    0.122    6.542    0.000    0.908    0.908
##   Beliefs ~                                                             
##     TechCompetency    0.911    0.174    5.224    0.000    0.860    0.860
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .Beliefs ~~                                                            
##    .Behavior          0.239    0.276    0.867    0.386    0.592    0.592
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .PE               16.310    0.294   55.414    0.000   16.310    5.213
##    .EE               17.897    0.205   87.408    0.000   17.897    8.798
##    .SI               12.856    0.239   53.894    0.000   12.856    5.463
##    .HM               12.880    0.244   52.689    0.000   12.880    5.318
##    .TK               14.950    0.284   52.711    0.000   14.950    5.603
##    .TCK               4.319    0.056   76.821    0.000    4.319    8.266
##    .TPK              17.402    0.222   78.311    0.000   17.402    8.254
##    .TPCK             16.947    0.253   66.954    0.000   16.947    7.011
##    .BI               13.202    0.195   67.531    0.000   13.202    6.922
##    .HA                9.299    0.302   30.834    0.000    9.299    3.119
##    .Beliefs           0.000                               0.000    0.000
##     TechCompetency    0.000                               0.000    0.000
##    .Behavior          0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .PE                5.815    0.912    6.379    0.000    5.815    0.594
##    .EE                2.030    0.353    5.746    0.000    2.030    0.491
##    .SI                4.027    0.608    6.626    0.000    4.027    0.727
##    .HM                2.904    0.492    5.908    0.000    2.904    0.495
##    .TK                3.578    0.626    5.713    0.000    3.578    0.503
##    .TCK               0.196    0.032    6.129    0.000    0.196    0.718
##    .TPK               1.352    0.300    4.500    0.000    1.352    0.304
##    .TPCK              1.206    0.352    3.429    0.001    1.206    0.206
##    .BI                0.577    0.339    1.700    0.089    0.577    0.159
##    .HA                5.643    0.885    6.373    0.000    5.643    0.635
##    .Beliefs           1.035    0.505    2.048    0.041    0.260    0.260
##     TechCompetency    3.541    1.009    3.510    0.000    1.000    1.000
##    .Behavior          0.158    0.374    0.423    0.672    0.052    0.052

Your final model is in the next chunk and I’ve listed some of my interpretations below that.

model3 <- '

    Beliefs =~ PE + EE + SI + HM
    TechCompetency =~ TK + TCK + TPK + TPCK
    Behavior =~ BI + HA
    
    Behavior ~ Beliefs
    Beliefs ~ TechCompetency
'

fit <- sem(model3, missing = 'ML', data = data)
summary(fit, fit.measures = TRUE, standardized=TRUE)
## lavaan 0.6-11 ended normally after 74 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        32
##                                                       
##                                                   Used       Total
##   Number of observations                           113         149
##   Number of missing patterns                         6            
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                                32.255
##   Degrees of freedom                                33
##   P-value (Chi-square)                           0.504
## 
## Model Test Baseline Model:
## 
##   Test statistic                               414.729
##   Degrees of freedom                                45
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.003
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1784.846
##   Loglikelihood unrestricted model (H1)      -1768.718
##                                                       
##   Akaike (AIC)                                3633.691
##   Bayesian (BIC)                              3720.968
##   Sample-size adjusted Bayesian (BIC)         3619.831
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.067
##   P-value RMSEA <= 0.05                          0.841
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.048
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Observed
##   Observed information based on                Hessian
## 
## Latent Variables:
##                     Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Beliefs =~                                                             
##     PE                 1.000                               2.029    0.648
##     EE                 0.724    0.120    6.030    0.000    1.469    0.722
##     SI                 0.608    0.133    4.573    0.000    1.234    0.525
##     HM                 0.859    0.142    6.052    0.000    1.742    0.719
##   TechCompetency =~                                                      
##     TK                 1.000                               1.886    0.707
##     TCK                0.147    0.033    4.452    0.000    0.277    0.530
##     TPK                0.930    0.136    6.837    0.000    1.755    0.832
##     TPCK               1.143    0.157    7.266    0.000    2.155    0.892
##   Behavior =~                                                            
##     BI                 1.000                               1.757    0.921
##     HA                 1.021    0.173    5.905    0.000    1.793    0.602
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   Behavior ~                                                            
##     Beliefs           0.824    0.120    6.855    0.000    0.952    0.952
##   Beliefs ~                                                             
##     TechCompetency    0.895    0.172    5.203    0.000    0.833    0.833
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .PE               16.310    0.294   55.414    0.000   16.310    5.213
##    .EE               17.895    0.205   87.486    0.000   17.895    8.797
##    .SI               12.855    0.238   53.908    0.000   12.855    5.463
##    .HM               12.878    0.244   52.722    0.000   12.878    5.316
##    .TK               14.949    0.284   52.666    0.000   14.949    5.603
##    .TCK               4.319    0.056   76.778    0.000    4.319    8.267
##    .TPK              17.402    0.223   78.190    0.000   17.402    8.256
##    .TPCK             16.947    0.253   66.856    0.000   16.947    7.012
##    .BI               13.202    0.196   67.482    0.000   13.202    6.924
##    .HA                9.300    0.302   30.830    0.000    9.300    3.119
##    .Beliefs           0.000                               0.000    0.000
##     TechCompetency    0.000                               0.000    0.000
##    .Behavior          0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .PE                5.673    0.890    6.376    0.000    5.673    0.580
##    .EE                1.981    0.349    5.675    0.000    1.981    0.479
##    .SI                4.014    0.612    6.563    0.000    4.014    0.725
##    .HM                2.833    0.486    5.832    0.000    2.833    0.483
##    .TK                3.560    0.624    5.708    0.000    3.560    0.500
##    .TCK               0.196    0.032    6.131    0.000    0.196    0.719
##    .TPK               1.365    0.301    4.536    0.000    1.365    0.307
##    .TPCK              1.196    0.351    3.410    0.001    1.196    0.205
##    .BI                0.549    0.342    1.607    0.108    0.549    0.151
##    .HA                5.672    0.889    6.377    0.000    5.672    0.638
##    .Beliefs           1.263    0.473    2.671    0.008    0.307    0.307
##     TechCompetency    3.558    1.011    3.520    0.000    1.000    1.000
##    .Behavior          0.290    0.357    0.812    0.417    0.094    0.094

You can report that your fit statistics are χ2(33) = 32.26, p = .504, CFI = 1.00, RMSEA = 0. This indicates that the theoretical model (the combination of arrows) fit the data you collected very well, which allows you to interpret the effects in the model.
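If you prefer to pull those numbers straight from the fitted model rather than scanning the long summary, lavaan's fitMeasures() returns them; a sketch:

```r
# Extract just the fit statistics reported above from the fitted model
fitMeasures(fit, c("chisq", "df", "pvalue", "cfi", "tli", "rmsea", "srmr"))
```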

When looking at effects, use the Std.all column. The latent variables section of the summary represents the confirmatory factor analysis (e.g., PE, EE, SI, and HM all load onto Beliefs above .4, so you accept those results and include the Std.all effect size on that part of the model you draw/illustrate).

The regressions section of the summary represents the paths between the factors that remain in the model (e.g., competency has a large effect on beliefs (.83) and beliefs has a large effect on behavior (.95)). These standardized coefficients are interpreted like r values in correlation, so squaring them gives r-squared values: you explained 90% of the variance in behavior with beliefs (r-squared = .95 * .95 = .90) and 69% of the variance in beliefs with competency (r-squared = .83 * .83 = .69).
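Rather than squaring by hand, lavaan will also compute the R-squared values for you; a sketch:

```r
# R-squared for each endogenous variable (latent and observed)
lavInspect(fit, what = "rsquare")
# or request them as part of the summary:
# summary(fit, fit.measures = TRUE, standardized = TRUE, rsquare = TRUE)
```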

Visualization

I’ve included the best visualization I can produce of this model below, but it does not show that you also tested (and then pruned) a path between competency and behavior, which is important given your second research question. When you draw the model yourself, you can use a dashed line to indicate the nonsignificant pruned path between tech competency and behavior, as in Zhang et al. (2021).

#install.packages('lavaanPlot') 
library(lavaanPlot)
lavaanPlot(model = fit,
           node_options = list(shape = "box", fontname = "Helvetica"),
           edge_options = list(color = "grey"),
           coefs = TRUE, stand = TRUE)
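Since semPlot is already loaded, semPaths() is another option; it draws the standardized solution and is sometimes easier to annotate by hand afterward. A sketch:

```r
# Alternative path diagram with semPlot; whatLabels = "std" prints the
# standardized estimate on each path
semPaths(fit, whatLabels = "std", layout = "tree", edge.label.cex = 1)
```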