Chapter 12 Basic Statistics


12.1 Preamble

This could very well be a book on its own. We are not going to go into too much detail here, but I will show a few statistics you can run for exploratory purposes. Please understand that these statistics should NOT be what is included in your final manuscript; they are simply a quick and dirty way to gain an understanding of your data.

12.2 Running your models

For starters, I am a big fan of using something that allows me to run through several models to get an overall feel for the data. This usually involves the broom or rstatix package, both of which allow for pipe-friendly modelling. I commonly do this to evaluate whether random factors should be included in my final model (covered in the mixed-models chapter).

# Question: Is there a subject effect in our data?
# If there is, then we introduce it as a random effect.
library(dplyr)
library(broom)

tbl.SubjectEffect <- df.nirs %>% 
  group_by(metric) %>% 
  do(tidy(lm(value ~ Subject, .))) # requires broom

# The same check using rstatix's pipe-friendly anova_test()
tbl.SubjectEffect <- df.nirs %>% 
  group_by(metric) %>% 
  rstatix::anova_test(value ~ Subject)
# Answer is yes: there is a subject effect that we must accommodate.

For a very complete guide to running ANOVAs in R for the first time, this post is immaculate.

You can find some exercises here.

12.3 Modeling your data

To test the normality of your data you can use a few different methods (a quick sketch follows the list):

  1. Plot your data.
  2. Check skewness and kurtosis.
  3. Run a Shapiro-Wilk test.
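
Here is a minimal sketch of all three checks, assuming a data frame df with a numeric column value (both names are placeholders):

library(ggplot2)

# 1. Plot the data
ggplot(df, aes(x = value)) + geom_histogram(bins = 30)

# 2. Skewness and kurtosis (the moments package is one option)
moments::skewness(df$value)
moments::kurtosis(df$value)

# 3. Shapiro-Wilk test (null hypothesis: the data are normal)
shapiro.test(df$value)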

To visualize your linear model I use three packages, plus two links for mixed models (a quick jtools example follows the list):

  1. jtools
  2. interactions
  3. ggeffects (used less)
  4. To visualize lmer mixed models, see here.
  5. To follow up a mixed model, see here.
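
For example, a minimal jtools sketch (the model is a placeholder fit on a built-in dataset, so it runs as-is):

library(jtools)

fit <- lm(mpg ~ hp + wt, data = mtcars) # any fitted lm will do
summ(fit)                               # clean regression summary
effect_plot(fit, pred = hp, interval = TRUE, plot.points = TRUE)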

If you want a very quick way to look at your data, rstatix is my number-one recommendation.
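
A minimal sketch (the column names group and value are placeholders):

library(dplyr)
library(rstatix)

df %>% group_by(group) %>% get_summary_stats(value) # quick descriptives
df %>% t_test(value ~ group)                        # pipe-friendly t-test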

ggstatsplot is a nice package that builds the plot and the statistical test together, which gets around a few problems you may run into.

The broom package is handy for turning fitted models into tidy data frames you can plot. For example:
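
A minimal sketch using a built-in dataset:

library(broom)

fit <- lm(mpg ~ wt, data = mtcars)
tidy(fit)    # coefficients as a data frame
glance(fit)  # one-row model summary (R-squared, AIC, ...)
augment(fit) # fitted values and residuals per observation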

12.4 Statistics - Resources

12.4.1 Correlations in R

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5383908/

https://cran.r-project.org/web/packages/sjPlot/vignettes/sjtitemanalysis.html

12.4.2 Correlation Plots

http://www.sthda.com/english/wiki/visualize-correlation-matrix-using-correlogram

12.4.3 General Good Sites

https://statisticsbyjim.com/jim_frost/

12.4.4 Learn about Residuals

http://docs.statwing.com/interpreting-residual-plots-to-improve-your-regression/
https://statisticsbyjim.com/regression/check-residual-plots-regression-analysis/
https://rpubs.com/iabrady/residual-analysis

http://www.learnbymarketing.com/tutorials/linear-regression-in-r/

12.4.5 Regression Diagnostics

https://www.statmethods.net/stats/rdiagnostics.html
tidymodels: https://www.tidymodels.org/

12.4.6 Bland-Altman Plots

https://cran.r-project.org/web/packages/BlandAltmanLeh/vignettes/Intro.html

12.4.7 ANOVA

https://www.datanovia.com/en/lessons/anova-in-r/

12.5 Looking at interactions

There are a few packages I like to use to visualize interactions. sjPlot::plot_model() is one option:

# Example from my microstates project; see scripts/tbl-testing.R
# requires sjPlot (plot_model) and ggplot2 (theme, labs)
sjPlot::plot_model(model4, type = "pred", terms = c("Age", "Sex"), show.data = TRUE) + 
  theme(legend.position = "top") +
  labs(title = "Predicted values of MeanOccurrence (What the model says)",
       y = "Mean Microstate Occurrence")

I am going to be using the interactions package here, but later in the document I show a few other examples. There are two main functions (see the sketch after this list):

  1. cat_plot for categorical predictors
  2. interact_plot when you have continuous predictors
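
A minimal sketch of both, using a built-in dataset (the models here are placeholders, not from the original project):

library(interactions)

mt <- mtcars
mt$cyl <- factor(mt$cyl)
mt$am  <- factor(mt$am, labels = c("automatic", "manual"))

# Continuous predictor x categorical moderator -> interact_plot
fit1 <- lm(mpg ~ hp * cyl, data = mt)
interact_plot(fit1, pred = hp, modx = cyl)

# Categorical predictor x categorical moderator -> cat_plot
fit2 <- lm(mpg ~ am * cyl, data = mt)
cat_plot(fit2, pred = am, modx = cyl)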

12.5.1 Packages

merTools provides a GUI for exploring mixed models: https://www.jaredknowles.com/journal/2015/8/12/announcing-mertools

tidymodels: https://www.tidymodels.org/learn/statistics/tidy-analysis/

To assess model performance I use the performance package. You can visualize your model using this vignette. For example:
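
A minimal sketch (any fitted model works; this one is a placeholder on a built-in dataset):

library(performance)

fit <- lm(mpg ~ wt + hp, data = mtcars)
model_performance(fit) # R-squared, AIC, RMSE, ...
check_model(fit)       # panel of visual diagnostic checks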

# Running an ANOVA ----------------
# Testing whether the type of service has an effect on journey time
# model <- lm(journey_time_avg ~ service, data = df) # lm alternative
model <- aov(journey_time_avg ~ service, data = df)
modsum <- summary(model)
# anova(model) # show the answer as an ANOVA table (for the lm version)
modsum
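
# Follow-up sketch: Tukey HSD pairwise comparisons on the aov fit above
TukeyHSD(model)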

# ANOVA w/ Type III SS --------------
# requires the "car" library
fit <- lm(Measure ~ Group * Brain_Region, data = df,
          contrasts = list(Group = contr.sum, Brain_Region = contr.sum))
model <- Anova(fit, type = 3) # type is an argument to Anova(), not lm()
report(model) # requires easystats
describe.aov(model, 'Group:Brain_Region', sstype = 2) # requires apastats

# Or set sum-to-zero contrasts globally, then request Type III tests
options(contrasts = c("contr.sum", "contr.poly"))
Anova(fit, type = "III") # Type III tests


# Post-Hoc according to Field's book ---------
# requires multcomp (glht) and effects (effect)
contrasts(df$Group) = contr.poly(3)
# model.1 = aov(dv ~ covariate + factorvariable, data = dataname)
model.1 = aov(Measure ~ Baseline + Group * Brain_Region, data = df)
Anova(model.1, type = "III")
# Make sure you use capital-A Anova() here, not anova(); this gives Type III SS.
posthoc = glht(model.1, linfct = mcp(Group = "Tukey")) # post-hoc Tukey analysis
summary(posthoc) # shows the output in a nice format

confint(posthoc)         # confidence intervals for the post-hoc tests
effect("Group", model.1) # the group effects

Below is another approach, using sapply to run several univariate models at once.

# Run Several models using sapply --------------------
# IVs and covariates 
ivs <- c("Neurological_Infection")
covariates <- c("Etiology_of_Injury", "Open_Closed_Injury", "SDH", "EDH", "Intracranial_monitors", "Non_neurologic_surgery", "Skull_fracture", "ICU_LOS_in_days", "Time_to_surgery", "Site_of_surgery", "Craniotomy", "Craniotomy_OR_duration", "Craniectomy", "Craniectomy_OR_duration", "Bone_removal_surgeries", "Initial_Cranioplasty", "Days_to_initial_cranioplasty", "Initial_Cranioplasty_OR_Duration", "Initial_Cranioplasty_bone_material", "Bone_replacement_surgeries", "Duraplasty_material", "Other_Infection", "Non_periop_antibiotics", "Time_of_day_of_initial_surgery", "Week_weekend_of_initial_surgery", "Hydrocephalus", "Post_op_complications", "EtOH_substance_abuse", "Anticoagulation", "Seizure", "HTN", "Smoker", "Diabetes", "Heart_disease", "Liver_disease", "Kidney_disease", "Previous_brain_injury_surgery")

univ_formulas <- sapply(covariates,
                        function(x) as.formula(paste('Neurological_Infection~', x)))

univ_models <- lapply(univ_formulas, function(x) {
  glm(x, family = binomial(link = 'logit'), data = df1)
})
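
# Sketch: collect the univariate results above into one tidy table with broom
univ_results <- dplyr::bind_rows(lapply(univ_models, broom::tidy), .id = "covariate")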
# Making a loop to run several stats and saving them as a table --------
# I have an example of this in JumpCut projects
# requires lmerTest (for lmer() summaries with Pr(>|t|) p-values)
cond = levels(erp$Condition)
comp = levels(erp$ERP_component)
erp$NumUndiagConc <- as.factor(erp$NumUndiagConc)
erp$NumPriorConc <- as.factor(erp$NumPriorConc)

res <- data.frame()
for (yy in (1:(length(cond)))) {
  data <- subset(erp, Condition == cond[yy])
  if ((cond[yy]=="rGoC") | (cond[yy]=="rNgW")){
    comp <- c("ERN_Amp","ERN_Lat","Pe_Amp","Pe_Lat")
  } else { # all other conditions
    comp = c( "N2_Amp",  "N2_Lat",  "P3a_Amp","P3a_Lat", "P3b_Amp", "P3b_Lat")
  }
  res1 <- data.frame() #reset after each condition loop
  for (zz in (1:(length(comp)))) {
    compData <- subset(data,ERP_component == comp[zz])
    electNum = compData$electNum
    if (length(unique(electNum))==1){
      fm1 <- lmer(ERP_value~AthleteType + Time + AthleteType*Time + (1|id) ,data=compData)
    } else { # more than one electrode: add it as a random effect
      fm1 <- lmer(ERP_value~AthleteType + NumPriorConc+ AthleteType*Time +AthleteType*NumPriorConc*Time  + (1|id) + (1|electNum),data=compData)
    }
    assign(paste("mod", cond[yy],gsub("_","",comp[zz]), sep="_"), fm1) #save it in the form of modsum_GoC_N2Amp
    assign(paste("modsum", cond[yy],gsub("_","",comp[zz]), sep="_"), summary(fm1)) #save it in the form of modsum_GoC_N2Amp
    tmod <- as.data.frame(summary(fm1)$coefficients)
    tmod <- round(tmod[-c(1), ],3)
    tmod <- subset(tmod,`Pr(>|t|)`<= 0.1)
    if (nrow(tmod) > 0){ # only proceed if any terms survived the p <= .1 filter
      Test <- rownames(tmod)
      tmod <- cbind(Test,tmod)
      rownames(tmod) <- NULL
      #Component <- c(rep_len(paste(cond[yy],' ',comp[zz]),length.out = length(Test)))#comp[zz]
      Component <- c(rep_len(paste(comp[zz]),length.out = length(Test)))#comp[zz]
      tmod <- cbind(Component,tmod)
      res <- rbind(res,tmod)
      tmod$`Pr(>|t|)`[ tmod$`Pr(>|t|)` <0.001] <- "<0.001"
      tmod$`Pr(>|t|)` <- sub("[0].", ".", tmod$`Pr(>|t|)`) 
      #tmod$`t value` <- round(tmod$`t value`,digits=2)
      tmod$Result <- paste0("t(", round(tmod$df,0), ")=", sprintf("%.2f", tmod$`t value`), ", p=", tmod$`Pr(>|t|)`)
      tmod <- subset(tmod,select=c("Component", "Test","Result"))
      tmod$Test <- gsub("AthleteType.L","Athlete Type",tmod$Test)
      tmod$Test <- gsub(":","$\\times$", tmod$Test, fixed=TRUE)
      tmod$Component <- gsub("_"," ",tmod$Component)
      tmod$Test <- gsub("Time","Time ",tmod$Test)
      res1 <- rbind(res1,tmod)
    }
    assign(paste0(cond[yy],"statsTable"), res1) # save the per-condition table, e.g. rGoCstatsTable
  }
}

12.6 Setting contrasts

  1. Contrasts in R (a minimal sketch follows)
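
A minimal sketch of setting contrasts, assuming a data frame df with a factor Group (placeholder names):

df$Group <- factor(df$Group)
contrasts(df$Group) <- contr.sum(nlevels(df$Group)) # sum-to-zero coding for this factor
options(contrasts = c("contr.sum", "contr.poly"))   # or change the global default
contrasts(df$Group)                                 # inspect the contrast matrix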

12.7 Supplementary Resources

  1. An overview of correlation and confounding