Chapter 12 Basic Statistics
NOTE FOR ANDREW Merge content from Module06.Rmd. I added a ton of resources about interactions there.
12.1 Preamble
This could very well be a book on its own. We are not going to go into too much detail here, but I will show a few statistics you can run for exploratory measures. Please understand that these stats should NOT be what is included in your final manuscript. They are simply a quick and dirty way to gain an understanding of your data.
12.2 Running your models
For starters I am a big fan of using something that allows me to run through several models to get an overall feel for the data. This usually involves either broom or rstatix package which allows for pipe friendly modelling. I will commonly do this in order to evaluate if random factors should be included in my final model (covered in the mixed model chapter).
# Question: Is there a subject effect in our data?
# if there is then we introduce it as a random effect.
tbl.SubjectEffect <- df.nirs %>% 
  group_by(metric) %>% 
  do(tidy(lm(value ~ Subject, .))) # requires broom
tbl.SubjectEffect <- df.nirs %>% 
  group_by(metric) %>% 
  rstatix::anova_test(value~Subject)
# answer is yes. There is a subject effect that we must accomodateFor a very complete guide on running ANOVAs in R for your first time. This post is immaculate.
You can see some exercises here
12.3 Modeling your data
To test the normality of your data you can use a few different methods
- http://www.sthda.com/english/wiki/normality-test-in-r
 - https://rstudio-pubs-static.s3.amazonaws.com/2002_1f803b2bc84c46008d3599a07867a95a.html
 - https://www.datanovia.com/en/lessons/anova-in-r/
 
- Plot your data
 - Check skewness and kurtosis
 - Shapiro test.
 
In order to visualize your linear model I use 3 packages/BlandAltmanLeh/vignettes/Intro
- jtools
 - interactions
 - ggeffects (used less)
 - To visualize 
lmermixed models see here - To follow up a mixed-model here
 
If you want a very quick way to look at your data, using rstatix is my #1 recommendation.
ggstatsplot is a nice package that gets around a few problems you may run into.
The broom package for visualising models
12.4 Statistics - Resources
12.4.1 Correlations in R/random_slopes
http://www.sthda.com/english/wiki/visualize-correlation-matrix-using-correlogram
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5383908/
https://cran.r-project.org/web/packages/sjPlot/vignettes/sjtitemanalysis.html
12.4.2 Correlation Plots:
    http://www.sthda.com/english/wiki/visualize-correlation-matrix-using-correlogram
12.4.4 Learn about Residuals:
http://docs.statwing.com/interpreting-residual-plots-to-improve-your-regression/
https://statisticsbyjim.com/regression/check-residual-plots-regression-analysis/
https://rpubs.com/iabrady/residual-analysis
http://www.learnbymarketing.com/tutorials/linear-regression-in-r/
12.4.5 Regression Diagnostics
    https://www.statmethods.net/stats/rdiagnostics.html
        
    tidymodels : https://www.tidymodels.org/
        
12.4.6 Bland-Altman Plots
    https://cran.r-project.org/web/packages/BlandAltmanLeh/vignettes/Intro.html
12.4.8 Mixed Models
Good introduction: https://ourcodingclub.github.io/tutorials/mixed-models/ More intro: https://www.jaredknowles.com/journal/2013/11/25/getting-started-with-mixed-effect-models-in-r https://dynamicecology.wordpress.com/2015/02/05/how-many-terms-in-your-model-before-statistical-machismo/ https://m-clark.github.io/mixed-models-with-R/random_slopes.html Package that gives pseudo-R^2 for mixed models –> https://cran.r-project.org/web/packages/jtools/vignettes/summ.html#report_robust_standard_errors https://rinterested.github.io/statistics/mixed_effects_comparison.html
12.4.9 Cluster Analysis (see bottom of the page for more links)
https://www.datanovia.com/en/product/practical-guide-to-cluster-analysis-in-r/
12.5 Looking at interactions
There are a few packages I like to use to visualize interactions. In no particular order they are:
- jtools
 - interactions
 - sjPlot
 - ggeffects
 - modelbased (vignette here)
 - ggstatsplot
 
#Example from my microstates project see scripts/tbl-testing.R
sjPlot::plot_model(model4, type = "pred", terms = c("Age", "Sex"), show.data = TRUE) + 
  theme(legend.position = "top") +
  labs(title = "Predicted values of MeanOccurrence (What the model says)",
       y = "Mean Microstate Occurrence")I am going to be using the interactions package here but later in the document I show a few other examples. There are 2 main functions
cat_plotfor categorical predictorsinteract_plotwhen you have continuous predictors
12.5.1 Packages
mertool is a gui for modelling –> https://www.jaredknowles.com/journal/2015/8/12/announcing-mertools tidymodels: https://www.tidymodels.org/learn/statistics/tidy-analysis/
To assess model performance I use the performance package. You can visualize your model using this vignette
# Running an ANOVA ----------------
# testing to see if the type of service has an effect on the Journey Time 
#model <- lm(journey_time_avg ~ service, data=df)
model <- aov(journey_time_avg ~ service, data=df)
modsum <- summary(model)
#anova(model) #show me the answer in an ANOVA table
modsum
# ANOVA w/ Type III SS --------------
# requires the "car" library
model <- Anova(lm(Measure ~ Group + Brain_Region + Group*Brain_Region, data = df, contrasts=list(Group=contr.sum, Brain_Region=contr.sum), type=3))
report(model) #requires easystats
describe.aov(model, 'Group:Brain_Region', sstype = 2) #requires apastats
options(contrasts = c("contr.sum", "contr.poly"))
Anova(model, type="III")  # Type III tests
# Post-Hoc according to Field's Book ---------
  contrasts(df$Group)=contr.poly(3)
  #model.1=aov(dv~covariate+factorvariable, data=dataname)
  model.1=aov(Measure ~ Baseline + Group + Brain_Region + Group*Brain_Region, data = df)
  Anova(model.1, type="III") 
  # Make sure you use capital "A" Anova here and not anova. This will give results using type III SS.
  posthoc=glht(model.1, linfct=mcp(Group="Tukey"))  ##gives the post-hoc Tukey analysis
  summary(posthoc) ##shows the output in a nice format.
  
  confint(posthoc) # Give me the confidence intervals for the post-hoc tests
  effect("Group", model.1) # Give me the group effectsBelow is another loop
# Run Several models using sapply --------------------
# IVs and covariates 
ivs <- c("Neurological_Infection")
covariates <- c("Etiology_of_Injury", "Open_Closed_Injury", "SDH", "EDH", "Intracranial_monitors", "Non_neurologic_surgery", "Skull_fracture", "ICU_LOS_in_days", "Time_to_surgery", "Site_of_surgery", "Craniotomy", "Craniotomy_OR_duration", "Craniectomy", "Craniectomy_OR_duration", "Bone_removal_surgeries", "Initial_Cranioplasty", "Days_to_initial_cranioplasty", "Initial_Cranioplasty_OR_Duration", "Initial_Cranioplasty_bone_material", "Bone_replacement_surgeries", "Duraplasty_material", "Other_Infection", "Non_periop_antibiotics", "Time_of_day_of_initial_surgery", "Week_weekend_of_initial_surgery", "Hydrocephalus", "Post_op_complications", "EtOH_substance_abuse", "Anticoagulation", "Seizure", "HTN", "Smoker", "Diabetes", "Heart_disease", "Liver_disease", "Kidney_disease", "Previous_brain_injury_surgery")
univ_formulas <- sapply(covariates,
                        function(x) as.formula(paste('Neurological_Infection~', x)))
univ_models <- lapply( univ_formulas, function(x){glm(x, ,family=binomial(link='logit'),data=df1)})
# Making a loop to run several stats and saving them as a table --------
# I have an example of this in JumpCut projects
cond = levels(erp$Condition)
comp = levels(erp$ERP_component)
erp$NumUndiagConc <- as.factor(erp$NumUndiagConc)
erp$NumPriorConc <- as.factor(erp$NumPriorConc)
res <- data.frame()
for (yy in (1:(length(cond)))) {
  data <- subset(erp, Condition == cond[yy])
  if ((cond[yy]=="rGoC") | (cond[yy]=="rNgW")){
    comp <- c("ERN_Amp","ERN_Lat","Pe_Amp","Pe_Lat")
  } else if ((cond[yy]!="rGoC") | (cond[yy]!="rNgW")){
    comp = c( "N2_Amp",  "N2_Lat",  "P3a_Amp","P3a_Lat", "P3b_Amp", "P3b_Lat")
  }
  res1 <- data.frame() #reset after each condition loop
  for (zz in (1:(length(comp)))) {
    compData <- subset(data,ERP_component == comp[zz])
    electNum = compData$electNum
    if (length(unique(electNum))==1){
      fm1 <- lmer(ERP_value~AthleteType + Time + AthleteType*Time + (1|id) ,data=compData)
    } else if (length(unique(electNum))>=1){
      fm1 <- lmer(ERP_value~AthleteType + NumPriorConc+ AthleteType*Time +AthleteType*NumPriorConc*Time  + (1|id) + (1|electNum),data=compData)
    }
    assign(paste("mod", cond[yy],gsub("_","",comp[zz]), sep="_"), fm1) #save it in the form of modsum_GoC_N2Amp
    assign(paste("modsum", cond[yy],gsub("_","",comp[zz]), sep="_"), summary(fm1)) #save it in the form of modsum_GoC_N2Amp
    tmod <- as.data.frame(summary(fm1)$coefficients)
    tmod <- round(tmod[-c(1), ],3)
    tmod <- subset(tmod,`Pr(>|t|)`<= 0.1)
    if (nrow(tmod)==0){
      a <- 1
    } else if (nrow(tmod)>=0){
      Test <- rownames(tmod)
      tmod <- cbind(Test,tmod)
      rownames(tmod) <- NULL
      #Component <- c(rep_len(paste(cond[yy],' ',comp[zz]),length.out = length(Test)))#comp[zz]
      Component <- c(rep_len(paste(comp[zz]),length.out = length(Test)))#comp[zz]
      tmod <- cbind(Component,tmod)
      res <- rbind(res,tmod)
      tmod$`Pr(>|t|)`[ tmod$`Pr(>|t|)` <0.001] <- "<0.001"
      tmod$`Pr(>|t|)` <- sub("[0].", ".", tmod$`Pr(>|t|)`) 
      #tmod$`t value` <- round(tmod$`t value`,digits=2)
      tmod$Result <- paste0("t(", round(tmod$df,0), ")=", sprintf("%.2f", tmod$`t value`), ", p=", tmod$`Pr(>|t|)`)
      tmod <- subset(tmod,select=c("Component", "Test","Result"))
      tmod$Test <- gsub("AthleteType.L","Athlete Type",tmod$Test)
      tmod$Test <- gsub(":","$\\times$", tmod$Test, fixed=TRUE)
      tmod$Component <- gsub("_"," ",tmod$Component)
      tmod$Test <- gsub("Time","Time ",tmod$Test)
      res1 <- rbind(res1,tmod)
    }
    assign(paste0(cond[yy],"statsTable"), res1) #save it in the form of modsum_GoC_N2Amp
  }
}