# Chapter 13 Basic Statistics

NOTE FOR ANDREW Merge content from Module06.Rmd. I added a ton of resources about interactions there.

## 13.1 Preamble

This could very well be a book on its own. We are not going to go into too much detail here, but I will show a few statistics you can run for exploratory measures. Please understand that these stats should NOT be what is included in your final manuscript. They are simply a quick and dirty way to gain an understanding of your data.

For starters I am a big fan of using something that allows me to run through several models to get an overall feel for the data. This usually involves either broom or rstatix package which allows for pipe friendly modelling. I will commonly do this in order to evaluate if random factors should be included in my final model (covered in the mixed model chapter).

``````# Question: Is there a subject effect in our data?
# if there is then we introduce it as a random effect.

tbl.SubjectEffect <- df.nirs %>%
group_by(metric) %>%
do(tidy(lm(value ~ Subject, .))) # requires broom

tbl.SubjectEffect <- df.nirs %>%
group_by(metric) %>%
rstatix::anova_test(value~Subject)
# answer is yes. There is a subject effect that we must accomodate``````

For a very complete guide on running ANOVAs in R for your first time. This post is immaculate.

You can see some exercises here

To test the normality of your data you can use a few different methods

2. Check skewness and kurtosis
3. Shapiro test.

In order to visualize your linear model I use 3 packages/BlandAltmanLeh/vignettes/Intro

1. jtools
2. interactions
3. ggeffects (used less)
4. To visualize `lmer` mixed models see here
5. To follow up a mixed-model here

If you want a very quick way to look at your data, using rstatix is my #1 recommendation.

ggstatsplot is a nice package that gets around a few problems you may run into.

The `broom` package for visualising models

## 13.4 Statistics - Resources

### 13.4.1 Correlations in R/random_slopes

``````http://www.sthda.com/english/wiki/visualize-correlation-matrix-using-correlogram
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5383908/``````

https://cran.r-project.org/web/packages/sjPlot/vignettes/sjtitemanalysis.html

### 13.4.2 Correlation Plots:

``    http://www.sthda.com/english/wiki/visualize-correlation-matrix-using-correlogram``

### 13.4.3 General Good Sites

``https://statisticsbyjim.com/jim_frost/``

``````http://docs.statwing.com/interpreting-residual-plots-to-improve-your-regression/
https://statisticsbyjim.com/regression/check-residual-plots-regression-analysis/

http://www.learnbymarketing.com/tutorials/linear-regression-in-r/``````

### 13.4.5 Regression Diagnostics

``````    https://www.statmethods.net/stats/rdiagnostics.html

tidymodels : https://www.tidymodels.org/
``````

### 13.4.6 Bland-Altman Plots

``    https://cran.r-project.org/web/packages/BlandAltmanLeh/vignettes/Intro.html``

### 13.4.7 ANOVA

``https://www.datanovia.com/en/lessons/anova-in-r/``

## 13.5 Looking at interactions

There are a few packages I like to use to visualize interactions. In no particular order they are:

I am going to be using the `interactions` package here but later in the document I show a few other examples. There are 2 main functions

1. `cat_plot` for categorical predictors
2. `interact_plot` when you have continuous predictors

### 13.5.1 Packages

mertool is a gui for modelling –> https://www.jaredknowles.com/journal/2015/8/12/announcing-mertools tidymodels: https://www.tidymodels.org/learn/statistics/tidy-analysis/

To assess model performance I use the performance package. You can visualize your model using this vignette

``````# Running an ANOVA ----------------
# testing to see if the type of service has an effect on the Journey Time
#model <- lm(journey_time_avg ~ service, data=df)
model <- aov(journey_time_avg ~ service, data=df)
modsum <- summary(model)
#anova(model) #show me the answer in an ANOVA table
modsum

# ANOVA w/ Type III SS --------------
# requires the "car" library
model <- Anova(lm(Measure ~ Group + Brain_Region + Group*Brain_Region, data = df, contrasts=list(Group=contr.sum, Brain_Region=contr.sum), type=3))
report(model) #requires easystats
describe.aov(model, 'Group:Brain_Region', sstype = 2) #requires apastats

options(contrasts = c("contr.sum", "contr.poly"))
Anova(model, type="III")  # Type III tests

# Post-Hoc according to Field's Book ---------
contrasts(df\$Group)=contr.poly(3)
#model.1=aov(dv~covariate+factorvariable, data=dataname)
model.1=aov(Measure ~ Baseline + Group + Brain_Region + Group*Brain_Region, data = df)
Anova(model.1, type="III")
# Make sure you use capital "A" Anova here and not anova. This will give results using type III SS.
posthoc=glht(model.1, linfct=mcp(Group="Tukey"))  ##gives the post-hoc Tukey analysis
summary(posthoc) ##shows the output in a nice format.

confint(posthoc) # Give me the confidence intervals for the post-hoc tests
effect("Group", model.1) # Give me the group effects``````

Below is another loop

``````# Run Several models using sapply --------------------
# IVs and covariates
ivs <- c("Neurological_Infection")
covariates <- c("Etiology_of_Injury", "Open_Closed_Injury", "SDH", "EDH", "Intracranial_monitors", "Non_neurologic_surgery", "Skull_fracture", "ICU_LOS_in_days", "Time_to_surgery", "Site_of_surgery", "Craniotomy", "Craniotomy_OR_duration", "Craniectomy", "Craniectomy_OR_duration", "Bone_removal_surgeries", "Initial_Cranioplasty", "Days_to_initial_cranioplasty", "Initial_Cranioplasty_OR_Duration", "Initial_Cranioplasty_bone_material", "Bone_replacement_surgeries", "Duraplasty_material", "Other_Infection", "Non_periop_antibiotics", "Time_of_day_of_initial_surgery", "Week_weekend_of_initial_surgery", "Hydrocephalus", "Post_op_complications", "EtOH_substance_abuse", "Anticoagulation", "Seizure", "HTN", "Smoker", "Diabetes", "Heart_disease", "Liver_disease", "Kidney_disease", "Previous_brain_injury_surgery")

univ_formulas <- sapply(covariates,
function(x) as.formula(paste('Neurological_Infection~', x)))

univ_models <- lapply( univ_formulas, function(x){glm(x, ,family=binomial(link='logit'),data=df1)})
# Making a loop to run several stats and saving them as a table --------
# I have an example of this in JumpCut projects
cond = levels(erp\$Condition)
comp = levels(erp\$ERP_component)
erp\$NumUndiagConc <- as.factor(erp\$NumUndiagConc)
erp\$NumPriorConc <- as.factor(erp\$NumPriorConc)

res <- data.frame()
for (yy in (1:(length(cond)))) {
data <- subset(erp, Condition == cond[yy])
if ((cond[yy]=="rGoC") | (cond[yy]=="rNgW")){
comp <- c("ERN_Amp","ERN_Lat","Pe_Amp","Pe_Lat")
} else if ((cond[yy]!="rGoC") | (cond[yy]!="rNgW")){
comp = c( "N2_Amp",  "N2_Lat",  "P3a_Amp","P3a_Lat", "P3b_Amp", "P3b_Lat")
}
res1 <- data.frame() #reset after each condition loop
for (zz in (1:(length(comp)))) {
compData <- subset(data,ERP_component == comp[zz])
electNum = compData\$electNum
if (length(unique(electNum))==1){
fm1 <- lmer(ERP_value~AthleteType + Time + AthleteType*Time + (1|id) ,data=compData)
} else if (length(unique(electNum))>=1){
fm1 <- lmer(ERP_value~AthleteType + NumPriorConc+ AthleteType*Time +AthleteType*NumPriorConc*Time  + (1|id) + (1|electNum),data=compData)
}
assign(paste("mod", cond[yy],gsub("_","",comp[zz]), sep="_"), fm1) #save it in the form of modsum_GoC_N2Amp
assign(paste("modsum", cond[yy],gsub("_","",comp[zz]), sep="_"), summary(fm1)) #save it in the form of modsum_GoC_N2Amp
tmod <- as.data.frame(summary(fm1)\$coefficients)
tmod <- round(tmod[-c(1), ],3)
tmod <- subset(tmod,`Pr(>|t|)`<= 0.1)
if (nrow(tmod)==0){
a <- 1
} else if (nrow(tmod)>=0){
Test <- rownames(tmod)
tmod <- cbind(Test,tmod)
rownames(tmod) <- NULL
#Component <- c(rep_len(paste(cond[yy],' ',comp[zz]),length.out = length(Test)))#comp[zz]
Component <- c(rep_len(paste(comp[zz]),length.out = length(Test)))#comp[zz]
tmod <- cbind(Component,tmod)
res <- rbind(res,tmod)
tmod\$`Pr(>|t|)`[ tmod\$`Pr(>|t|)` <0.001] <- "<0.001"
tmod\$`Pr(>|t|)` <- sub(".", ".", tmod\$`Pr(>|t|)`)
#tmod\$`t value` <- round(tmod\$`t value`,digits=2)
tmod\$Result <- paste0("t(", round(tmod\$df,0), ")=", sprintf("%.2f", tmod\$`t value`), ", p=", tmod\$`Pr(>|t|)`)
tmod <- subset(tmod,select=c("Component", "Test","Result"))
tmod\$Test <- gsub("AthleteType.L","Athlete Type",tmod\$Test)
tmod\$Test <- gsub(":","\$\\times\$", tmod\$Test, fixed=TRUE)
tmod\$Component <- gsub("_"," ",tmod\$Component)
tmod\$Test <- gsub("Time","Time ",tmod\$Test)
res1 <- rbind(res1,tmod)
}
assign(paste0(cond[yy],"statsTable"), res1) #save it in the form of modsum_GoC_N2Amp
}
}``````