# Chapter 6 T-Test (two-sample using groups)

This chapter provides generic code for carrying out an independent two-sample t-test, which compares the means of an interval or continuous variable across the two categories of a grouping variable. It is recommended that you work through the sections in the order they appear. If, however, you want a quick analysis, I recommend loading the packages, prepping your data, and then running the ggstatsplot::ggbetweenstats command in section 6.6.

Placeholders that need replacing:

• mydata – name of your dataset
• groupvar – name of your dichotomous grouping variable
• intvar – name of your interval or continuous variable
• label – any titles, axis labels, category labels

## 6.1 Packages Needed for T-Test

This code will check that required packages for this chapter are installed, install them if needed, and load them into your session.

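```r
# Build the unevaluated call require(x, character.only = TRUE)
req <- substitute(require(x, character.only = TRUE))
libs <- c("effsize", "ggplot2", "ggstatsplot", "patchwork", "moments", "ggpubr")
# For each package: try to load it; if that fails, install it and try again
sapply(libs, function(x) eval(req) || {install.packages(x); eval(req)})
```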
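The one-line `sapply()` idiom above is compact but hard to read. A more explicit sketch, assuming you first want to see which packages are missing before installing anything, uses base R's `requireNamespace()`. The helper name `find_missing_packages()` is hypothetical and not part of any package.

```r
# Hypothetical helper (not from any package): return the names of
# packages in `libs` that cannot be loaded. requireNamespace() returns
# FALSE instead of throwing an error when a package is missing.
find_missing_packages <- function(libs) {
  available <- vapply(libs, requireNamespace, logical(1), quietly = TRUE)
  libs[!available]
}

# Example: "stats" ships with R; the second name is deliberately fake
find_missing_packages(c("stats", "notARealPackage123"))
```

In practice you could pass the result to `install.packages()` when it is non-empty and then attach each package with `library()`.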

## 6.2 Data Prep for T-Test

One key preparation is to declare (classify) your categorical grouping variable as a factor. In the generic commands below, the ‘class’ function tells you how R currently sees the variable (e.g., numeric, integer, character, factor), and the second command converts the variable to a factor.

```r
class(mydata$groupvar) # Tells you how R currently views the variable (numeric, character, factor...)
mydata$groupvar <- factor(mydata$groupvar) # Converts the variable to a factor
```
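If your grouping variable is stored as numeric codes, `factor()` can attach readable labels in the same step. A minimal sketch, assuming a made-up `mydata` whose groups are coded 0/1 (the labels "Control" and "Treatment" are hypothetical):

```r
# Made-up example data: the group is stored as 0/1 numbers
mydata <- data.frame(
  groupvar = c(0, 1, 0, 1, 1, 0),
  intvar   = c(12.1, 14.3, 11.8, 15.0, 13.7, 12.5)
)

class(mydata$groupvar)   # "numeric": not yet treated as a grouping variable

# Convert to a factor; `labels` replaces the 0/1 codes with readable names
mydata$groupvar <- factor(mydata$groupvar,
                          levels = c(0, 1),
                          labels = c("Control", "Treatment"))

class(mydata$groupvar)   # "factor"
levels(mydata$groupvar)  # "Control" "Treatment"
nlevels(mydata$groupvar) # a two-sample t-test needs exactly 2
```

The order of the levels matters later: with the formula interface, `t.test()` reports the difference as the first level's mean minus the second level's mean.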

## 6.3 Checking Data for Violations of Assumptions for T-Test

There are generally three assumptions researchers check when conducting a t-test: 1) relatively equal group sizes; 2) relatively equal variances; and 3) an approximately normally distributed interval or continuous variable (strictly, within each group). The following generic commands offer insight into each.

### 6.3.1 Group Frequencies for T-Test

```r
table(mydata$groupvar)
```
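By default `table()` silently drops missing values, which can hide cases that will be excluded from the t-test. A small sketch with made-up data showing the `useNA` argument and group proportions:

```r
# Made-up example with one missing group value
mydata <- data.frame(
  groupvar = factor(c("A", "A", "B", "A", NA, "B")),
  intvar   = c(5.1, 4.8, 6.2, 5.5, 5.0, 6.0)
)

# useNA = "ifany" adds a count of missing values if any exist
group_counts <- table(mydata$groupvar, useNA = "ifany")
group_counts

# Proportion of the non-missing cases that falls in each group
prop.table(table(mydata$groupvar))
```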

### 6.3.2 Checking for Equal Variances for T-Test

To compare the spread of the two groups, start with the group means and standard deviations (the ‘aggregate’ and ‘by’ functions give the same results, just in slightly different formats). The var.test function offers a more direct check, performing an F test of whether the two group variances are equal; a p < .05 suggests they differ.

```r
aggregate(mydata$intvar, by = list(mydata$groupvar), FUN = mean, na.rm = TRUE)
aggregate(mydata$intvar, by = list(mydata$groupvar), FUN = sd, na.rm = TRUE)

by(mydata$intvar, mydata$groupvar, mean, na.rm = TRUE)
by(mydata$intvar, mydata$groupvar, sd, na.rm = TRUE)

var.test(intvar ~ groupvar, data = mydata)
```
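To see what the F test in `var.test()` compares, you can compute the ratio of the two group variances yourself. A sketch with made-up data, using base R's `tapply()` as a third alternative to `aggregate()` and `by()`:

```r
# Made-up example data for two groups
mydata <- data.frame(
  groupvar = factor(rep(c("A", "B"), times = c(5, 6))),
  intvar   = c(10, 12, 11, 13, 14,  9, 15, 8, 16, 12, 10)
)

# Group means and variances, one tapply() call each
tapply(mydata$intvar, mydata$groupvar, mean, na.rm = TRUE)
group_var <- tapply(mydata$intvar, mydata$groupvar, var, na.rm = TRUE)

# With the formula interface, F = var(first level) / var(second level)
f_manual <- unname(group_var["A"] / group_var["B"])
f_test   <- unname(var.test(intvar ~ groupvar, data = mydata)$statistic)
c(manual = f_manual, var_test = f_test)
```

An F value near 1 means the two variances are similar; values far above or below 1 point to unequal spread.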

### 6.3.3 Checking Normality for T-Test

The following generic commands check skewness and kurtosis (a skewness of 0 and a kurtosis of 3 correspond to a normal distribution). You can also inspect the data visually with a density plot and a quantile-quantile (Q-Q) plot, which plots the sample's quantiles against the quantiles expected under a normal distribution; if the data are normally distributed, the points should fall roughly along the straight reference line.

```r
moments::skewness(mydata$intvar, na.rm = TRUE)
moments::kurtosis(mydata$intvar, na.rm = TRUE)
ggpubr::ggdensity(mydata$intvar, fill = "lightgray")
ggpubr::ggqqplot(mydata$intvar)
```
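To make the skewness and kurtosis numbers less of a black box, here is a sketch that computes them from their moment definitions in base R. I believe these match the formulas used by the moments package, but treat that as an assumption and compare against your installed version; the helper names and data are hypothetical.

```r
# Moment-based skewness: third central moment / (second central moment)^1.5
skew_by_hand <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Moment-based kurtosis: about 3 for normal data (not "excess" kurtosis)
kurt_by_hand <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

# Perfectly symmetric toy data
skew_by_hand(1:5)   # 0
kurt_by_hand(1:5)   # 1.7: lighter tails than a normal distribution

# The assumption concerns each group, so you can also check per group
mydata <- data.frame(
  groupvar = factor(rep(c("A", "B"), each = 5)),
  intvar   = c(1, 2, 3, 4, 5,  2, 2, 3, 3, 10)
)
tapply(mydata$intvar, mydata$groupvar, skew_by_hand)  # group B is right-skewed
```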

Beyond a visual inspection, you can conduct a Shapiro-Wilk test of normality: a p < .05 indicates a significant departure from normality, while a p > .05 means the test found no evidence of non-normality (which is not proof that the data are normal). The test can be very sensitive, particularly with a large N, so use it in conjunction with other information. R's shapiro.test also only accepts samples of 3 to 5000 observations.

```r
shapiro.test(mydata$intvar)
```
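Because the normality assumption applies within each group, it can be more informative to run the Shapiro-Wilk test separately for each group. A sketch using simulated data (one normal group and one skewed exponential group, both made up):

```r
set.seed(123)  # reproducible made-up data
mydata <- data.frame(
  groupvar = factor(rep(c("A", "B"), each = 30)),
  intvar   = c(rnorm(30, mean = 50, sd = 10), rexp(30, rate = 1 / 50))
)

# shapiro.test() only accepts 3 to 5000 non-missing values, so check first
group_n <- tapply(!is.na(mydata$intvar), mydata$groupvar, sum)
stopifnot(all(group_n >= 3 & group_n <= 5000))

# Run the test within each group and collect the p-values
shapiro_p <- tapply(mydata$intvar, mydata$groupvar,
                    function(x) shapiro.test(x)$p.value)
shapiro_p
```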

## 6.4 T-Test Command (two sample test of group means) and Effect Size

Note that R's default is Welch's t-test, which does not assume the two groups have equal variances. If the equal-variance assumption holds, you may set var.equal = TRUE to run the classic Student's t-test, as shown below.

```r
t.test(intvar ~ groupvar, data = mydata) # Default: Welch's test, unequal variances
t.test(intvar ~ groupvar, data = mydata, var.equal = TRUE) # Student's test, equal variances
```
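The default Welch test uses each group's own variance in the standard error and approximates the degrees of freedom (Welch-Satterthwaite). A sketch with made-up data that reproduces the default `t.test()` output by hand:

```r
# Made-up example data
mydata <- data.frame(
  groupvar = factor(rep(c("A", "B"), times = c(6, 8))),
  intvar   = c(23, 25, 21, 27, 24, 26,  30, 35, 28, 40, 33, 31, 38, 29)
)

x <- mydata$intvar[mydata$groupvar == "A"]  # first factor level
y <- mydata$intvar[mydata$groupvar == "B"]  # second factor level

# Welch's standard error: no pooling of the two variances
se_welch <- sqrt(var(x) / length(x) + var(y) / length(y))
t_welch  <- (mean(x) - mean(y)) / se_welch

# Welch-Satterthwaite degrees of freedom
df_welch <- se_welch^4 /
  ((var(x) / length(x))^2 / (length(x) - 1) +
   (var(y) / length(y))^2 / (length(y) - 1))

res <- t.test(intvar ~ groupvar, data = mydata)  # default: Welch
c(manual_t = t_welch, manual_df = df_welch,
  t_test_t = unname(res$statistic), t_test_df = unname(res$parameter))
```

With var.equal = TRUE, R instead pools the two variances and uses n1 + n2 - 2 degrees of freedom.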

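To calculate Cohen’s d, a measure of effect size, run the following command. Cohen's conventional benchmarks for |d| are roughly .2 (small), .5 (medium), and .8 (large); effsize::cohen.d labels |d| < .2 as negligible, .2 to < .5 as small, .5 to < .8 as medium, and .8 or above as large.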

```r
effsize::cohen.d(mydata$intvar, mydata$groupvar)
```
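Cohen's d is the difference in group means divided by a pooled standard deviation. A sketch of that formula in base R, with hypothetical helper names and made-up values; the magnitude labels use Cohen's .2 / .5 / .8 benchmarks, which I understand to be the cut-offs effsize also applies:

```r
# Hypothetical helper: Cohen's d with a pooled standard deviation
cohens_d_pooled <- function(x, y) {
  nx <- length(x); ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Hypothetical helper: label |d| with cut-offs at .2, .5, and .8
label_d <- function(d) {
  a <- abs(d)
  if (a < 0.2) "negligible" else if (a < 0.5) "small" else if (a < 0.8) "medium" else "large"
}

d <- cohens_d_pooled(c(2, 4, 6), c(1, 3, 5))
d            # 0.5: the means differ by 1 and the pooled SD is 2
label_d(d)   # "medium"
```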

## 6.5 Wilcoxon/Mann-Whitney Rank Sum Test

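If your data seriously violate the t-test's assumptions (particularly normality with small samples), you can run the Wilcoxon/Mann-Whitney rank sum test, a non-parametric test that compares the rank ordering of the two groups rather than their means.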

```r
wilcox.test(intvar ~ groupvar, data = mydata)
```
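R reports the rank sum statistic as W. A sketch with made-up data (no tied values) showing that W equals the number of cross-group pairs in which the first group's value is the larger one:

```r
# Made-up data with no tied values
mydata <- data.frame(
  groupvar = factor(c("A", "A", "A", "A", "B", "B", "B")),
  intvar   = c(1.1, 3.4, 5.2, 7.9,  2.2, 4.5, 6.1)
)
x <- mydata$intvar[mydata$groupvar == "A"]
y <- mydata$intvar[mydata$groupvar == "B"]

# Count the (x, y) pairs where the group-A value is larger
w_manual <- sum(outer(x, y, ">"))
w_test   <- unname(wilcox.test(intvar ~ groupvar, data = mydata)$statistic)
c(manual = w_manual, wilcox_test = w_test)  # both 6
```

With ties, each tied pair would count as 0.5, and R falls back to a normal approximation for the p-value.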

## 6.6 Graphing Options for T-Test

Boxplots and/or violin plots are probably the most helpful graphs to complement a t-test. The ggplot2 package is the go-to package for most graphing. The ggstatsplot package and its ggbetweenstats function, however, provide a one-stop shop for graphing, displaying means, and conducting a t-test, so I recommend starting with the ggstatsplot option. You can find helpful webpages on ggstatsplot::ggbetweenstats here, here, and here.

I also encourage you to check out the R Graph Gallery, a website that showcases different graphs and provides their associated code.

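Some important arguments that can be changed in ggbetweenstats include the following (argument names have changed across ggstatsplot versions, so check ?ggbetweenstats for your installed version):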

• plot.type = # options include "box", "violin", or the default "boxviolin"
• pairwise.comparisons = # Can be set to TRUE or FALSE
• var.equal = # Can be set to TRUE or FALSE (the default)
• mean.ci = # Can be set to TRUE or FALSE (the default)
• effsize.type = # Can be set to “d” for t-tests

```r
ggstatsplot::ggbetweenstats(data = mydata, x = groupvar, y = intvar,
                            effsize.type = "d", mean.ci = TRUE,
                            pairwise.comparisons = TRUE,
                            messages = FALSE, bf.message = FALSE,
                            title = "Title of Graph", xlab = "X-axis label", ylab = "Y-axis label")
```

This ggplot command produces a basic boxplot.

```r
ggplot2::ggplot(data = mydata, aes(x = groupvar, y = intvar)) +
  geom_boxplot() +
  labs(title = "Title of Graph", x = "X-axis label", y = "Y-axis label")
```

This version of the ggplot command includes a subcommand for plotting the means on top of the boxplots.

```r
ggplot2::ggplot(data = mydata, aes(x = groupvar, y = intvar)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", shape = 8, size = 4, color = "blue", fill = "blue") +
  labs(title = "Title of Graph", x = "X-axis label", y = "Y-axis label")
```

This version of the ggplot command draws a violin plot with a narrow boxplot overlaid on top of it (the violin is drawn first so that it does not hide the boxplot).

```r
ggplot2::ggplot(data = mydata, aes(x = groupvar, y = intvar)) +
  geom_violin() + geom_boxplot(width = 0.1) +
  labs(title = "Title of Graph", x = "X-axis label", y = "Y-axis label")
```

Lastly, you may consider “patching” your desired graphs together with the patchwork package. To do so, you only need to assign each graph to an object. Here is an example of patching a bar graph and a ggstatsplot graph together (ANOVA example).
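As a minimal sketch of the patchwork workflow (made-up data and labels), the code below assigns two ggplot2 graphs to objects and places them side by side with patchwork's `+` operator. A ggbetweenstats graph could be combined the same way, since it also returns a ggplot object.

```r
library(ggplot2)
library(patchwork)

set.seed(42)  # reproducible made-up data
mydata <- data.frame(
  groupvar = factor(rep(c("A", "B"), each = 25)),
  intvar   = c(rnorm(25, mean = 10), rnorm(25, mean = 12))
)

# Step 1: assign each graph to an object
box_plot <- ggplot(mydata, aes(x = groupvar, y = intvar)) +
  geom_boxplot() +
  labs(title = "Boxplot", x = "Group", y = "Score")

violin_plot <- ggplot(mydata, aes(x = groupvar, y = intvar)) +
  geom_violin() +
  labs(title = "Violin plot", x = "Group", y = "Score")

# Step 2: combine them side by side and add an overall title
combined <- box_plot + violin_plot +
  plot_annotation(title = "Two views of the same data")
combined
```

Using `/` instead of `+` would stack the graphs vertically.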

## 6.7 Consolidated Code for T-Test

Below is the consolidated code from this chapter. You can copy this code into an empty R script, where find/replace makes it easy to swap in your own dataset and variable names. You can also download this generic t-test R script file here.

Placeholders that need replacing:

• mydata – name of your dataset
• groupvar – name of your dichotomous grouping variable
• intvar – name of your interval or continuous variable
• object – whatever you want to call your object(s), if you generate any
• labels/title – any titles, axis labels, category labels
```r
# 6.1 Packages Needed

req <- substitute(require(x, character.only = TRUE))
libs <- c("effsize", "ggplot2", "ggstatsplot", "patchwork", "moments", "ggpubr")
sapply(libs, function(x) eval(req) || {install.packages(x); eval(req)})

# 6.2 Prep Data

class(mydata$groupvar)
mydata$groupvar <- factor(mydata$groupvar)

# 6.3 Checking data for violations of assumptions

## Group frequencies

table(mydata$groupvar)

## Group means and standard deviations. Either the "aggregate" or "by" command works

aggregate(mydata$intvar, by = list(mydata$groupvar), FUN = mean, na.rm = TRUE)
aggregate(mydata$intvar, by = list(mydata$groupvar), FUN = sd, na.rm = TRUE)
by(mydata$intvar, mydata$groupvar, mean, na.rm = TRUE)
by(mydata$intvar, mydata$groupvar, sd, na.rm = TRUE)

## Test of equal variances across groups (F test)

var.test(intvar ~ groupvar, data = mydata)

## Check for normality

moments::skewness(mydata$intvar, na.rm = TRUE)
moments::kurtosis(mydata$intvar, na.rm = TRUE)
ggpubr::ggdensity(mydata$intvar, fill = "lightgray")
ggpubr::ggqqplot(mydata$intvar)

shapiro.test(mydata$intvar)

# 6.4 T-Test command (two sample test of group means) and Cohen's d

t.test(intvar ~ groupvar, data = mydata) # Default: Welch's test, unequal variances
t.test(intvar ~ groupvar, data = mydata, var.equal = TRUE) # Student's test, equal variances

effsize::cohen.d(mydata$intvar, mydata$groupvar)

# 6.5 Wilcoxon/Mann-Whitney Rank Sum Test (non-parametric)

wilcox.test(intvar ~ groupvar, data = mydata)

# 6.6 Graphing Options

## ggstatsplot::ggbetweenstats is my "go to" option

ggstatsplot::ggbetweenstats(data = mydata, x = groupvar, y = intvar,
                            effsize.type = "d", mean.ci = TRUE,
                            pairwise.comparisons = TRUE,
                            messages = FALSE, bf.message = FALSE,
                            title = "Title of Graph", xlab = "X-axis label", ylab = "Y-axis label")

## ggplot commands

ggplot2::ggplot(data = mydata, aes(x = groupvar, y = intvar)) +
  geom_boxplot() +
  labs(title = "Title of Graph", x = "X-axis label", y = "Y-axis label")

ggplot2::ggplot(data = mydata, aes(x = groupvar, y = intvar)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point",
               shape = 8, size = 4, color = "blue", fill = "blue") +
  labs(title = "Title of Graph", x = "X-axis label", y = "Y-axis label")

ggplot2::ggplot(data = mydata, aes(x = groupvar, y = intvar)) +
  geom_violin() + geom_boxplot(width = 0.1) +
  labs(title = "Title of Graph", x = "X-axis label", y = "Y-axis label")
```