# Chapter 6 T-Test (two-sample using groups)

This chapter provides generic code for carrying out a T-Test analysis (two sample using groups). It is recommended that you proceed through the sections in the order they appear. If, however, you want to do a quick analysis, I recommend loading the packages, prepping your data, and then running the ggstatsplot::ggbetweenstats command in section 6.6.

Placeholders that need replacing:

- *mydata* – name of your dataset
- *groupvar* – name of your dichotomous grouping variable
- *intvar* – name of your interval or continuous variable
- *label* – any titles, axis labels, category labels

## 6.1 Packages Needed for T-Test

This code will check that required packages for this chapter are installed, install them if needed, and load them into your session.

```
req <- substitute(require(x, character.only = TRUE))
libs<-c("effsize", "ggplot2", "ggstatsplot", "patchwork", "moments", "ggpubr")
sapply(libs, function(x) eval(req) || {install.packages(x); eval(req)})
```

## 6.2 Data Prep for T-Test

One of the key preparations you need to make is to declare (classify) your categorical variable as a factor variable. In the generic commands below, the ‘class’ function tells you how R currently sees the variable (e.g., numeric, factor, character). The second command reclassifies the specified categorical variable as a factor variable.

```
class(mydata$groupvar) # Will tell you how R currently views the variable (numeric, factor…)
mydata$groupvar <- factor(mydata$groupvar) # Will declare the variable as a factor variable
```
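As a minimal sketch of this reclassification step, the following uses a small made-up data frame (`toy` and its values are hypothetical, not part of the generic code). The `labels` argument is optional; the label names here are purely illustrative.

```r
# Hypothetical toy data: groupvar is stored as numbers (0/1)
toy <- data.frame(
  groupvar = c(0, 1, 1, 0, 1),
  intvar   = c(12.1, 15.3, 14.8, 11.9, 16.0)
)
class(toy$groupvar)   # "numeric"

# Declare the variable as a factor; the labels are optional
toy$groupvar <- factor(toy$groupvar, levels = c(0, 1),
                       labels = c("Control", "Treatment"))
class(toy$groupvar)   # "factor"
levels(toy$groupvar)  # "Control" "Treatment"
```

Supplying readable labels at this stage means they carry through to tables and graphs later on.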

## 6.3 Checking Data for Violations of Assumptions for T-Test

There are generally three assumptions researchers check when conducting a t-test: 1) Relatively equal group sizes; 2) Relatively equal variances; 3) That the interval or continuous variable is normally distributed. The following generic commands offer insights on each.

### 6.3.1 Group Frequencies for T-Test

`table(mydata$groupvar)`

### 6.3.2 Checking for Equal Variances for T-Test

The commands below report group means and standard deviations (Note: the ‘aggregate’ and ‘by’ functions give you the same results, just in slightly different formats). The var.test function offers a more direct statistical test, performing an F test to compare the variances of the two groups; a p < .05 suggests the variances differ.

```
aggregate(mydata$intvar, by = list(mydata$groupvar), FUN = mean, na.rm = TRUE)
aggregate(mydata$intvar, by = list(mydata$groupvar), FUN = sd, na.rm = TRUE)
by(mydata$intvar, mydata$groupvar, mean, na.rm = TRUE)
by(mydata$intvar, mydata$groupvar, sd, na.rm = TRUE)
var.test(intvar ~ groupvar, data = mydata)
```
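To make the F test less of a black box, here is a small sketch on simulated data (the `toy` data frame and its values are assumptions for illustration). It shows that the F statistic reported by `var.test` is simply the ratio of the two group variances, first factor level over second.

```r
set.seed(1)  # simulated data, for illustration only
toy <- data.frame(
  groupvar = factor(rep(c("A", "B"), each = 20)),
  intvar   = c(rnorm(20, mean = 10, sd = 2), rnorm(20, mean = 11, sd = 3))
)

# Variance within each group
vars <- tapply(toy$intvar, toy$groupvar, var)

# Ratio of variances: first level ("A") over second level ("B")
f_manual <- vars[["A"]] / vars[["B"]]

# The same statistic as reported by var.test
f_test <- var.test(intvar ~ groupvar, data = toy)
all.equal(f_manual, unname(f_test$statistic))
```

A ratio near 1 points toward similar variances; the further it sits from 1, the stronger the evidence that the variances differ.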

### 6.3.3 Checking Normality for T-Test

The following generic commands can be used to check for skewness and kurtosis (for a normal distribution, skewness is 0 and kurtosis, as computed by the moments package, is 3). You can also visually inspect the data with a density graph and a quantile-quantile (Q-Q) plot, which plots the sample quantiles against those of a normal distribution; the dots should form a relatively straight 45 degree line if there is a normal distribution.

```
moments::skewness(mydata$intvar, na.rm = TRUE)
moments::kurtosis(mydata$intvar, na.rm = TRUE)
ggpubr::ggdensity(mydata$intvar, fill = "lightgray")
ggpubr::ggqqplot(mydata$intvar)
```
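For readers wondering why the benchmark for kurtosis is 3 rather than 0, the sketch below computes moment-based skewness and kurtosis by hand in base R. The vector `x` is made up, and the formulas are, to my understanding, the ones the moments package uses (central moments divided by n), so treat the comparison with `moments::skewness` and `moments::kurtosis` as an assumption to verify on your own data.

```r
x <- c(1, 2, 3, 4, 5)      # hypothetical, perfectly symmetric sample
m <- mean(x)

# Central moments (divide by n, not n - 1)
m2 <- mean((x - m)^2)
m3 <- mean((x - m)^3)
m4 <- mean((x - m)^4)

skew <- m3 / m2^1.5        # 0 for a symmetric sample
kurt <- m4 / m2^2          # 1.7 here; a normal distribution gives 3

c(skewness = skew, kurtosis = kurt)
```

Because this is raw (not "excess") kurtosis, values below 3 indicate lighter tails than a normal distribution and values above 3 indicate heavier tails.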

Beyond a visual inspection, you can conduct a Shapiro-Wilk test of normality, where a p < .05 indicates a departure from normality and a p > .05 indicates no significant departure. This can be a sensitive test, particularly with a large N, so use it in conjunction with other information. In R, shapiro.test is also limited to samples of between 3 and 5000 observations.

`shapiro.test(mydata$intvar)`
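Since a two-sample t-test compares groups, it can also be informative to run the normality check within each group rather than only on the pooled variable. The following is a hedged sketch on simulated data (`toy` is hypothetical); it uses `split` and `sapply` to collect one Shapiro-Wilk p-value per group.

```r
set.seed(2)  # simulated data, for illustration only
toy <- data.frame(
  groupvar = factor(rep(c("A", "B"), each = 30)),
  intvar   = c(rnorm(30, mean = 50, sd = 5), rnorm(30, mean = 55, sd = 5))
)

# Split the outcome by group, then run shapiro.test on each piece
p_by_group <- sapply(split(toy$intvar, toy$groupvar),
                     function(v) shapiro.test(v)$p.value)
p_by_group   # one p-value per group level
```

If the two groups have very different means, the pooled variable can look non-normal even when each group is fine, which is one reason to look at both.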

## 6.4 T-Test Command (two sample test of group means) and Effect Size

Note that the default is for R to treat the groups as having unequal variances (Welch’s t-test). If the equal-variance assumption has not been violated, then you may set var.equal = TRUE, as shown below.

```
t.test(intvar ~ groupvar, data = mydata) # The default is for unequal variances
t.test(intvar ~ groupvar, data = mydata, var.equal = TRUE)
```
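To illustrate what changes between the two calls, here is a sketch on simulated data (`toy` is hypothetical). It compares the degrees of freedom from the Welch and pooled versions and reproduces the pooled t statistic by hand from the group means and pooled variance.

```r
set.seed(3)  # simulated data, for illustration only
toy <- data.frame(
  groupvar = factor(rep(c("A", "B"), each = 25)),
  intvar   = c(rnorm(25, mean = 100, sd = 10), rnorm(25, mean = 106, sd = 10))
)

welch  <- t.test(intvar ~ groupvar, data = toy)                    # unequal variances
pooled <- t.test(intvar ~ groupvar, data = toy, var.equal = TRUE)  # equal variances

# Welch df is typically fractional; pooled df is n1 + n2 - 2
c(welch_df = unname(welch$parameter), pooled_df = unname(pooled$parameter))

# Pooled t statistic by hand
x  <- toy$intvar[toy$groupvar == "A"]
y  <- toy$intvar[toy$groupvar == "B"]
n1 <- length(x)
n2 <- length(y)
sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)  # pooled variance
t_manual <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
all.equal(t_manual, unname(pooled$statistic))
```

Note that the sign of t depends on the order of the factor levels: R subtracts the second level's mean from the first.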

To calculate Cohen’s d, a measure of effect size, run the following command. By Cohen’s conventional benchmarks, a |d| of about 0.2 is small, about 0.5 is moderate, and 0.8 or more is large; effsize::cohen.d additionally labels values below 0.2 as negligible.

`effsize::cohen.d(mydata$intvar, mydata$groupvar)`
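For intuition, Cohen's d is the difference in group means divided by the pooled standard deviation. The base-R sketch below computes it by hand on made-up numbers (`x` and `y` are hypothetical); it should agree with the default output of `effsize::cohen.d`, though I have not called that function here, so treat the match as an assumption to check.

```r
x <- c(2, 4, 6, 8)   # hypothetical scores, group 1
y <- c(1, 3, 5, 7)   # hypothetical scores, group 2

n1 <- length(x)
n2 <- length(y)

# Pooled standard deviation across the two groups
pooled_sd <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))

# Cohen's d: standardized mean difference
d_manual <- (mean(x) - mean(y)) / pooled_sd
d_manual   # about 0.387
```

Because d is expressed in standard deviation units, it can be compared across studies that use different measurement scales.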

## 6.5 Wilcoxon/Mann-Whitney Rank Sum Test

If you have violated any of the assumptions for the t-test, you can run the Wilcoxon/Mann-Whitney Rank Sum Test, a non-parametric alternative that compares the two groups using ranks rather than means.

`wilcox.test(intvar ~ groupvar, data = mydata)`
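The "rank sum" in the name can be seen directly. In this sketch on a tiny made-up dataset without ties (`toy` is hypothetical), the W statistic from `wilcox.test` equals the sum of the first group's ranks minus n1(n1 + 1)/2.

```r
# Hypothetical toy data: 4 observations in group A, 5 in group B, no ties
toy <- data.frame(
  groupvar = factor(rep(c("A", "B"), times = c(4, 5))),
  intvar   = c(1.1, 2.3, 3.8, 5.0, 2.9, 4.4, 6.1, 7.2, 8.5)
)

res <- wilcox.test(intvar ~ groupvar, data = toy)

# Rank all observations together, then sum the ranks of the first group
r  <- rank(toy$intvar)
n1 <- sum(toy$groupvar == "A")
w_manual <- sum(r[toy$groupvar == "A"]) - n1 * (n1 + 1) / 2

c(manual = w_manual, wilcox = unname(res$statistic))  # both equal 3
```

Because only the ordering of the values matters, extreme outliers have far less influence here than they do on a t-test.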

## 6.6 Graphing Options for T-Test

Boxplots and/or violin plots are probably the most helpful graphs to complement a t-test. The ggplot2 package is the go-to package for most graphing. The ggstatsplot package and its ggbetweenstats function, however, provide a one-stop shop for graphing, displaying means, and conducting a t-test. I recommend starting with the ggstatsplot option. You can find helpful webpages on ggstatsplot::ggbetweenstats here, here, and here.

I also encourage you to check out the R Graph Gallery, a website that showcases different graphs and provides their associated code.

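Some important arguments that can be changed in ggbetweenstats include the following (argument names have changed across ggstatsplot versions, so check ?ggbetweenstats for your installed version):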

- plot.type = # options include “box,” “violin,” or the default “boxviolin”
- pairwise.comparisons = # Can be set to TRUE or FALSE
- var.equal = # Can be set to TRUE or FALSE (the default)
- mean.ci = # Can be set to TRUE or FALSE (the default)
- effsize.type = # Can be set to “d” for t-tests

```
ggstatsplot::ggbetweenstats(data = mydata, x = groupvar, y = intvar,
effsize.type = "d", mean.ci = TRUE,
pairwise.comparisons = TRUE,
messages = FALSE, bf.message = FALSE,
title = "Title of Graph", xlab = "X-axis label", ylab = "Y-axis label")
```

This ggplot command produces a basic boxplot.

```
ggplot2::ggplot(data = mydata, aes(x = groupvar, y = intvar)) +
geom_boxplot() +
labs(title = "Title of Graph", x = "X-axis label", y = "Y-axis label")
```

This version of the ggplot command includes a subcommand for plotting the means on top of the boxplots.

```
ggplot2::ggplot(data = mydata, aes(x = groupvar, y = intvar)) +
geom_boxplot() + stat_summary(fun = mean, geom = "point", shape = 8, size = 4, color = "blue", fill = "blue") + # 'fun' replaces the deprecated 'fun.y' (ggplot2 >= 3.3.0)
labs(title = "Title of Graph", x = "X-axis label", y = "Y-axis label")
```

This version of the ggplot command includes a subcommand for overlaying a violin plot atop the boxplot.

```
ggplot2::ggplot(data = mydata, aes(x = groupvar, y = intvar)) +
geom_boxplot() + geom_violin(alpha = 0.5) + # semi-transparent so the boxplot underneath stays visible
labs(title = "Title of Graph", x = "X-axis label", y = "Y-axis label")
```

Lastly, you may consider “patching” your desired graphs together with the patchwork package. You’d only need to assign each graph to an object to do so. Here is an example of patching a bar graph and a ggstatsplot graph together (ANOVA example).
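As a minimal sketch of the patchwork workflow for the t-test case, the following assigns two ggplot2 graphs to objects and combines them. The data frame `toy` and the object names `p_box` and `p_violin` are hypothetical, and the code assumes the ggplot2 and patchwork packages are installed.

```r
library(ggplot2)
library(patchwork)

set.seed(5)  # simulated data, for illustration only
toy <- data.frame(
  groupvar = factor(rep(c("A", "B"), each = 30)),
  intvar   = c(rnorm(30, mean = 20, sd = 4), rnorm(30, mean = 23, sd = 4))
)

# Assign each graph to an object
p_box <- ggplot(toy, aes(x = groupvar, y = intvar)) +
  geom_boxplot() +
  labs(title = "Boxplot")

p_violin <- ggplot(toy, aes(x = groupvar, y = intvar)) +
  geom_violin() +
  labs(title = "Violin plot")

# "+" places the graphs side by side; "/" would stack them vertically
combined <- p_box + p_violin
combined
```

The same approach should work with a graph produced by ggstatsplot::ggbetweenstats, since it also returns a ggplot object.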

## 6.7 Consolidated Code for T-Test

Below is the consolidated code from this chapter. You can paste this code into an empty R script, where find/replace makes it easy to swap in your own names for the placeholder terms. You can also download this generic t-test RScript file here.

Placeholders that need replacing:

- *mydata* – name of your dataset
- *groupvar* – name of your categorical grouping variable
- *intvar* – name of your interval or continuous variable
- *object* – whatever you want to call your object(s), if you generate any
- *labels/title* – any titles, axis labels, category labels

```
# 6.1 Packages Needed
req <- substitute(require(x, character.only = TRUE))
libs<-c("effsize", "ggplot2", "ggstatsplot", "patchwork", "moments", "ggpubr")
sapply(libs, function(x) eval(req) || {install.packages(x); eval(req)})
# 6.2 Prep Data
class(mydata$groupvar)
mydata$groupvar <- factor(mydata$groupvar)
# 6.3 Checking data for violations of assumptions:
## Group frequencies:
table(mydata$groupvar)
## Group means and standard deviations. Either the "aggregate" or "by" command works
aggregate(mydata$intvar, by = list(mydata$groupvar), FUN = mean, na.rm = TRUE)
aggregate(mydata$intvar, by = list(mydata$groupvar), FUN = sd, na.rm = TRUE)
by(mydata$intvar, mydata$groupvar, mean, na.rm = TRUE)
by(mydata$intvar, mydata$groupvar, sd, na.rm = TRUE)
## Test of equal variances across groups
var.test(intvar ~ groupvar, data = mydata)
## Check for Normality
moments::skewness(mydata$intvar, na.rm = TRUE)
moments::kurtosis(mydata$intvar, na.rm = TRUE)
ggpubr::ggdensity(mydata$intvar, fill = "lightgray")
ggpubr::ggqqplot(mydata$intvar)
shapiro.test(mydata$intvar)
# 6.4 T-Test command (two sample test of group means) and Cohen's d
t.test(intvar ~ groupvar, data = mydata) # The default is for unequal variances
t.test(intvar ~ groupvar, data = mydata, var.equal = TRUE)
effsize::cohen.d(mydata$intvar, mydata$groupvar)
# 6.5 Wilcoxon/Mann-Whitney Rank Sum Test (non-parametric)
wilcox.test(intvar ~ groupvar, data = mydata)
# 6.6 Graphing Options
## ggstatsplot::ggbetweenstats is my "go to" option
ggstatsplot::ggbetweenstats(data = mydata, x = groupvar, y = intvar,
effsize.type = "d", mean.ci = TRUE,
pairwise.comparisons = TRUE,
messages = FALSE, bf.message = FALSE,
title = "Title of Graph", xlab = "X-axis label", ylab = "Y-axis label")
## ggplot commands
ggplot2::ggplot(data = mydata, aes(x = groupvar, y = intvar)) +
geom_boxplot() +
labs(title = "Title of Graph", x = "X-axis label", y = "Y-axis label")
ggplot2::ggplot(data = mydata, aes(x = groupvar, y = intvar)) +
geom_boxplot() + stat_summary(fun = mean, geom = "point", # 'fun' replaces the deprecated 'fun.y'
shape = 8, size = 4, color = "blue", fill = "blue") +
labs(title = "Title of Graph", x = "X-axis label", y = "Y-axis label")
ggplot2::ggplot(data = mydata, aes(x = groupvar, y = intvar)) +
geom_boxplot() + geom_violin(alpha = 0.5) + # semi-transparent so the boxplot underneath stays visible
labs(title = "Title of Graph", x = "X-axis label", y = "Y-axis label")
```