4 Statistics in R

4.1 T-Tests

4.1.1 Independent Means T-Test

The independent means t-test is used to test for a difference in the means of two separate groups of individuals (Gravetter & Wallnau, 2019).

For the Independent Means T-Test in this section, we will be using a fake data set of a sample of 200 undergraduate students’ math test scores. The test scores are on a scale of 0 to 100, and each individual has been assigned to either the Paper Test format or the Electronic Test format in the TestFormat condition, where 1 = paper test and -1 = electronic test. The following lines of code create our data set and set up the data frames we will need.
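The data-generation code itself is not reproduced in this excerpt. A minimal sketch along the following lines will build an equivalent data frame; the seed, means, and standard deviations are assumptions, so simulated values will not exactly match the output shown below.

set.seed(1234)                                        # assumed seed
PaperGrades      <- rnorm(100, mean = 70, sd = 10)    # assumed mean and SD
ElectronicGrades <- rnorm(100, mean = 65, sd = 13)    # assumed mean and SD

Data <- data.frame(
  Grade      = c(PaperGrades, ElectronicGrades),
  TestFormat = c(rep(1, 100), rep(-1, 100))           # 1 = paper, -1 = electronic
)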

Before continuing with this section, the following packages must be installed and loaded in order to successfully run all of the functions listed.
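The package names are not listed in this excerpt; judging from the functions used below (leveneTest(), cohen.d(), and pwr.t.test()), they are most likely car, effsize, and pwr.

# install.packages(c("car", "effsize", "pwr"))   # run once if not yet installed
library(car)       # leveneTest()
library(effsize)   # cohen.d()
library(pwr)       # pwr.t.test()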

4.1.1.1 Assumptions

There are five assumptions that we should meet in order to conduct this t-test:

  • Having an interval or ratio scale of measurement
  • Using a random sampling from a defined population
  • The samples are independent; no overlap between group members
  • The scores are normally distributed in the population
  • There is homogeneity of variance

The first three assumptions can be checked simply by looking at our data. We have a ratio scale of measurement, it is a random sample, and the samples are independent. The normality assumption can be checked using the Shapiro-Wilk test.
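The output below was most likely produced by a call such as the following (the data: Grade line suggests the data frame was attached or wrapped in with()):

with(Data, shapiro.test(Grade))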

## 
##  Shapiro-Wilk normality test
## 
## data:  Grade
## W = 0.99411, p-value = 0.6175

As we can see here, the p-value for the Shapiro-Wilk test was 0.6175, which indicates that there was no violation of normality.

The homogeneity of variance assumption can be tested using either Levene’s test for homogeneity of variance or Bartlett’s test. Notice that in the code for Levene’s test we used as.factor. This is because our data is set up as contrast codes, but Levene’s test needs groups!
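A sketch of the likely calls; leveneTest() comes from the car package, and bartlett.test() is in base R:

leveneTest(Data$Grade ~ as.factor(Data$TestFormat))
bartlett.test(Data$Grade, Data$TestFormat)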

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value  Pr(>F)  
## group   1  4.8234 0.02924 *
##       198                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Bartlett test of homogeneity of variances
## 
## data:  Data$Grade and Data$TestFormat
## Bartlett's K-squared = 6.0182, df = 1, p-value = 0.01416

As we can see here, the p-value for the Levene’s test was 0.029 and the Bartlett’s test was 0.014, both of which indicate a violation of the assumption. However, for the sake of the example, we will continue with conducting the analysis.

4.1.1.2 Test Statistic

In order to compute the Independent Means T-Test, we can use the summary() and lm() functions or the t.test() function. For both of these, type the dependent variable first and then the independent variable.
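Assuming the data frame has been attached (the printed call shows Grade and TestFormat without a Data$ prefix), the two approaches look roughly like this:

# regression approach
summary(lm(Grade ~ TestFormat, data = Data))

# t-test approach (Welch's correction is applied by default)
t.test(Grade ~ TestFormat, data = Data)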

## 
## Call:
## lm(formula = Grade ~ TestFormat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -30.405  -6.993  -0.112   7.070  34.498 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  67.6210     0.8298  81.492  < 2e-16 ***
## TestFormat    2.4081     0.8298   2.902  0.00413 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 11.74 on 198 degrees of freedom
## Multiple R-squared:  0.0408, Adjusted R-squared:  0.03596 
## F-statistic: 8.422 on 1 and 198 DF,  p-value: 0.004127
## 
##  Welch Two Sample t-test
## 
## data:  Grade by TestFormat
## t = -2.9021, df = 186.92, p-value = 0.004153
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -8.090113 -1.542279
## sample estimates:
## mean in group -1  mean in group 1 
##         65.21293         70.02913

In this output we can see that the t value is -2.902, and our p value is less than .01, meaning we have a significant result.

We can also use the confint() and lm() functions to calculate a confidence interval for the test.
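A sketch of that call:

confint(lm(Grade ~ TestFormat, data = Data))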

##                 2.5 %   97.5 %
## (Intercept) 65.984666 69.25739
## TestFormat   0.771736  4.04446

The TestFormat row gives us a 95% confidence interval of 0.772 to 4.044 for the test format effect; the Intercept row, 65.985 to 69.257, is the interval for the overall mean grade.

Once we have calculated the t value and p-value, we can then calculate the effect size, or Cohen’s d, using the following code:
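The output format below matches cohen.d() from the effsize package (an assumption, since the package is not named in this excerpt); a sketch:

cohen.d(Data$Grade ~ as.factor(Data$TestFormat))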

## 
## Cohen's d
## 
## d estimate: -0.4104125 (small)
## 95 percent confidence interval:
##      lower      upper 
## -0.6922185 -0.1286064

Our Cohen’s d was 0.41, which is a small effect size based on Cohen’s conventions (citation needed).

We can also calculate the power for the obtained effect size using the pwr.t.test() function from the pwr package.
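A sketch of the call, plugging in the sample size per group, the obtained Cohen's d, and alpha:

pwr.t.test(n = 100, d = 0.4104125, sig.level = 0.05, type = "two.sample")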

## 
##      Two-sample t test power calculation 
## 
##               n = 100
##               d = 0.4104125
##       sig.level = 0.05
##           power = 0.8232934
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

This shows us our power is approximately 82%.

In the pwr.t.test function you must enter values for at least three of the following four arguments:

  • n = number in each group
  • d = Cohen’s d
  • sig.level = alpha from T-Test
  • power = desired power

As well as the type= argument, which indicates whether you are using a "two.sample", "one.sample", or "paired" t-test. Whichever argument you leave blank will be the value that the function calculates based on the other values. In our example, we wanted to determine the observed power so we left it blank and provided the number in each group, Cohen’s d, and alpha.
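The same function can also be turned around for planning. For example, to find how many participants per group would be needed to reach 80% power for this effect size (a hypothetical planning scenario, not part of the original example), leave out n and supply the desired power instead:

pwr.t.test(d = 0.41, sig.level = 0.05, power = 0.80, type = "two.sample")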

Once you have finished these steps, you can move on to writing up and reporting your results.

4.2 Chi Square

Chi Square tests are non-parametric, or distribution-free, tests that compare the proportions observed in our sample versus the proportions expected. There are two different Chi Square tests: the Goodness of Fit test and the Test of Independence.

4.2.1 Chi Square Goodness of Fit

The Chi Square Goodness of Fit test is used to test the proportions obtained from a sample against the null hypothesis about the corresponding proportions in the population (Gravetter & Wallnau, 2019).

For this section, we will be creating a fake data set to work with. In this scenario, we asked 100 students if their favorite subject is math, science, or reading. Each student was only allowed to pick one favorite and it had to be one of these three. This resulted in 55 students choosing math, 15 choosing science, and the remaining 30 selecting reading. The following code creates this data.
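The counts can be stored in a named vector; the object name counts is taken from the chisq.test() output further below, and the labels are added here only for readability:

# observed number of students choosing each subject
counts <- c(Math = 55, Science = 15, Reading = 30)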

Before continuing with this section, the pwr package must be installed and loaded in order to successfully run all of the functions listed.

4.2.1.1 Assumptions

There are four assumptions that must be checked prior to conducting a goodness of fit test:

  • There is one categorical variable
  • Independence of observations
  • Mutually exclusive groups
  • At least 5 expected frequencies in each group

The first three assumptions can be checked simply by looking at our data. There is one categorical variable (subject), no observation influences another, and no observation exists in more than one group. The final assumption can be checked by calculating the expected values for the groups using the following code.
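With equal expected proportions, the expected frequency for each group is simply the total count divided by the number of groups; chisq.test() also stores these values. A sketch:

# expected frequency per group under equal proportions
sum(counts) * rep(1/3, 3)

# equivalently, pull the expected values from the test object
chisq.test(counts)$expected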

Here we can see that the expected value for each group is 33.33.

Note about expected values

Expected values often have decimals which are not possible when observing count data. This is to be expected, and will not impact the results of the Chi Square test.

4.2.1.2 Test Statistic

In order to compute the Chi Square Goodness of Fit test, we can use the chisq.test() function from the stats package.
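A sketch of the call; with no probabilities supplied, chisq.test() tests against equal proportions by default:

chisq.test(counts)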

## 
##  Chi-squared test for given probabilities
## 
## data:  counts
## X-squared = 24.5, df = 2, p-value = 4.785e-06

In this output we can see that the Chi Square statistic is 24.5, and our p value is less than .001, meaning we have a significant result.

Once we have calculated the Chi Square and p value, we can then calculate the effect size, or Phi coefficient, and the power for the test. These can be done using the following code:
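The first value printed below (0.495) is the effect size, computed as the square root of the Chi Square statistic divided by N; the power calculation comes from pwr.chisq.test() in the pwr package. A sketch:

# effect size: sqrt(chi-square / N)
w <- sqrt(24.5 / 100)
w

# power for the obtained effect size
pwr.chisq.test(w = w, N = 100, df = 2, sig.level = 0.05)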

## [1] 0.4949747
## 
##      Chi squared power calculation 
## 
##               w = 0.4949747
##               N = 100
##              df = 2
##       sig.level = 0.05
##           power = 0.9959142
## 
## NOTE: N is the number of observations

This shows us our Phi is .495 and the power is greater than 99%. Once you have finished these steps, you can move on to writing up and reporting your results.

4.2.2 Chi Square Test of Independence

The Chi Square Test of Independence is used to test frequency data obtained from a sample to evaluate the relationship between two variables in the population (Gravetter & Wallnau, 2019).

For this section, we will be creating a fake data set to work with. In this scenario, we asked 100 chemistry students if they had ever taken a biology course (yes, no) and if they had ever taken a statistics course (yes, no). The results showed that 38 students had taken both a biology and a statistics course, 13 had only taken biology, 3 had only taken statistics, and 46 had not taken either. The following code creates this data.
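One way to set this up is a 2 x 2 matrix; the object name classes is taken from the chisq.test() output further below, and the row and column labels match the table that follows:

classes <- matrix(c(38, 13,
                     3, 46),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("Biology Yes", "Biology No"),
                                  c("Statistics Yes", "Statistics No")))
classes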

##             Statistics Yes Statistics No
## Biology Yes             38            13
## Biology No               3            46

Before continuing with this section, the pwr package must be installed and loaded in order to successfully run all of the functions listed.

4.2.2.1 Assumptions

There are three assumptions that must be checked prior to conducting a test of independence:

  • Independence of observations
  • Mutually exclusive response categories
  • At least 5 expected frequencies in each group

The first two assumptions can be checked simply by looking at our data. No observation influences another and no observation exists in more than one group within a variable. The final assumption can be checked by calculating the expected values for the groups using the following code.
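The expected cell frequencies are stored on the test object, so one way to see them is:

# expected cell counts under the null hypothesis of independence
chisq.test(classes)$expected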

Here we can see that the expected value for each group is 20.09 or greater.

Note about expected values

Expected values often have decimals which are not possible when observing count data. This is to be expected, and will not impact the results of the Chi Square test.

4.2.2.2 Test Statistic

In order to compute the Chi Square Test of Independence, we can use the chisq.test() function from the stats package.
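A sketch of the call; correct = FALSE matches the printed statistic here, since the default Yates continuity correction for 2 x 2 tables would give a slightly smaller value:

chisq.test(classes, correct = FALSE)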

## 
##  Pearson's Chi-squared test
## 
## data:  classes
## X-squared = 48.315, df = 1, p-value = 3.63e-12

In this output we can see that the Chi Square statistic is 48.315, and our p value is less than .001, meaning we have a significant result.

Once we have calculated the Chi Square and p value, we can then calculate the effect size, or Phi coefficient, and the power for the test. These can be done using the following code:
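As before, the effect size is the square root of the Chi Square statistic divided by N, and the power comes from pwr.chisq.test(). A sketch:

# effect size: phi = sqrt(chi-square / N)
phi <- sqrt(48.315 / 100)
phi

# power for the obtained effect size
pwr.chisq.test(w = phi, N = 100, df = 1, sig.level = 0.05)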

## [1] 0.6950899
## 
##      Chi squared power calculation 
## 
##               w = 0.6950899
##               N = 100
##              df = 1
##       sig.level = 0.05
##           power = 0.9999997
## 
## NOTE: N is the number of observations

This shows us our Phi is .695 and the power is greater than 99%. Once you have finished these steps, you can move on to writing up and reporting your results.

4.3 Analysis of Variance (ANOVA)

4.3.1 One-Way Between Subjects ANOVA

The One-Way Between Subjects Analysis of variance (ANOVA) is used to test for differences in the means of two or more groups (Gravetter & Wallnau, 2019).

For the One-way Between Subjects ANOVA in this section, we will be using a fake data set of a sample of 300 undergraduate students’ math test scores. The test scores are on a scale of 0 to 100. Each individual has also been assigned to either test 1, 2, or 3 in the TestNumber condition. The following lines of code create our data set and set up the data frame we will need.
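The data-generation code is not reproduced here. A minimal sketch along these lines will build a comparable data frame; the seed, means, and standard deviations are assumptions (loosely based on the group summaries printed in the post hoc output later), so simulated values will differ from the output shown.

set.seed(1234)                                    # assumed seed
Data <- data.frame(
  Grades     = c(rnorm(100, mean = 70, sd = 10),  # test 1 (assumed parameters)
                 rnorm(100, mean = 65, sd = 13),  # test 2 (assumed parameters)
                 rnorm(100, mean = 90, sd = 2)),  # test 3 (assumed parameters)
  TestNumber = rep(1:3, each = 100)
)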

Before continuing with this section, the following package must be installed and loaded in order to successfully run all of the functions listed.
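The package is not named in this excerpt; based on the functions used below, it is most likely agricolae (for LSD.test()), with car (for leveneTest()) assumed to already be loaded from the earlier section.

# install.packages("agricolae")   # run once if not yet installed
library(agricolae)   # LSD.test()
library(car)         # leveneTest(), if not already loaded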

4.3.1.1 Assumptions

There are four assumptions that we should meet in order to conduct this ANOVA:

  • Having an interval or ratio scale of measurement
  • The samples are independent; no overlap between group members
  • The scores are normally distributed in the population
  • There is homogeneity of variance

The first two assumptions can be checked simply by looking at our data. We have a ratio scale of measurement and the samples are independent. The normality assumption can be checked using the Shapiro-Wilk test on each group.
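The data: lines in the output show the exact calls, one Shapiro-Wilk test per group:

shapiro.test(Data$Grades[which(Data$TestNumber == 1)])
shapiro.test(Data$Grades[which(Data$TestNumber == 2)])
shapiro.test(Data$Grades[which(Data$TestNumber == 3)])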

## 
##  Shapiro-Wilk normality test
## 
## data:  Data$Grades[which(Data$TestNumber == 1)]
## W = 0.98836, p-value = 0.535
## 
##  Shapiro-Wilk normality test
## 
## data:  Data$Grades[which(Data$TestNumber == 2)]
## W = 0.99158, p-value = 0.7902
## 
##  Shapiro-Wilk normality test
## 
## data:  Data$Grades[which(Data$TestNumber == 3)]
## W = 0.98164, p-value = 0.1779

As we can see here, the p-values for the Shapiro-Wilk tests were 0.535, 0.790, and 0.178, which indicates no violation of the assumption.

The homogeneity of variance assumption can be tested using either Levene’s test for homogeneity of variance or Bartlett’s test. Notice that in the code for Levene’s test we used as.factor. This is because our data is set up as contrast codes, but Levene’s test needs groups!
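A sketch of the likely calls:

leveneTest(Data$Grades ~ as.factor(Data$TestNumber))
bartlett.test(Data$Grades, Data$TestNumber)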

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value    Pr(>F)    
## group   2  52.476 < 2.2e-16 ***
##       297                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Bartlett test of homogeneity of variances
## 
## data:  Data$Grades and Data$TestNumber
## Bartlett's K-squared = 229.9, df = 2, p-value < 2.2e-16

As we can see here, the p-value for both the Levene’s test and Bartlett’s test were less than .001, which indicates a violation of the assumption. However, for the sake of the example, we will continue with conducting the analysis.

4.3.1.2 Test Statistic

In order to compute the ANOVA, we will use the anova() and lm() functions together. Within the lm() function, type the dependent variable first and then the independent variable.
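A sketch of the call; the term names in the table below carry the Data$ prefix, so the variables were most likely referenced directly rather than through a data argument:

# note: TestNumber is numeric here, so the model uses 1 df for the effect,
# as in the table below; as.factor(TestNumber) would give the usual 2-df test
anova(lm(Data$Grades ~ Data$TestNumber))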

## Analysis of Variance Table
## 
## Response: Data$Grades
##                  Df Sum Sq Mean Sq F value    Pr(>F)    
## Data$TestNumber   1  20484 20484.4  143.37 < 2.2e-16 ***
## Residuals       298  42577   142.9                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In this output we can see that the F value is 143.37, and our p value is less than .001, meaning we have a significant result.

Once we have calculated the F value and p-value, we can then calculate the effect size, or R squared, using the following code:
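R squared can be computed from the sums of squares in the ANOVA table, or pulled directly from the model summary. A sketch:

# R squared = SS_effect / (SS_effect + SS_residual)
20484.4 / (20484.4 + 42577)

# or directly from the fitted model
summary(lm(Data$Grades ~ Data$TestNumber))$r.squared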

## [1] 0.3248318

Our R squared was 0.325, which means that 32.5% of the variation in grades is explained by test number.

4.3.1.3 Post Hoc Test

Finally, because our ANOVA was significant, we need to conduct post hoc tests to determine which groups were significantly different from each other. This can be done using Fisher’s least significant difference (LSD) test. The LSD.test function is located in the agricolae package.
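The exact call used here is not shown; a typical way to run Fisher's LSD with agricolae is sketched below (fitting the grouping factor with aov() first), so the error term it uses may differ from the one printed in the output that follows.

TestFactor <- as.factor(Data$TestNumber)
model <- aov(Data$Grades ~ TestFactor)
LSD.test(model, "TestFactor", console = TRUE)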

## 
## Study: Data$Grades ~ Data$TestNumber
## 
## LSD t Test for Data$Grades 
## 
## Mean Square Error:  20484.4 
## 
## Data$TestNumber,  means and individual ( 95 %) CI
## 
##   Data.Grades       std   r      LCL       UCL      Min      Max
## 1    70.02913 10.207104 100 41.86300  98.19525 47.28075 95.81959
## 2    65.21293 13.085722 100 37.04680  93.37906 34.80762 99.71093
## 3    90.26988  2.101132 100 62.10375 118.43601 83.34755 96.64816
## 
## Alpha: 0.05 ; DF Error: 298
## Critical Value of t: 1.967957 
## 
## least Significant Difference: 39.83292 
## 
## Treatments with the same letter are not significantly different.
## 
##   Data$Grades groups
## 3    90.26988      a
## 1    70.02913      a
## 2    65.21293      a

This test indicates that there are no significant differences between any of the groups.

Once you have finished these steps, you can move on to writing up and reporting your results.

4.3.2 One-Way Within Subjects ANOVA

The One-Way Within Subjects Analysis of variance (ANOVA) is used to test for mean differences between two or more groups that contain the same individuals (Gravetter & Wallnau, 2019).

For the One-way Within Subjects ANOVA in this section, we will be using a fake data set of a sample of 100 undergraduate students’ math, reading, and science test scores. The test scores are on a scale of 0 to 100. Each score is labeled as test 1 (math), 2 (reading), or 3 (science) in the TestSubject condition. The following code will create our data set.
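As with the earlier examples, the simulation code is not reproduced here; a minimal sketch under assumed parameters (seed, means, and standard deviations) is shown below, with a Student identifier so that each student contributes one score per test subject.

set.seed(1234)                                     # assumed seed
Data <- data.frame(
  Student     = rep(1:100, times = 3),             # same 100 students take all three tests
  TestSubject = rep(1:3, each = 100),              # 1 = math, 2 = reading, 3 = science
  Grades      = c(rnorm(100, mean = 70, sd = 10),  # assumed parameters
                  rnorm(100, mean = 65, sd = 13),
                  rnorm(100, mean = 90, sd = 2))
)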

We will also create subsets of factor level groupings to get group summaries of our data and check assumptions.
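The subset names below are taken from the Shapiro-Wilk output in the next section; a sketch:

MathGrades    <- subset(Data, TestSubject == 1)
ReadingGrades <- subset(Data, TestSubject == 2)
ScienceGrades <- subset(Data, TestSubject == 3)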

Before continuing with this section, the following package must be installed and loaded in order to successfully run all of the functions listed.

4.3.2.1 Assumptions

There are four assumptions that we should meet in order to conduct this ANOVA:

  • Normality of sampling distributions
  • Normality of dependent variable
  • Homogeneity of variance
  • Sphericity

The normality of sampling distributions assumption is a theoretical idea that is typically viewed as met if the degrees of freedom are greater than or equal to 20 when there is only one IV. The next assumption, normality of the dependent variable, can be checked using the Shapiro-Wilk test on each group.
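The data: lines in the output show the calls, one per subject subset:

shapiro.test(MathGrades$Grades)
shapiro.test(ReadingGrades$Grades)
shapiro.test(ScienceGrades$Grades)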

## 
##  Shapiro-Wilk normality test
## 
## data:  MathGrades$Grades
## W = 0.98836, p-value = 0.535
## 
##  Shapiro-Wilk normality test
## 
## data:  ReadingGrades$Grades
## W = 0.99158, p-value = 0.7902
## 
##  Shapiro-Wilk normality test
## 
## data:  ScienceGrades$Grades
## W = 0.98164, p-value = 0.1779

As we can see here, the p-values for the Shapiro-Wilk tests were 0.535, 0.790, and 0.178, which indicates no violation of the assumption.

The homogeneity of variance assumption can be tested using Levene’s test for homogeneity of variance. Notice that in the code for Levene’s test we used as.factor. This is because our data is set up as contrast codes, but Levene’s test needs groups!
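A sketch of the likely call:

leveneTest(Data$Grades ~ as.factor(Data$TestSubject))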

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value    Pr(>F)    
## group   2  52.476 < 2.2e-16 ***
##       297                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As we can see here, the p-value for the Levene’s test was less than .001, which indicates a violation of the assumption. However, for the sake of the example, we will continue with conducting the analysis.

4.3.2.2 Test Statistic

In order to compute the ANOVA, we will use the aov() and summary() functions. Within the aov() function, type the dependent variable first and then the independent variable + the error adjustment.
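The Call line in the output shows the formula used; storing the fit, printing it, and then calling summary() reproduces the two parts of the output below:

WithinANOVA <- aov(Grades ~ TestSubject + Error(Student/TestSubject), data = Data)
WithinANOVA           # prints the strata breakdown shown first
summary(WithinANOVA)  # prints the ANOVA tables shown second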

## 
## Call:
## aov(formula = Grades ~ TestSubject + Error(Student/TestSubject), 
##     data = Data)
## 
## Grand Mean: 75.17064
## 
## Stratum 1: Student
## 
## Terms:
##                 Residuals
## Sum of Squares   138.0989
## Deg. of Freedom         1
## 
## Residual standard error: 11.75155
## 
## Stratum 2: Student:TestSubject
## 
## Terms:
##                 TestSubject
## Sum of Squares     16300.66
## Deg. of Freedom           1
## 
## Estimated effects are balanced
## 
## Stratum 3: Within
## 
## Terms:
##                 TestSubject Residuals
## Sum of Squares      4231.16  42391.64
## Deg. of Freedom           1       296
## 
## Residual standard error: 11.96725
## Estimated effects are balanced
## 
## Error: Student
##           Df Sum Sq Mean Sq F value Pr(>F)
## Residuals  1  138.1   138.1               
## 
## Error: Student:TestSubject
##             Df Sum Sq Mean Sq
## TestSubject  1  16301   16301
## 
## Error: Within
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## TestSubject   1   4231    4231   29.54 1.14e-07 ***
## Residuals   296  42392     143                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In this output we can see that the F value is 29.54, and our p value is less than .001, meaning we have a significant result.

4.3.2.3 Post Hoc Test

Finally, because our ANOVA was significant, we need to conduct post hoc tests to determine which groups were significantly different from each other. This can be done using pairwise t-tests with the Bonferroni correction.
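The output format below matches pairwise.t.test() from base R; a sketch:

pairwise.t.test(Data$Grades, Data$TestSubject, p.adjust.method = "bonferroni")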

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  Data$Grades and Data$TestSubject 
## 
##   1      2     
## 2 0.0015 -     
## 3 <2e-16 <2e-16
## 
## P value adjustment method: bonferroni

This test indicates that group 1 was significantly different from groups 2 and 3, and group 2 was significantly different from group 3. Once you have finished these steps, you can move on to writing up and reporting your results.

4.4 Factorial Between Subjects ANOVA

The Factorial Between Subjects Analysis of variance (ANOVA) is used to test for differences in the means of two or more groups across two or more independent variables at once (Gravetter & Wallnau, 2019). This allows us to determine main effects for each independent variable as well as the interaction effects for the combination of independent variables.

For the Factorial Between Subjects ANOVA in this section, we will be using a fake data set of a sample of 100 undergraduate students’ math test scores. The test scores are on a scale of 0 to 100. Each individual has also been assigned to either the Paper Test format or the Electronic Test format in the TestFormat condition and either the Classroom setting or Home setting in the TestLocation condition. The following lines of code create our data set and set up the data frame we will need.
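The simulation code is again not reproduced; a minimal sketch with an assumed seed, assumed parameters, and assumed contrast coding (which level is 1 versus -1 is a guess) looks like this:

set.seed(1234)                                   # assumed seed and parameters
Data <- data.frame(
  Grade        = rnorm(100, mean = 75, sd = 13),
  TestFormat   = rep(c(1, -1), times = 50),      # 1 = paper, -1 = electronic (assumed)
  TestLocation = rep(c(1, -1), each = 50)        # 1 = classroom, -1 = home (assumed)
)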

We will also want our data set stored in short form for some of our analyses along the way.
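The column names of the short (wide) form are taken from the Shapiro-Wilk output below; a sketch of the reshaping, with 25 grades per condition combination:

DataShort <- data.frame(
  Home.Paper           = Data$Grade[Data$TestLocation == -1 & Data$TestFormat ==  1],
  Home.Electronic      = Data$Grade[Data$TestLocation == -1 & Data$TestFormat == -1],
  Classroom.Paper      = Data$Grade[Data$TestLocation ==  1 & Data$TestFormat ==  1],
  Classroom.Electronic = Data$Grade[Data$TestLocation ==  1 & Data$TestFormat == -1]
)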

Before continuing with this section, the following package must be installed and loaded in order to successfully run all of the functions listed.

4.4.0.1 Assumptions

There are four assumptions that we should meet in order to conduct this ANOVA:

  • Independence of errors
  • Normality of sampling distribution
  • Normality of dependent variable
  • Homogeneity of variance

The first assumption can be checked simply by looking at our data and seeing the samples are independent. The next two assumptions on normality can be checked using the Shapiro-Wilk test on the dependent variable as a whole and for each group.
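The data: lines in the output show the calls, one per cell of the design plus one for the full dependent variable:

shapiro.test(DataShort$Home.Paper)
shapiro.test(DataShort$Home.Electronic)
shapiro.test(DataShort$Classroom.Paper)
shapiro.test(DataShort$Classroom.Electronic)
shapiro.test(Data$Grade)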

## 
##  Shapiro-Wilk normality test
## 
## data:  DataShort$Home.Paper
## W = 0.98725, p-value = 0.983
## 
##  Shapiro-Wilk normality test
## 
## data:  DataShort$Home.Electronic
## W = 0.95302, p-value = 0.293
## 
##  Shapiro-Wilk normality test
## 
## data:  DataShort$Classroom.Paper
## W = 0.94172, p-value = 0.1622
## 
##  Shapiro-Wilk normality test
## 
## data:  DataShort$Classroom.Electronic
## W = 0.97231, p-value = 0.704
## 
##  Shapiro-Wilk normality test
## 
## data:  Data$Grade
## W = 0.99158, p-value = 0.7902

Notice that we used the short form to check the individual groups and the long form to check it all at once. As we can see here, the p-values for the Shapiro-Wilk tests were all greater than .05, which indicates no violation of the assumptions.

The homogeneity of variance assumption can be tested using the Levene’s test for homogeneity of variance.
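A sketch of the likely call, crossing the two factors so that all four cells are compared:

leveneTest(Data$Grade ~ as.factor(Data$TestLocation) * as.factor(Data$TestFormat))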

## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  3  0.1715 0.9154
##       96

As we can see here, the p-value for the Levene’s test was greater than .05, which indicates no violation of the assumption.

4.4.0.2 Test Statistic

We have two options to compute the ANOVA. We can use the anova() and lm() functions or the aov() and summary() functions together. Within the lm() and aov() functions, type the dependent variable first and then the first independent variable * the second independent variable.
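A sketch of both options; the * notation expands to both main effects plus their interaction:

# option 1: regression framework
anova(lm(Grade ~ TestLocation * TestFormat, data = Data))

# option 2: aov framework
summary(aov(Grade ~ TestLocation * TestFormat, data = Data))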

## Analysis of Variance Table
## 
## Response: Grade
##                         Df  Sum Sq Mean Sq F value  Pr(>F)  
## TestLocation             1   517.6  517.62  3.0432 0.08428 .
## TestFormat               1   101.5  101.50  0.5967 0.44173  
## TestLocation:TestFormat  1     4.8    4.76  0.0280 0.86752  
## Residuals               96 16328.5  170.09                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##                         Df Sum Sq Mean Sq F value Pr(>F)  
## TestLocation             1    518   517.6   3.043 0.0843 .
## TestFormat               1    101   101.5   0.597 0.4417  
## TestLocation:TestFormat  1      5     4.8   0.028 0.8675  
## Residuals               96  16329   170.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In both outputs we can see that the F values for TestLocation, TestFormat, and the interaction are 3.04, 0.60, and 0.03, respectively, and all of the p values are greater than .05, meaning there are no significant results.

Even though we failed to find a significant result, for the sake of the example, we will continue with calculating the effect size, or partial R squared, using the following code:
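Partial R squared for each effect can be computed from the sums of squares in the ANOVA table. A sketch:

# partial R squared = SS_effect / (SS_effect + SS_residual)
517.62 / (517.62 + 16328.5)   # TestLocation
101.50 / (101.50 + 16328.5)   # TestFormat
4.76   / (4.76   + 16328.5)   # TestLocation:TestFormat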

## [1] 0.03072624
## [1] 0.006177434
## [1] 0.0002913221

Our partial R squared values for the location, format, and interaction effects were 0.031, 0.006, and 0.0003, respectively. Once you have finished these steps, you can move on to writing up and reporting your results.


References

Gravetter, F. J., & Wallnau, L. B. (2019). Statistics for the behavioral sciences (10th ed.). Cengage Learning Asia Pte Ltd.