Chapter 13 Apr 12–13: Chi-Square Non-parametric Test

This chapter is ready for use in HE802 in spring 2021.

This week, our goals are to…

  1. Identify research scenarios where chi-square analyses would be the most appropriate

  2. Identify the null hypotheses for chi-square and non-parametric tests

  3. Identify research scenarios where non-parametric tests would be most appropriate

  4. Create a contingency table

  5. Conduct and interpret the results of Chi-square and non-parametric analyses

  6. Explain the similarities and differences between chi-square tests of independence and goodness-of-fit tests

Announcements and reminders

We have reached our final week of new material in this course. Since classes officially end on Tuesday, April 13 2021, this week’s chapter is optional (not required) for you to complete.

Here are the remaining tasks you must complete, if you have not done so already:

  • Schedule and pass your Oral Exam #3.
  • Submit your final project (into its own D2L dropbox).

I recommend reviewing the course calendar as well as the final project requirements. The absolute deadline for everything—if you are following the standard course schedule—is Sunday April 25 2021.

13.1 Chi-Square

The previous statistics we have learned this semester are all parametric tests—we are using data from a sample to draw conclusions about a population, and the parameters of that population are expected to meet certain assumptions. Non-parametric tests do not require assumptions about the underlying population and do not test hypotheses about population parameters. Categorical data, and data that are not normally distributed, can be analyzed with non-parametric statistics.

With categorical variables, you can’t calculate a mean or standard deviation. Instead, you have frequencies. The data can be measured on the nominal or ordinal scale. You can determine the relationship between categorical variables using a chi-square statistic. There are two common forms of the chi-square test statistic:

  • Chi-square test for independence (the focus of this chapter)
  • Chi-square goodness-of-fit test (not required)

These two tests are addressed individually in the following sections. But first, please skim the following resource about chi-square tests:

Note that chi-square is often written as \(\chi^2\) or \(X^2\). \(\chi\) is the Greek letter chi. Sometimes this is written as the English letter \(X\) instead.

13.1.1 Chi-square Test For Independence

The chi-square test for independence is used when you want to test the relationship between two categorical variables.

Here are some key details about the test:

  1. Tests hypotheses about the relationship between two variables
  2. Tests hypotheses about differences between proportions for two or more populations
  3. The null hypothesis is that there is no association between the two variables
  4. Uses a contingency table to analyze the data (see example below)
  5. In general, you need an expected frequency of at least 5 people per cell
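
The expected frequencies mentioned in item 5 come from the margins of the contingency table. As a brief sketch in standard notation (the symbols \(O_{ij}\), \(E_{ij}\), \(r\), and \(c\) are introduced here for illustration), the expected count for the cell in row \(i\) and column \(j\), the test statistic, and its degrees of freedom are:

\[E_{ij} = \frac{(\text{row } i \text{ total}) \times (\text{column } j \text{ total})}{N}, \qquad \chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad df = (r - 1)(c - 1)\]

Here \(O_{ij}\) is the observed count in a cell, \(N\) is the total sample size, and \(r\) and \(c\) are the number of rows and columns. For instance, a cell whose row total is 50 and whose column total is 26, in a sample of \(N = 100\), has an expected count of \(50 \times 26 / 100 = 13\).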

Below is an example of a 2x3 contingency table examining the relationship between diabetes and gender (fake data).

Research question: Is there a relationship between gender and diabetes status?

  • Null hypothesis: there is no relationship between gender and diabetes status.
  • Alternate hypothesis: there is a relationship between gender and diabetes status.

            Prediabetes   Diabetic   No diabetes
  Male           15           16          19
  Female         11            8          31
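
As a rough sketch of how this table could be entered in R, you can type the counts into a matrix and label the rows and columns. The object names (`diabetes`, `diabetes_table`) are illustrative choices. If you had one row per person instead of counts, calling `table()` on the two columns of your data frame would produce the same kind of object.

```r
# Fake counts from the example table above, entered row by row
diabetes <- matrix(
  c(15, 16, 19,
    11,  8, 31),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Gender = c("Male", "Female"),
    Status = c("Prediabetes", "Diabetic", "No diabetes")
  )
)

# Convert to a table object so it behaves like the output of table()
diabetes_table <- as.table(diabetes)

# Show the counts with row and column totals added
addmargins(diabetes_table)

# Show the proportion in each diabetes category within each gender
prop.table(diabetes_table, margin = 1)
```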

13.1.1.1 Chi-square Test in R

The following resource explains how to conduct a chi-square test in R and interpret the results.
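
As a minimal sketch, base R's `chisq.test()` can be run on the fake diabetes counts from the example above. The object names are illustrative, and the counts are re-entered here so that the code runs on its own.

```r
# Fake diabetes data from the example contingency table
diabetes_table <- as.table(matrix(
  c(15, 16, 19,
    11,  8, 31),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Gender = c("Male", "Female"),
    Status = c("Prediabetes", "Diabetic", "No diabetes")
  )
))

# Chi-square test for independence
result <- chisq.test(diabetes_table)

# Check the expected counts first: each should be at least 5
result$expected

# Print the test statistic, degrees of freedom, and p-value
# (chi-squared is about 6.16 with df = 2 and p of about .046)
result
```

With these fake data, the p-value falls just below .05, so at that significance level you would reject the null hypothesis and conclude that gender and diabetes status are related.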

13.1.1.2 Effect size estimation for Chi-Square

The following two metrics can help you calculate effect size when using chi-square:

  • Phi – In the case of a 2x2 chi-square (two variables, each with 2 levels), you can code each of the variables as a 0/1 dummy variable and then calculate the phi coefficient, which is a form of correlation and measures the strength of the relationship between the two variables (effect size).

  • Cramer’s V – When either variable has more than two categories, you can use Cramer’s V as a measure of effect size. Cramer’s V is a modified version of phi.
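
Both metrics can be computed by hand from the chi-square statistic. The sketch below uses a hypothetical helper function, `cramers_v()`, written for this example (it is not part of base R), and a made-up 2x2 dataset to show that phi matches the correlation between two dummy variables.

```r
# Hypothetical helper: Cramer's V from a contingency table.
# For a 2x2 table the result equals the absolute value of phi.
cramers_v <- function(tab) {
  # correct = FALSE turns off the continuity correction that
  # chisq.test() otherwise applies to 2x2 tables
  chi_sq <- unname(chisq.test(tab, correct = FALSE)$statistic)
  n <- sum(tab)        # total sample size
  k <- min(dim(tab))   # smaller of the number of rows and columns
  sqrt(chi_sq / (n * (k - 1)))
}

# Fake diabetes data (2x3 table) from earlier in the chapter
diabetes_table <- matrix(c(15, 16, 19, 11, 8, 31), nrow = 2, byrow = TRUE)
v_diabetes <- cramers_v(diabetes_table)   # about 0.25

# Made-up 2x2 example: two 0/1 dummy variables, one value per person
x <- rep(c(0, 0, 1, 1), times = c(20, 10, 5, 15))
y <- rep(c(0, 1, 0, 1), times = c(20, 10, 5, 15))

phi   <- cor(x, y)                 # Pearson correlation of the two dummies
v_2x2 <- cramers_v(table(x, y))    # same value as abs(phi)

v_diabetes
phi
v_2x2
```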

13.1.2 Chi-square Goodness-Of-Fit Test

The Chi-Square Goodness of Fit Test is used when you want to determine whether the data follow a particular distribution. For this test, you have categorical data for one variable.

Here are some key details about the test:

  1. Uses frequency data from a sample to test hypotheses about the shape or proportion of a population
  2. The null hypothesis specifies the expected proportions: either equal proportions across all categories, or known proportions from another population.
  3. Tests whether the observed proportions differ from the expected proportions. The null hypothesis is that there is no difference between the observed and expected values. A large discrepancy results in a large value for chi-square and indicates that the data do not fit the null hypothesis, which should therefore be rejected.
  4. For a nice example of a goodness-of-fit test using M&Ms, see Taylor (2018), cited in footnote 1.
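
As a short sketch, `chisq.test()` also runs a goodness-of-fit test when you give it counts for a single variable. The counts and the "known" proportions below are made up for illustration.

```r
# Hypothetical counts for one categorical variable with three categories
observed <- c(A = 30, B = 20, C = 10)

# Null hypothesis 1: all categories are equally likely (the default)
gof_equal <- chisq.test(observed)

# Null hypothesis 2: the proportions match known values from another
# population (these proportions are invented and must sum to 1)
known_props <- c(A = 0.5, B = 0.3, C = 0.2)
gof_known <- chisq.test(observed, p = known_props)

gof_equal$expected   # 20 expected in each category
gof_equal            # chi-squared = 10, df = 2

gof_known$expected   # 30, 18, and 12 expected
gof_known            # chi-squared is about 0.56, df = 2
```

The same counts lead to different conclusions depending on the null hypothesis: they depart clearly from equal proportions but are close to the made-up known proportions.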

13.2 Non-parametric Tests

The parametric statistics we have worked with so far involve ratio or interval data. However, there may be cases where the data you collect is ordinal. Or, the data you collect is interval or ratio, but it does not meet the assumptions of the general linear model (for example: it is highly skewed, not normally distributed) and therefore tests like t-tests, ANOVA, or regression are not appropriate. In these cases, you can rank-order the data and use non-parametric tests to examine hypotheses about differences between groups.

Here are a few important non-parametric tests and how they can be used to answer different types of research questions:

  • Mann-Whitney U Test: Evaluates the difference between two independent groups; analogous to an independent samples t-test. The null hypothesis is that there is no difference between the two groups. If there is a significant difference between conditions, the Mann-Whitney U will be small: the closer U is to 0, the stronger the evidence of a difference, and a value of 0 indicates that there is no overlap (in the rank order) between the two groups.

  • Wilcoxon Signed Rank Test: Evaluates differences between two conditions using data from a repeated measures design; analogous to a paired samples t-test. This test involves rank ordering the difference scores. A small T indicates a difference between treatments (see footnote 2).

  • Kruskal-Wallis: Evaluates differences between 3 or more independent groups (analogous to one-way ANOVA). The data are rank ordered (it doesn’t require numeric data, but you need to be able to rank everyone). The null hypothesis is that there are no systematic differences across the conditions/groups.

  • Friedman Test: Evaluates differences between three or more conditions in a repeated measures design (analogous to single factor repeated measures ANOVA). The null hypothesis is that there are no systematic differences across the conditions/groups.
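
The two-group tests in the list above are both run with base R's `wilcox.test()`. The sketch below uses made-up scores, chosen so that the groups do not overlap at all.

```r
# Hypothetical scores for two independent groups (no tied values)
group_a <- c(12, 15, 11, 18, 14, 10)
group_b <- c(22, 25, 19, 24, 21, 27)

# Mann-Whitney U test (R calls it the Wilcoxon rank-sum test).
# R reports W, the U value for the first group. Every score in group_a
# is below every score in group_b, so W = 0 (no overlap in the ranks).
mw <- wilcox.test(group_a, group_b)

# Hypothetical before/after scores for the same seven people
before <- c(10, 12,  9, 14, 11, 13,  8)
after  <- c(13, 14, 10, 19, 15, 19, 15)

# Wilcoxon signed rank test: paired = TRUE ranks the difference scores.
# R reports V, the sum of the ranks of the positive differences.
sr <- wilcox.test(before, after, paired = TRUE)

mw
sr
```

Note that R's W and V are computed for the first group (or for the positive differences), so they are not always the smaller of the two possible values that a textbook would report as U or T. The p-value is the same either way.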

For both the K-W and Friedman tests, the resulting test statistic is an omnibus one: it tells you whether there is at least one difference between the groups, but not which specific groups are different. You will need to perform follow-up analyses to determine which groups are different.
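
As a minimal sketch of the two omnibus tests, base R provides `kruskal.test()` and `friedman.test()`. All of the data below are made up for illustration.

```r
# Hypothetical scores for three independent groups of six people
scores <- c(11, 12, 14, 15, 17, 18,
            21, 23, 24, 26, 27, 29,
            31, 33, 35, 36, 38, 39)
group <- factor(rep(c("low", "medium", "high"), each = 6),
                levels = c("low", "medium", "high"))

# Kruskal-Wallis test: one outcome, one grouping factor
kw <- kruskal.test(scores ~ group)

# Hypothetical repeated measures data:
# 5 people (rows) each rated under 3 conditions (columns)
ratings <- matrix(c(3, 5, 8,
                    2, 6, 7,
                    4, 5, 9,
                    1, 4, 6,
                    3, 7, 8),
                  nrow = 5, byrow = TRUE,
                  dimnames = list(NULL, c("c1", "c2", "c3")))

# Friedman test: rows are treated as people, columns as conditions
fr <- friedman.test(ratings)

kw   # chi-squared is about 15.16, df = 2
fr   # chi-squared = 10, df = 2
```

Both results are significant for these made-up data, which says only that at least one group or condition differs from the others.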

Pairwise comparisons are made using the Mann-Whitney U test (for independent groups, after Kruskal-Wallis) or the Wilcoxon signed rank test (for repeated measures, after Friedman). In the stepwise step-down procedure, the groups are ranked by the sum of their ranks. Then the first and second groups are compared. If they are not different, then the third group is entered, and so on. When a significant difference is found, the procedure is stopped. The last group added (the one that led to a significant difference) moves to the next step, and all previously entered groups are combined and considered a homogeneous subset (because they weren’t different). If there are any remaining groups that need to be compared, then the same procedure continues using the group that was significantly different in the previous step, and then adding in additional groups until another significant difference appears.
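
The sketch below does not implement the stepwise step-down procedure. It shows a simpler, commonly used alternative: running every pairwise Mann-Whitney comparison with base R's `pairwise.wilcox.test()` and adjusting the p-values for multiple testing. The data are made up, and the choice of the Bonferroni adjustment is an assumption for this example.

```r
# Hypothetical scores for three independent groups of six people
scores <- c(11, 12, 14, 15, 17, 18,
            21, 23, 24, 26, 27, 29,
            31, 33, 35, 36, 38, 39)
group <- factor(rep(c("low", "medium", "high"), each = 6),
                levels = c("low", "medium", "high"))

# All pairwise Mann-Whitney tests, with Bonferroni-adjusted p-values
pw <- pairwise.wilcox.test(scores, group, p.adjust.method = "bonferroni")

# Matrix of adjusted p-values, one entry per pair of groups
pw$p.value
```

For follow-up tests after a Friedman test, the same function accepts `paired = TRUE`, which switches each comparison to a Wilcoxon signed rank test. In that case the scores must be ordered the same way within every condition.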

13.2.0.1 Effect Size For Non-parametric Tests

There will be a z-statistic associated with the Mann-Whitney U and the Wilcoxon Signed Rank Test. These z-statistics can be converted to effect size r, which you’ll remember is a correlation (and a correlation coefficient is also a measure of effect size because it tells us the size of the relationship between variables).

For the Kruskal-Wallis and Friedman tests, use the statistics you get from your post-hoc/follow-up tests and convert those z scores to r:

\[r = \frac{z}{\sqrt{N}}\]

  • \(N=\) total number of observations in the study.
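
As a worked sketch of this formula for a Mann-Whitney test, the code below computes z from the usual normal approximation and then converts it to r. The data are made up, and the z formula as written assumes there are no tied scores.

```r
# Hypothetical scores for two independent groups (no tied values)
group_a <- c(12, 15, 11, 18, 14, 10)
group_b <- c(22, 25, 19, 24, 21, 27)

n1 <- length(group_a)
n2 <- length(group_b)
N  <- n1 + n2                      # total number of observations

# W is R's name for the Mann-Whitney U of the first group
W <- unname(wilcox.test(group_a, group_b)$statistic)

# Normal approximation: subtract the value of W expected under the
# null hypothesis and divide by its standard deviation (no ties assumed)
z <- (W - n1 * n2 / 2) / sqrt(n1 * n2 * (N + 1) / 12)

# Effect size r = z / sqrt(N); the sign only reflects group order
r <- abs(z) / sqrt(N)

z   # about -2.88
r   # about 0.83
```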

13.3 Assignment

This assignment is optional (not required) for you to complete.

Since this is the final week of classes and a bit shorter than a usual week, this assignment is meant to be shorter and quicker than the other assignments.

In this week’s assignment, you will practice using chi-square tests for independence to answer two questions in a dataset.

This assignment is a good opportunity to use your own data, if you wish. If you do want to use your own data, just replace the variables in the instructions below with variables from your own dataset. This is only possible if your own data has at least two categorical variables in it.

If you want to use data provided by me, you should use the smoking dataset. You can click here to download the dataset. Then run the following code to load the data—an SPSS file—into R:

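# Install the foreign package if it is not already available, then load it
if (!require(foreign)) install.packages('foreign')
library(foreign)

# Read the SPSS file into an R data frame called s
s <- read.spss("smoking.sav", to.data.frame = TRUE)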

Please read the following information about this dataset:

The smoking.sav file contains data from a study that was designed to examine the relationship between health-related quality of life (HRQoL) and smoking in faculty, staff and students in a large department of nursing in a U.S. university. For the purpose of this assignment, we will assume the variables of interest are not normally distributed and therefore we need to use non-parametric tests to answer the research questions.

13.3.1 Chi-Square Tests

There are two research questions that you need to answer this week. For each one, please include a contingency table (two-way table), the code you used to test the question, the result, and your interpretation of the result.

Here are the research questions:

  1. Is there a relationship between gender and whether someone is a current smoker? Use the dichotomous Gender and current_smoker variables.

  2. Do people differ in the amount of exercise they do across categories of smoking? For this analysis, use the exercise_days_cat and smoking_status variables.

If you choose to complete this assignment (which is optional), please submit it in D2L as usual.


  1. Taylor, C. 2018. Example of a Chi-Square Goodness of Fit Test. ThoughtCo. https://www.thoughtco.com/chi-square-goodness-of-fit-test-example-3126382.

  2. Note that there is another Wilcoxon test, the Wilcoxon Rank-Sum Test, which is analogous to the Mann-Whitney Test above. This distinction can be confusing.