Topic 13 Chi-Square Test of Independence

So far, we have tested the correlation between two continuous variables. Now, we want to test for an association between two categorical variables. For this, we use the Chi-Square Test of Independence.

I will not go over the calculations for this test.

Here, what matters most is to know when to use it and how to interpret the results.

13.1 Assumptions

The main assumption we need to care about is the following:

Independence assumption: The observations must be independent of one another, so there must be no relationship between the observations in each group.

This is not simple to test.
Usually, we assume it holds if each group is composed of observations from different units. In other words, there must be no individual that belongs to both groups.
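As a rough illustration of that rule of thumb, the sketch below checks whether any unit appears in both groups. The ID lists and the helper name `shared_units` are hypothetical and only stand in for whatever identifier your own data uses.

```python
# Hypothetical unit IDs for two groups; in practice these come from your data.
group_a_ids = [101, 102, 103, 104]
group_b_ids = [201, 202, 203, 204]


def shared_units(ids_a, ids_b):
    """Return the sorted IDs that appear in both groups."""
    return sorted(set(ids_a) & set(ids_b))


overlap = shared_units(group_a_ids, group_b_ids)
if overlap:
    print("Units in both groups:", overlap)
else:
    print("No unit appears in both groups.")
```

An empty overlap does not prove independence; it only rules out the most obvious violation, the same individual being counted in both groups.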

13.2 Interpretation

Null hypothesis: Variable1 is independent of Variable2. In other words, there is no association between the variables.
Alternative hypothesis: Variable1 is associated with Variable2.

Once you run the test, you look at the p-value of your Chi-square statistic to check if it is statistically significant.

As before, if it is statistically significant, then we reject the null hypothesis.
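Although the calculation is not the focus here, a small sketch may help make the decision rule concrete. It assumes a made-up 2x2 table of counts, a significance level of 0.05, and the Pearson statistic without continuity correction. A 2x2 table gives one degree of freedom, and in that case the p-value can be obtained with the standard library alone. The function names are illustrative, not part of any statistics package.

```python
import math

# Hypothetical 2x2 cross-tabulation (rows: Variable1, columns: Variable2).
observed = [[20, 30],
            [30, 20]]


def pearson_chi_square(table):
    """Pearson chi-square statistic, without continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (count - expected) ** 2 / expected
    return statistic


def p_value_df1(statistic):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


alpha = 0.05  # assumed significance level
chi2 = pearson_chi_square(observed)
p = p_value_df1(chi2)
reject_null = p < alpha
print(f"chi-square = {chi2:.3f}, p = {p:.4f}, reject null: {reject_null}")
```

With these made-up counts the statistic is 4.0 and the p-value falls just below 0.05, so the null hypothesis of independence would be rejected. Statistical software may also report a continuity-corrected value for 2x2 tables, which this sketch does not compute.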

Note that your degrees of freedom depend on the size of the cross tabulation table (as you saw in lecture):

$df = (\text{Rows} - 1)(\text{Columns} - 1)$

Here, Rows and Columns are the number of categories in each of the two categorical variables.
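The formula can be checked quickly for a few table sizes. The function name `degrees_of_freedom` is just an illustrative helper.

```python
def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a cross-tabulation with the given dimensions."""
    return (n_rows - 1) * (n_cols - 1)


# A 2x2 table (two categories in each variable) has 1 degree of freedom.
print(degrees_of_freedom(2, 2))
# A 3x4 table (three categories by four categories) has 6.
print(degrees_of_freedom(3, 4))
```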

13.3 Example

I will demonstrate using the “health_data.sav” dataset.

Is heart attack associated with health status?