Chapter 6 Association

Tests of association assess the strength of association between two variables. There are many variations on this theme.

Pearson’s correlation assesses the strength of a linear relationship between two continuous variables. It applies when the relationship is linear with no outliers and the variables are bi-variate normal. There are two less restrictive alternatives, Spearman’s rho and Kendall’s tau, that assess the strength and direction of association.

A two-way frequency table is a frequency table for two categorical variables. You usually construct a two-way table to test whether the frequency counts in one categorical variable differ from the other categorical variable using the chi-square test for independence or G-test, or Fisher’s exact test. If there is a significant difference (i.e., the variables are related), then describe the relationship with an analysis of the residuals, calculations of measures of association (difference in proportions, Phi and Cramer’s V, Relative risks, or Odds Ratio), and partition tests.

If one of the variables is bivariate categorical, use point-biserial correlation, a special case of Pearson’s correlation. Pearson’s partial correlation controls for one or more variables - linear regression?

If both variables are ordinal, use Goodman and Kruskal’s gamma. Somers’ d is an alternative if you want to distinguish between a dependent and independent variable (instead of linear regression?). The Mantel-Haenszel test of trend is used to determine whether there is a linear trend (i.e., a linear relationship/association) between two ordinal variables that are represented in a contingency table. The Cochran-Armitage test of trend is used to determine whether there is a linear trend (i.e., a linear relationship/association) between an ordinal independent variable and a dichotomous dependent variable.

Goodman and Kruskal’s λ (the Greek symbol, λ, is pronounced “lambda”) is also referred to as Goodman and Kruskal’s lambda. It is a nonparametric measure of the strength of association between two nominal variables where a distinction is made between a dependent and independent variable

Loglinear analysis is used to understand (and model) associations between two or more categorical variables (i.e., nominal or ordinal variables). However, loglinear analysis is usually employed when dealing with three or more categorical variables, as opposed to two variables, where a chi-square test for association is usually conducted instead.

There are three common correlation tests for categorical variables12: Tetrachoric correlation for binary categorical variables; polychoric correlation for ordinal categorical variables; and Cramer’s V for nominal categorical variables.