# Chapter 3 Multi-Way Contingency Tables

Multi-way contingency tables are very common in practice, arising when more than two cross-classification variables are present.

## 3.1 Description

### 3.1.1 Three-way Tables

Consider an \(I \times J \times K\) contingency table \((n_{ijk})\) for \(i = 1,...,I\), \(j = 1,...,J\) and \(k = 1,...,K\), with classification variables \(X\) (the rows), \(Y\) (the columns) and \(Z\) (the layers) respectively.

A schematic of a generic \(X \times Y \times Z\) contingency table of counts is shown in Figure 3.1.

We can define the joint probability distribution of \((X,Y,Z)\) as \[\begin{equation} \pi_{ijk} = P(X=i, Y=j, Z=k) \end{equation}\]

Proportions, and observed and random counts, are defined similarly to the \(I \times J\) contingency table case.

#### 3.1.1.1 Example

The table in Figure 3.2 shows an example of a 3-way contingency table. This hypothetical data cross-classifies the response (\(Y\)) to a treatment drug (\(X\)) at one of two different clinics (\(Z\)).

#### 3.1.1.2 Partial/Conditional Tables

Partial, or conditional, tables involve fixing the category of one of the variables.

We denote the fixed variable in parentheses.

For example, the set of \(XY\)-partial tables consists of the \(K\) corresponding two-way layers, denoted as \((n_{ij(k)})\) for \(k = 1,...,K\).

\(XZ\)- and \(YZ\)-partial tables are denoted as \((n_{i(j)k})\) and \((n_{(i)jk})\) respectively.

Partial/conditional probabilities: \[\begin{equation} \pi_{ij(k)} = \pi_{ij|k} = P(X=i, Y=j | Z=k) = \frac{\pi_{ijk}}{\pi_{++k}} \qquad k = 1,...,K \end{equation}\]

Partial/conditional proportions: \[\begin{equation} p_{ij(k)} = p_{ij|k} = \frac{n_{ijk}}{n_{++k}} \qquad k = 1,...,K \end{equation}\]
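The computation of partial proportions is easy to sanity-check numerically. The following sketch uses a hypothetical \(2 \times 2 \times 2\) drug \(\times\) response \(\times\) clinic table; the counts are purely illustrative, not the data of Figure 3.2.

```python
# Partial/conditional proportions p_{ij(k)} = n_{ijk} / n_{++k} for a
# hypothetical 2 x 2 x 2 table (counts are illustrative).
n = [  # n[i][j][k]: i = drug, j = response, k = clinic
    [[18, 12], [12, 8]],
    [[2, 8], [8, 32]],
]
I, J, K = 2, 2, 2

def layer_total(n, k):
    """n_{++k}: total count of the k-th XY-partial table."""
    return sum(n[i][j][k] for i in range(I) for j in range(J))

def partial_proportions(n, k):
    """p_{ij(k)} = n_{ijk} / n_{++k} for the XY-partial table at layer k."""
    tot = layer_total(n, k)
    return [[n[i][j][k] / tot for j in range(J)] for i in range(I)]
```

Since each partial table conditions on \(Z = k\), the proportions within each layer sum to one.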

#### 3.1.1.3 Marginal Tables

Marginal tables involve summing over all possible categories of a particular variable.

We denote such summation using a \(+\) (as before).

For example, the \(XY\)-marginal table is \((n_{ij+}) = (\sum_k n_{ijk})\).

\(XZ\)- and \(YZ\)-marginal tables are denoted as \((n_{i+k})\) and \((n_{+jk})\) respectively.

Marginal probabilities: \[\begin{equation} \pi_{ij} = \pi_{ij+} = P(X=i, Y=j) = \sum_{k=1}^K \pi_{ijk} \end{equation}\]

Marginal proportions: \[\begin{equation} p_{ij} = p_{ij+} = \sum_{k=1}^K p_{ijk} \end{equation}\]
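A marginal table is obtained by summing the counts over the layers. The sketch below forms the \(XY\)-marginal table and its proportions for a hypothetical \(2 \times 2 \times 2\) table of counts (illustrative numbers only).

```python
# XY-marginal table n_{ij+} = sum_k n_{ijk} and marginal proportions
# p_{ij+} = n_{ij+} / n, for a hypothetical 2 x 2 x 2 table of counts.
n = [  # n[i][j][k]
    [[18, 12], [12, 8]],
    [[2, 8], [8, 32]],
]
I, J, K = 2, 2, 2

def xy_marginal(n):
    """n_{ij+}: sum the counts over the layers k."""
    return [[sum(n[i][j][k] for k in range(K)) for j in range(J)]
            for i in range(I)]

n_ij = xy_marginal(n)                                         # n_{ij+}
grand = sum(n_ij[i][j] for i in range(I) for j in range(J))   # n_{+++}
p_ij = [[n_ij[i][j] / grand for j in range(J)] for i in range(I)]
```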

### 3.1.2 Generic Multiway Tables

A multiway \(I_1 \times I_2 \times ... \times I_q\) contingency table will analogously be denoted as \((n_{i_1i_2...i_q})\), \(i_l = 1,...,I_l\), \(l = 1,...,q\).

The definitions of partial and marginal tables also follow analogously.

For example, \((n_{i_1+(i_3)i_4(i_5)})\) denotes the two-way partial marginal table obtained by summing over all levels/categories of \(i_2\) for fixed levels/categories of \(i_3\) and \(i_5\).

## 3.2 Odds Ratios

Conditional and marginal odds ratios can be defined for any two-way conditional or marginal probabilities table of a multi-way \(I_1 \times I_2 \times ... \times I_q\) table with \(I_l \geq 2\), \(l = 1,...,q\).

In this case, the conditional and marginal odds ratios are defined as odds ratios for two-way tables of size \(I \times J\).

Thus, as defined for general two-way tables in Sections 2.5 and 2.6.2, there will be a (not unique) minimal set of odds ratios of nominal, local, cumulative, or global type.

For example, for an \(I \times J \times K\) table, the \(XY\) local odds ratios conditional on \(Z\) are defined by \[\begin{equation} r_{ij(k)}^{XY} = \frac{\pi_{ijk}\pi_{i+1,j+1,k}}{\pi_{i+1,j,k}\pi_{i,j+1,k}} \qquad i = 1,...,I-1 \quad j = 1,...,J-1 \quad k = 1,...,K \end{equation}\] and the \(XY\)-marginal local odds ratios are defined by \[\begin{equation} r_{ij}^{XY} = \frac{\pi_{ij+}\pi_{i+1,j+1,+}}{\pi_{i+1,j,+}\pi_{i,j+1,+}} \qquad i = 1,...,I-1 \quad j = 1,...,J-1 \end{equation}\]

The conditional and marginal odds ratios of other types, like nominal, cumulative and global, are defined analogously.
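To make the distinction between the two definitions concrete, the following sketch computes a conditional local odds ratio \(r_{ij(k)}^{XY}\) and the corresponding marginal local odds ratio \(r_{ij}^{XY}\) for a \(2 \times 2 \times 2\) probability table; the probabilities are illustrative.

```python
# Conditional vs marginal XY local odds ratios for a hypothetical 2 x 2 x 2
# probability table (values are illustrative and sum to 1).
pi = [  # pi[i][j][k]
    [[0.10, 0.15], [0.05, 0.10]],
    [[0.10, 0.20], [0.10, 0.20]],
]
K = 2

def conditional_local_or(pi, i, j, k):
    """r^{XY}_{ij(k)} = pi_{ijk} pi_{i+1,j+1,k} / (pi_{i+1,j,k} pi_{i,j+1,k})."""
    return (pi[i][j][k] * pi[i + 1][j + 1][k]) / (pi[i + 1][j][k] * pi[i][j + 1][k])

def marginal_local_or(pi, i, j):
    """r^{XY}_{ij} computed from the marginals pi_{ij+} = sum_k pi_{ijk}."""
    m = lambda a, b: sum(pi[a][b][k] for k in range(K))
    return (m(i, j) * m(i + 1, j + 1)) / (m(i + 1, j) * m(i, j + 1))
```

For this table the conditional odds ratios (2 and 1.5 in the two layers) differ from the marginal odds ratio (5/3), illustrating that the conditional and marginal associations generally disagree.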

## 3.3 Types of Independence

Let \((n_{ijk})\) be an \(I \times J \times K\) contingency table of observed frequencies with row, column and layer classification variables \(X\), \(Y\) and \(Z\) respectively.

We consider various types of independence that could exist among these three variables.

### 3.3.1 Mutual Independence

\(X\), \(Y\) and \(Z\) are mutually independent if and only if \[\begin{equation} \pi_{ijk} = \pi_{i++} \pi_{+j+} \pi_{++k} \qquad i = 1,...,I \quad j = 1,...,J \quad k = 1,...,K \tag{3.1} \end{equation}\]

Such mutual independence can be symbolised as \([X,Y,Z]\).
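A table satisfying Equation (3.1) can be built directly as a product of three one-way marginals. The marginal distributions below are illustrative.

```python
# Under mutual independence [X,Y,Z], Equation (3.1), the joint probabilities
# are the product of the three one-way marginals (numbers are illustrative).
px = [0.6, 0.4]   # pi_{i++}
py = [0.7, 0.3]   # pi_{+j+}
pz = [0.5, 0.5]   # pi_{++k}

pi = [[[px[i] * py[j] * pz[k] for k in range(2)]
       for j in range(2)] for i in range(2)]

# Summing two indices out recovers the corresponding one-way marginal.
px_back = [sum(pi[i][j][k] for j in range(2) for k in range(2))
           for i in range(2)]
```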

#### 3.3.1.1 Example

Following the example of Section 3.1.1.1, mutual independence would mean that clinic, drug and response were independent of each other.

In other words, knowledge of the value of one variable doesn't affect the probabilities of the levels of the others.

### 3.3.2 Joint Independence

If \(Y\) is jointly independent from \(X\) and \(Z\) (without these two being necessarily independent), then \[\begin{equation} \pi_{ijk} = \pi_{+j+} \pi_{i+k} \qquad i = 1,...,I \quad j = 1,...,J \quad k = 1,...,K \tag{3.2} \end{equation}\]

Such joint independence can be symbolised as \([Y,XZ]\).

By symmetry, there are two more hypotheses of this type, which can be expressed in a symmetric way to Equation (3.2) for \(X\) or \(Z\) being jointly independent from the remaining two variables. These could be symbolised as \([X, YZ]\) and \([Z, XY]\) respectively.
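A table satisfying Equation (3.2) factorises as \(\pi_{+j+}\pi_{i+k}\), so \(X\) and \(Z\) keep a (possibly dependent) joint distribution while \(Y\) is independent of the pair. A minimal sketch with illustrative marginals:

```python
# Under joint independence [Y, XZ], Equation (3.2), the joint probabilities
# factor as pi_{+j+} * pi_{i+k} (numbers are illustrative).
py = [0.7, 0.3]                    # pi_{+j+}
pxz = [[0.3, 0.1], [0.2, 0.4]]     # pi_{i+k}, a dependent XZ distribution

pi = [[[py[j] * pxz[i][k] for k in range(2)]
       for j in range(2)] for i in range(2)]

total = sum(pi[i][j][k] for i in range(2) for j in range(2) for k in range(2))
```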

### 3.3.3 Marginal Independence

\(X\) and \(Y\) are marginally independent (ignoring \(Z\)) if and only if \[\begin{equation} \pi_{ij+} = \pi_{i++} \pi_{+j+} \qquad i = 1,...,I \quad j = 1,...,J \tag{3.3} \end{equation}\]

Here, we actually ignore \(Z\).

Such marginal independence is symbolised \([X,Y]\).

### 3.3.4 Conditional Independence

Under a multinomial sampling scheme, the joint probabilities of the three-way table cells \(\pi_{ijk}\) can be expressed in terms of conditional probabilities as \[\begin{eqnarray} \pi_{ijk} & = & P(X=i, Y=j, Z=k) \\ & = & P(Y=j| X=i, Z=k) \, P(X=i, Z=k) \\ & = & \pi_{j|ik} \pi_{i+k} \end{eqnarray}\] which under conditional independence of \(X\) and \(Y\) given \(Z\) is equal to \[\begin{eqnarray} \pi_{ijk} = \pi_{j|k} \pi_{i+k} & = & P(Y=j|Z=k) P(X=i,Z=k) \nonumber \\ & = & P(X=i,Z=k) \frac{P(Y=j,Z=k)}{P(Z=k)} \nonumber \\ & = & \frac{\pi_{i+k}\pi_{+jk}}{\pi_{++k}} \tag{3.4} \\ && \qquad \qquad i = 1,...,I \quad j = 1,...,J \quad k = 1,...,K \nonumber \end{eqnarray}\]

The above conditional independence of \(X\) and \(Y\) given \(Z\) can be symbolised as \([XZ,YZ]\).

The analysis above assumed that \(Y\) was the response variable. The conditioning approach with \(X=i\) as response variable would also lead to Equation (3.4), which is symmetric in terms of \(X\) and \(Y\).

The conditional independence hypotheses \([XY, YZ]\) and \([XY,XZ]\) are formed analogously to Equation (3.4).
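Equation (3.4) also gives a recipe for constructing conditionally independent tables: choose \(XZ\)- and \(YZ\)-marginals that share the same \(Z\)-marginal and multiply. The two-way marginals below are illustrative; under the resulting table, the \(XY\) odds ratio within each layer equals 1.

```python
# Building a table that satisfies [XZ, YZ] via Equation (3.4),
# pi_{ijk} = pi_{i+k} pi_{+jk} / pi_{++k}. Marginals are illustrative and
# share the same Z-marginal pi_{++k} = (0.5, 0.5).
pik = [[0.3, 0.1], [0.2, 0.4]]   # pi_{i+k} (rows i, columns k)
pjk = [[0.4, 0.2], [0.1, 0.3]]   # pi_{+jk} (rows j, columns k)
pk = [0.5, 0.5]                  # pi_{++k}

pi = [[[pik[i][k] * pjk[j][k] / pk[k] for k in range(2)]
       for j in range(2)] for i in range(2)]

def xy_or_given_k(pi, k):
    """XY odds ratio within layer k; equals 1 under [XZ, YZ]."""
    return (pi[0][0][k] * pi[1][1][k]) / (pi[1][0][k] * pi[0][1][k])
```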

#### 3.3.4.1 Example

If \(Y\) and \(Z\) are conditionally independent given \(X\) (that is, \([XY,XZ]\)), this implies that response to treatment is independent of clinic attended given knowledge of which drug was received.

#### 3.3.4.2 Odds Ratios

Under conditional independence of \(X\) and \(Y\) given \(Z\) (\([XZ,YZ]\)), the \(XZ\) local odds ratios conditional on \(Y\) are equal to the \(XZ\) marginal local odds ratios, that is

\[\begin{equation} r_{i(j)k}^{XZ} = r_{ik}^{XZ} \qquad i = 1,...,I-1 \quad j = 1,...,J \quad k = 1,...,K-1 \tag{3.5} \end{equation}\] In other words, the marginal and conditional \(XZ\) associations coincide. By symmetry, we also have that \[\begin{equation} r_{(i)jk}^{YZ} = r_{jk}^{YZ} \qquad i = 1,...,I \quad j = 1,...,J-1 \quad k = 1,...,K-1 \end{equation}\] that is, the marginal and conditional \(YZ\) associations coincide.

However, the \(XY\) marginal and conditional associations do not coincide.

Such arguments for \([XY,YZ]\) and \([XY,XZ]\) are analogous.

### 3.3.5 Conditional and Marginal Independence

Conditional independence does not imply marginal independence, and marginal independence does not imply conditional independence.

#### 3.3.5.1 Example

##### 3.3.5.1.1 Marginal but not Conditional Independence

Suppose response \(Y\) and clinic \(Z\) are marginally independent, ignoring treatment drug \(X\). However, there may be a conditional association between response to treatment \(Y\) and clinic attended \(Z\) given the drug received \(X\).

Example potential explanation: some clinics may be better prepared to care for subjects on some treatment drugs than others, but without knowledge of the treatment drug received, neither clinic is more associated with a successful response.

##### 3.3.5.1.2 Conditional but not Marginal Independence

Suppose \(Y\) and \(Z\) are conditionally independent given \(X\) (that is, \([XY,XZ]\)), then this implies that response to treatment is independent of clinic attended given knowledge of which drug was received. However, there may be a marginal association between response to treatment \(Y\) and clinic attended \(Z\) if we ignore which treatment drug \(X\) was received.

Example potential explanation: Given knowledge of the treatment drug, it does not matter which clinic the subject attends. However, without knowledge of the treatment drug, one clinic may be more associated with a successful response (perhaps their stock of the more successful drug is greater…).

### 3.3.6 Homogeneous Associations

Homogeneous associations (also known as *no three-factor interaction*) mean that the conditional relationship between any pair of variables given the third one is the same at each level of the third variable, though the pair is not necessarily independent. This relation implies that if we know all two-way tables between the three variables, we have sufficient information to compute \((\pi_{ijk})\).

However, there are no separable closed-form estimates for the expected joint probabilities \((\hat{\pi}_{ijk})\), hence maximum likelihood estimates must be computed by an iterative procedure such as Iterative Proportional Fitting or Newton-Raphson.

Such homogeneous associations are symbolised \([XY, XZ, YZ]\).

#### 3.3.6.1 Odds Ratios

Homogeneous associations can be thought of in terms of conditional odds ratios as follows:

- the \(XY\) partial odds ratios at each level of \(Z\) are identical: \(r_{ij(k)}^{XY} = r_{ij}^{XY, \star}\)
- the \(XZ\) partial odds ratios at each level of \(Y\) are identical: \(r_{i(j)k}^{XZ} = r_{ik}^{XZ, \star}\)
- the \(YZ\) partial odds ratios at each level of \(X\) are identical: \(r_{(i)jk}^{YZ} = r_{jk}^{YZ, \star}\)

Note that \(r_{ij}^{XY, \star}, r_{ik}^{XZ, \star}, r_{jk}^{YZ, \star}\) are not necessarily the same as the corresponding marginal odds ratios \(r_{ij}^{XY}, r_{ik}^{XZ}, r_{jk}^{YZ}\).
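This can be demonstrated with a small multiplicative (loglinear-style) construction. In the sketch below, the interaction factors \(\theta\), \(\phi\), \(\psi\) are illustrative choices: the conditional \(XY\) odds ratio equals \(\theta = 6\) in both layers, yet the marginal \(XY\) odds ratio comes out as 7.

```python
# A 2 x 2 x 2 probability table with homogeneous XY association, built
# multiplicatively from pairwise interaction factors (values illustrative).
theta, phi, psi = 6.0, 2.0, 3.0   # XY, XZ and YZ interaction factors

raw = [[[(theta if i == 0 and j == 0 else 1.0)
         * (phi if i == 0 and k == 0 else 1.0)
         * (psi if j == 0 and k == 0 else 1.0)
         for k in range(2)] for j in range(2)] for i in range(2)]
total = sum(raw[i][j][k] for i in range(2) for j in range(2) for k in range(2))
pi = [[[raw[i][j][k] / total for k in range(2)]
       for j in range(2)] for i in range(2)]

def cond_xy_or(pi, k):
    """Conditional XY odds ratio in layer k; constant across k here."""
    return (pi[0][0][k] * pi[1][1][k]) / (pi[1][0][k] * pi[0][1][k])

def marg_xy_or(pi):
    """Marginal XY odds ratio from pi_{ij+}; need not equal theta."""
    m = lambda i, j: pi[i][j][0] + pi[i][j][1]
    return (m(0, 0) * m(1, 1)) / (m(1, 0) * m(0, 1))
```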

#### 3.3.6.2 Example

The treatment response and treatment drug have the same association for each clinic.

More precisely, we have \[\begin{equation} r_{1,1,(k)}^{XY} = r_{1,1}^{XY, \star} \iff \frac{\pi_{1,1,(k)}}{ \pi_{1,2,(k)}} = r_{1,1}^{XY, \star} \frac{\pi_{2,1,(k)}}{\pi_{2,2,(k)}} \qquad k = 1,2 \end{equation}\] which means that each drug may have different odds of success at each clinic; however, the odds of treatment success of drug \(A\) are a fixed multiple \(r_{1,1}^{XY, \star}\) of the odds of treatment success of drug \(B\), regardless of the clinic.

### 3.3.7 Tests for Independence

Marginal independence (Equation (3.3)) can be tested using the test for independence presented in Section 2.4.3.1 applied on the corresponding two-way marginal table.

Hypotheses of the independence statements defined by Equations (3.1), (3.2) and (3.4) could be tested analogously using the relevant marginal counts.

We do not consider these tests, but defer to log-linear models (soon!).

A specific test of independence of \(XY\) at each level of \(Z\) for \(2 \times 2 \times K\) tables is presented in Section 3.3.10.

### 3.3.8 Summary of Relationships

We present a summary of which independence relationships can be implied from which others, and which can’t, in Figure 3.3.

### 3.3.9 Multi-way Tables

Analogous definitions of the various types of independence exist for general multi-way tables.

### 3.3.10 Mantel-Haenszel Test for \(2 \times 2 \times K\) Tables

We will discuss the particular case of \(X\) and \(Y\) being binary variables that are cross-classified across the \(K\) layers of a variable \(Z\), forming \(K\) \(2 \times 2\) partial tables \(n_{ij(k)}, \, k = 1,...,K\).

The Mantel-Haenszel Test is for testing the conditional independence of \(X\) and \(Y\) given \(Z\) for these \(2 \times 2 \times K\) tables, that is, it considers the hypotheses \[\begin{eqnarray} \mathcal{H}_0: & \, X, Y \textrm{ are independent conditional on the level of } Z. \\ \mathcal{H}_1: & \, X, Y \textrm{ are not independent conditional on the level of } Z. \end{eqnarray}\] or in other words \[\begin{eqnarray} \mathcal{H}_0: & \, r_{12(k)} = 1, \, \textrm{for all} \, k = 1,...,K \\ \mathcal{H}_1: & \, r_{12(k)} \neq 1, \, \textrm{for some} \, k = 1,...,K \\ \end{eqnarray}\]

The Mantel-Haenszel Test conditions on the row and column marginals of each of the \(K\) partial tables.

Under \(\mathcal{H}_0\), the count \(n_{11k}\) of each partial table follows the hypergeometric distribution \(\mathcal{H}g(N = n_{++k}, M = n_{1+k}, q = n_{+1k})\), and thus has mean and variance \[\begin{equation} \hat{E}_{11k} = \frac{n_{1+k} n_{+1k}}{n_{++k}} \qquad \qquad \hat{\sigma}^2_{11k} = \frac{n_{1+k} n_{2+k} n_{+1k} n_{+2k}}{n^2_{++k}(n_{++k} - 1)} \nonumber \end{equation}\]

We therefore have that \(\sum_k n_{11k}\) has mean \(\sum_k \hat{E}_{11k}\) and variance \(\sum_k \hat{\sigma}^2_{11k}\), since the partial tables are independent of each other.

The Mantel–Haenszel test statistic is defined as

\[\begin{equation} T_{MH} = \frac{[\sum_k (n_{11k} - \hat{E}_{11k})]^2}{\sum_k \hat{\sigma}_{11k}^2} \tag{3.6} \end{equation}\] \(T_{MH}\) is asymptotically \(\chi^2_1\) under \(\mathcal{H}_0\).

If \(T_{MH(obs)}\) is the observed value of the test statistic for a particular case, then the \(p\)-value is \(P(\chi_1^2 > T_{MH(obs)})\).
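A minimal sketch of the statistic in Equation (3.6), with illustrative counts. The \(\chi^2_1\) tail probability is computed via the identity \(P(\chi^2_1 > x) = P(|Z| > \sqrt{x}) = \mathrm{erfc}(\sqrt{x/2})\) for \(Z \sim N(0,1)\), which avoids any dependency beyond the standard library.

```python
import math

# Mantel-Haenszel statistic of Equation (3.6) for K 2x2 partial tables.
def mantel_haenszel(tables):
    """tables: list of K 2x2 partial tables [[n11, n12], [n21, n22]]."""
    num, var = 0.0, 0.0
    for (n11, n12), (n21, n22) in tables:
        r1, r2 = n11 + n12, n21 + n22      # row margins n_{1+k}, n_{2+k}
        c1, c2 = n11 + n21, n12 + n22      # column margins n_{+1k}, n_{+2k}
        n = r1 + r2                        # n_{++k}
        num += n11 - r1 * c1 / n           # n_{11k} - E_hat_{11k}
        var += r1 * r2 * c1 * c2 / (n * n * (n - 1))
    t_mh = num ** 2 / var                  # the square sits OUTSIDE the sum
    p_value = math.erfc(math.sqrt(t_mh / 2.0))   # P(chi^2_1 > t_mh)
    return t_mh, p_value

# Two illustrative partial tables with association in the same direction:
t, p = mantel_haenszel([[[10, 5], [3, 12]], [[8, 4], [2, 10]]])
```

Because both illustrative layers associate row 1 with column 1, the layer-wise deviations \(n_{11k} - \hat{E}_{11k}\) reinforce each other and the test rejects comfortably.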

When the \(XY\) association is similar across the partial tables, the test is more powerful.

It loses power when the underlying associations vary across the layers, especially when they are in different directions, since the differences \(n_{11k} - \hat{E}_{11k}\) then cancel out in the sum of the statistic given by Equation (3.6).

Why hypergeometric? Well, for any \(2 \times 2\) partial table we have \(n_{++k}\) items. We condition on the row and column margins, so we assume knowledge of \(n_{i+k}\) and \(n_{+jk}\). Therefore, this table can be viewed as having a population of \(N = n_{++k}\) items, of which we know that \(M = n_{1+k}\) are such that \(i = 1\). If the two variables \(X\) and \(Y\) are conditionally independent given \(Z\), then we can view \(N_{11k}\) as the result of picking \(q = n_{+1k}\) items (those going into column 1) at random from the \(N = n_{++k}\) items, of which \(M = n_{1+k}\) are in row 1. Therefore \(N_{11k} \sim \mathcal{H}g(N = n_{++k}, M = n_{1+k}, q = n_{+1k})\).

Note that in Equation (3.6) the square is outside of the summation.