Chapter 4 Multiple Membership models: the details

The key feature of a multiple membership model is that level 1 units do not belong to one and only one level 2 cluster. Some units may exhibit a true hierarchy, but others are ‘multiple members’: they belong to two or more clusters. A classic example is mobile students in educational settings (e.g., Chung and Beretvas 2012; Heck, Reid, and Leckie 2022; Leroux 2019). Another example, from health care, is patient outcomes where each patient frequently sees more than one nurse (Leckie 2013b). Using a multiple membership model allows the influence of multiple level 2 clusters to be taken into account. Additionally, just as ignoring a hierarchical data structure violates the independent observations assumption and inflates the type I error rate, ignoring a multiple membership structure in your data will also inflate the type I error rate.

Historically, when researchers encountered a multiple membership data structure (i.e., mobile students) and were unable to model it correctly, they chose from a few options for handling the data, all of which resulted in mis-specified models. One option was to treat each unit as a member of only its most recent cluster; for example, only the school a child attended at the time of testing was considered, ignoring any previously attended schools. Another option was to simply delete the mobile students from the data set, leaving a pure hierarchy of students nested in schools. A third option was to acknowledge student mobility by adding a dichotomous or ordinal “mobility” predictor to a regular hierarchical model. This controlled for the influence of mobility on the outcome but did not allow the effects of the different schools on the student to be modeled. None of these methods is an appropriate treatment of multiple membership data. The results obtained from the ‘delete mobile students’ approach, for example, can only be generalized to other non-mobile students, and deleting cases also decreases power. Chung and Beretvas (2012) reported negative bias in the level 2 predictor coefficients and in the level 2 variance components when employing the ‘only count the last school’ technique, and in the same study they found that the level 1 variance components were positively biased. For these reasons, researchers investigating multiple membership data use a multiple membership model, which allows generalization of results, proper capturing of mobile student effects, and valid inferences about school effects (Smith and Beretvas 2017).

4.1 Notation

There are three possibilities for how to depict these models, each with its own set of advantages and disadvantages. Standard hierarchical notation, familiar from truly hierarchical models, uses subscripts to identify individual and cluster level predictors and effects. Its disadvantage is that, due to its familiarity, readers may incorrectly assume that unit \(\textit{i}\) is strictly nested within cluster \(\textit{j}\), which is not the case (Snijders and Bosker 2012; Leckie 2013b). Another method is multiple subscript notation (Beretvas 2011). This notation makes the non-hierarchical structure more apparent; however, it can quickly become challenging to read.

The notation we will use moving forward is classification notation (Leckie 2013b):
\[y_{i} = \beta_{0} + \sum_{j \in school(i)}w_{j,i}^{(2)}u_{j}^{(2)} + e_{i}\] \[u_{j}^{(2)} \sim N(0, \sigma_{u(2)}^{2})\] \[e_{i} \sim N(0, \sigma_{e}^{2})\] With this notation, the multiple membership component is \(\sum_{j \in school(i)}w_{j,i}^{(2)}u_{j}^{(2)}\), the weighted (\(w_{j,i}^{(2)}\)) sum of the cluster effects (\(u_{j}^{(2)}\)). The weight represents the amount of time unit \(\textit{i}\) spent in cluster \(\textit{j}\), and the superscript (2) indicates that the term is associated with the second level. Every level in the model is indicated in this way, with the level 1 superscript (1) left implicit. The superscripts and subscripts therefore identify the level with which the random effects, variance parameters, and covariance parameters are associated.
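To make this weighted sum concrete, here is a minimal sketch (not taken from the tutorial; the weight matrix, teacher effects, and intercept below are all hypothetical) showing how the multiple membership term \(\sum_{j}w_{j,i}^{(2)}u_{j}^{(2)}\) can be computed for several students at once from a weight matrix whose rows sum to one.

```python
import numpy as np

# Hypothetical illustration of sum_{j in school(i)} w_ji * u_j for three
# students and four teachers; each row of W holds one student's weights
# w_ji and sums to 1.
W = np.array([
    [1/3, 1/3, 1/3, 0.0],   # student who saw teachers 1, 2, and 3 equally
    [0.0, 0.0, 0.0, 1.0],   # student who saw only teacher 4
    [0.5, 0.0, 0.5, 0.0],   # student who split time between teachers 1 and 3
])
u = np.array([1.2, -0.4, 0.7, 0.1])   # hypothetical teacher random effects u_j
beta_0 = 50.0                         # hypothetical fixed intercept

mm_term = W @ u                       # weighted sum of cluster effects per student
y_hat = beta_0 + mm_term              # expected outcomes before the level 1 residual e_i
print(np.round(y_hat, 3))
```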

4.2 Weights

Weights are assigned to the level 2 clusters to reflect the different influences each one has on the level 1 units, and they sum to one for any given level 1 unit. The weights are decided by the researcher, who may take one of several approaches. One approach is to assign equal weights, regardless of time. This is the approach taken in the example data set for this tutorial: it was assumed that each teacher seen by a student had equal influence, and the data set was designed so that students did not see any given teacher more than once. In a mobile student situation, a school attended for any length of time might be given the same weight as every other school the student attended, even if the student attended other schools for longer. Another approach is to account for length of time: if a student spent one year at school 1 but two years at school 2, a researcher may weight school 1 as 0.33 and school 2 as 0.67, reflecting the fact that the student spent more time at school 2 and it therefore likely had a larger impact. A third approach is to weight later educational settings more heavily than earlier ones, reflecting the assumption that more recent education may have a larger impact on, for example, test scores than earlier education (Chung and Beretvas 2012).
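As a small illustration of the first two approaches (the cluster names and times below are hypothetical, not from the tutorial data), the sketch turns a student's record of time spent in each cluster into either equal or time-proportional weights that sum to one.

```python
# Hypothetical sketch: building multiple membership weights for one student
# from the time spent in each cluster (e.g., years per school).
time_in_cluster = {"school_1": 1, "school_2": 2}

# Approach 1: equal weights, regardless of time spent.
equal_weights = {j: 1 / len(time_in_cluster) for j in time_in_cluster}

# Approach 2: weights proportional to time spent in each cluster.
total_time = sum(time_in_cluster.values())
time_weights = {j: t / total_time for j, t in time_in_cluster.items()}

print(equal_weights)   # both schools weighted 0.5
print(time_weights)    # school_1 ~0.33, school_2 ~0.67
```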

Looking at the data, we can pull out two students, student 1 and student 3, and write out the earlier equation for each of them. Student 1 had three teachers (24, 32, and 46), while student 3 had only one teacher (12). The equations for students 3 and 1 are shown below, illustrating the role cluster weights play in these models.

4.2.1 Student 3

\[y_{3} = \beta_{0} + \sum_{j \in \{12\}}w_{j,3}^{(2)}u_{j}^{(2)} + e_{3}\]
\[ = \beta_{0} + w_{12,3}^{(2)}u_{12}^{(2)} + e_{3}\]
\[ = \beta_{0} + u_{12}^{(2)} + e_{3}\]

4.2.2 Student 1

\[y_{1} = \beta_{0} + \sum_{j \in \{24, 32, 46\}}w_{j,1}^{(2)}u_{j}^{(2)} + e_{1}\]
\[ = \beta_{0} + w_{24,1}^{(2)}u_{24}^{(2)} + w_{32,1}^{(2)}u_{32}^{(2)} + w_{46,1}^{(2)}u_{46}^{(2)}+ e_{1}\]
\[ = \beta_{0} + 0.33u_{24}^{(2)} + 0.33u_{32}^{(2)} + 0.33u_{46}^{(2)}+ e_{1}\]
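A quick numeric check of these two equations, using made-up values for the intercept and teacher effects (the real values would come from fitting the model):

```python
# Hypothetical intercept and teacher random effects, keyed by teacher id.
beta_0 = 50.0
u = {12: 0.8, 24: 1.2, 32: -0.4, 46: 0.7}

# Student 3: a single teacher, so the weight is 1 and y_hat is beta_0 + u_12.
y3_hat = beta_0 + 1.0 * u[12]

# Student 1: three teachers with equal weights of 1/3 each.
w1 = {24: 1/3, 32: 1/3, 46: 1/3}
y1_hat = beta_0 + sum(w1[j] * u[j] for j in w1)

print(round(y3_hat, 2), round(y1_hat, 2))
```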

These weights are used both to quantify the degree of belonging to a particular level 2 cluster and to appropriately weight any level 2 predictors. For example, if we were to add t_exp as a teacher-level predictor, the resulting equations would look like this:

\[y_{i} = \beta_{0} + \beta_{1}\sum_{j \in school(i)}w_{j,i}^{(2)}t\_exp_{2j}^{(2)} + \sum_{j \in school(i)}w_{j,i}^{(2)}u_{j}^{(2)} + e_{i}\] \[u_{j}^{(2)} \sim N(0, \sigma_{u(2)}^{2})\] \[e_{i} \sim N(0, \sigma_{e}^{2})\]

The term \(\sum_{j \in school(i)}w_{j,i}^{(2)}t\_exp_{2j}^{(2)}\) is the weighted sum of teacher experience (t_exp); it accounts for the fact that students saw different teachers, whose experience differs. \(\beta_{1}\) is the slope coefficient for this weighted sum. We can either keep the term as-is in our equation or, to simplify the equation, pre-compute it in our data set as \(\bar x_{2i}\). The resulting equation would then be

\[y_{i} = \beta_{0} + \beta_{1}\overline{(t\_exp)}_{2i} + \sum_{j \in school(i)}w_{j,i}^{(2)}u_{j}^{(2)} + e_{i}\]
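As a sketch of that pre-computation (the experience values and weight matrix below are hypothetical), each student's weighted mean of t_exp can be obtained by multiplying the weight matrix by the vector of teacher experience values:

```python
import numpy as np

# Hypothetical years of experience for four teachers.
t_exp = np.array([3.0, 10.0, 7.0, 15.0])

# Hypothetical weight matrix: one row per student, rows sum to 1.
W = np.array([
    [1/3, 1/3, 1/3, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.5, 0.0, 0.5, 0.0],
])

# Weighted sum of t_exp for each student: sum_j w_ji * t_exp_j.
t_exp_bar = W @ t_exp
print(np.round(t_exp_bar, 3))   # one value per student; enters the model with slope beta_1
```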

4.3 Centering

In pure hierarchical models, predictors (typically at level 1) can be either group mean centered (CWC) or grand mean centered (CGM), with CWC generally preferred because of the resulting interpretability of the coefficients (Enders 2013). However, in a multiple membership model, level 1 units belong to one or more level 2 clusters, so group mean centering becomes complicated very quickly and indeed may not be possible. When centering is mentioned in the literature on multiple membership models, it is only grand mean centering that is referenced. This may be due to ease, or to a lack of exploration of group mean centering. It may also be due to the large sample size required to have enough units in each unique ‘cluster’ combination: students who had only teacher 1 would form one cluster, students who had teacher 1 and teacher 2 another, and so on. In our example, with 56 teachers and a maximum draw of 12 teachers per student, the number of possible combinations exceeds our sample size by several orders of magnitude. Much of the literature using multiple membership models (e.g., Gero et al. 2020) does not mention any type of centering; an example where it is mentioned is Smith and Beretvas (2017), who specifically address grand mean centering in their model descriptions.
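To give a rough sense of why centering within unique teacher combinations is impractical here, this short sketch counts the possible combinations implied by 56 teachers and a maximum of 12 teachers per student (the only two numbers taken from the text):

```python
from math import comb

# Number of distinct teacher combinations a student could have when there are
# 56 teachers and each student sees between 1 and 12 of them.
n_teachers, max_seen = 56, 12
n_combinations = sum(comb(n_teachers, k) for k in range(1, max_seen + 1))
print(f"{n_combinations:,}")   # vastly more 'clusters' than any realistic sample size
```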

4.4 VPC and ICC

4.4.1 ICC

Again comparing to pure hierarchical models, one of the first things calculated after running an intercept-only model is the ICC, or intraclass correlation. When calculated as \(\frac{\tau_{00}}{\tau_{00} + \sigma^{2}}\), the ICC represents the amount of variance in the outcome variable that can be attributed to the fact that level 1 units are nested in level 2 clusters (Snijders and Bosker 2012). As with most things thus far, calculating an ICC for a multiple membership model is not so straightforward. The equation for a multiple membership model is more complex and takes into account the cluster ‘profiles’ of two units. If we are looking at the cluster-level ICC for two different units, \(\textit{i}\) and \(\textit{i'}\), we calculate it as a correlation: \[corr(y_{i}, y_{i'}) = \frac{\sigma_{u(2)}^{2}\sum_{j \in cluster(i) \cup cluster(i')}w_{j,i}^{(2)}w_{j, i'}^{(2)}}{\sqrt{\sigma_{u(2)}^{2}\sum_{j \in cluster(i)}(w_{j,i}^{(2)})^2 + \sigma_{e}^2}\sqrt{\sigma_{u(2)}^{2}\sum_{j \in cluster(i')}(w_{j,i'}^{(2)})^2 + \sigma_{e}^2}}\]
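A minimal helper sketching this formula (the function name and the variance values in the example are ours, not from the tutorial; weights are passed as dictionaries mapping cluster id to weight):

```python
import math

def mm_correlation(w_i, w_k, sigma2_u, sigma2_e):
    """Model-implied correlation between two level 1 units in a two-level
    multiple membership model; w_i and w_k map cluster id -> weight.
    Illustrative sketch, not from the tutorial."""
    # Clusters outside the intersection contribute nothing to the covariance,
    # because one of the two weights is zero there.
    shared = set(w_i) & set(w_k)
    cov = sigma2_u * sum(w_i[j] * w_k[j] for j in shared)
    var_i = sigma2_u * sum(w ** 2 for w in w_i.values()) + sigma2_e
    var_k = sigma2_u * sum(w ** 2 for w in w_k.values()) + sigma2_e
    return cov / math.sqrt(var_i * var_k)

# Example: students 1 and 3 from the earlier equations share no teachers,
# so their model-implied correlation is zero (variance values are hypothetical).
w_student1 = {24: 1/3, 32: 1/3, 46: 1/3}
w_student3 = {12: 1.0}
print(mm_correlation(w_student1, w_student3, sigma2_u=4.0, sigma2_e=12.0))   # 0.0
```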

Interpreting the ICC as the correlation between the outcomes of two students who had the same teacher, and only that teacher, it would be calculated as \[ICC = \frac{\sigma_{u(2)}^{2}}{\sigma_{u(2)}^{2} + \sigma_{e}^{2}}\] However, if these students had two teachers each (with equal weights of 0.5) and shared only one of them, the calculation becomes \[ICC = \frac{0.25\sigma_{u(2)}^{2}}{0.5\sigma_{u(2)}^{2} + \sigma_{e}^{2}}\] The shared teacher contributes \(0.5 \times 0.5 = 0.25\) to the numerator, while each student's squared weights sum to \(0.5^{2} + 0.5^{2} = 0.5\) in the denominator. In this way the weights reflect the extent to which two students were exposed to the same teachers versus different teachers (Leckie 2013b).
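Working these two scenarios through the formula with hypothetical variance components makes the 0.25 and 0.5 explicit:

```python
sigma2_u, sigma2_e = 4.0, 12.0   # hypothetical variance components

# Same single teacher: both weight vectors are {A: 1}, so the numerator is
# sigma2_u * 1 * 1 and each variance is sigma2_u * 1**2 + sigma2_e.
icc_same = sigma2_u / (sigma2_u + sigma2_e)

# Two teachers each (weights 0.5), sharing only one: the shared teacher
# contributes 0.5 * 0.5 = 0.25 to the numerator, and each student's squared
# weights sum to 0.5**2 + 0.5**2 = 0.5 in the denominator.
icc_share_one = (0.25 * sigma2_u) / (0.5 * sigma2_u + sigma2_e)

print(round(icc_same, 3), round(icc_share_one, 3))   # 0.25 and ~0.071
```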

4.4.2 VPC

Another way to determine how variance is partitioned in a model is the variance partition coefficient (VPC), which reports the proportion of variation in the outcome that lies at each level. The VPC differs from the ICC in that it is not the model-implied correlation between two units within a cluster but rather the share of the variation attributed to each level. As with the ICC, this calculation is more complex for a multiple membership model. The variation in outcomes that can be attributed to clusters depends on how much the level 1 units are spread across clusters: are they members of one cluster or of multiple clusters? The cluster-level VPC for a multiple membership model, representing the proportion of variation in outcomes attributed to clusters, is (Leckie 2013b) \[VPC_{u(2)} = \frac{\sigma_{u(2)}^{2}\sum_{j \in cluster(i)}(w_{j,i}^{(2)})^2}{\sigma_{u(2)}^{2}\sum_{j \in cluster(i)}(w_{j,i}^{(2)})^2 + \sigma_{e}^{2}}\]
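A small sketch of this calculation (the function name, weights, and variance values are hypothetical):

```python
def mm_vpc(weights, sigma2_u, sigma2_e):
    """Cluster-level VPC for one level 1 unit, given its weights as a dict
    mapping cluster id -> weight. Illustrative sketch, not from the tutorial."""
    cluster_var = sigma2_u * sum(w ** 2 for w in weights.values())
    return cluster_var / (cluster_var + sigma2_e)

# Example: a student who split time equally between two teachers, using
# hypothetical variance components.
print(round(mm_vpc({"teacher_A": 0.5, "teacher_B": 0.5}, 4.0, 12.0), 3))   # ~0.143
```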

If we consider the level 1 units that were members of only one cluster, the VPC is equivalent to the ICC for two individuals who had the same, single teacher above:
\[VPC_{u(2)} = \frac{\sigma_{u(2)}^{2}}{\sigma_{u(2)}^{2} + \sigma_{e}^{2}}\] As above, however, the calculation becomes more complex when level 1 units are members of more than one cluster, and the VPC no longer equals the ICC. Following a similar pattern, if we consider the population of level 1 units that were members of two different clusters (which clusters is not important here; we are looking only at cluster-level variance), the VPC becomes \[VPC_{u(2)} = \frac{0.5\sigma_{u(2)}^{2}}{0.5\sigma_{u(2)}^{2} + \sigma_{e}^{2}}\] This illustrates that as level 1 units are members of more clusters, the amount of variance in outcomes attributed to clusters decreases. Additionally, with a multiple membership model the VPC is perhaps the easier of the two quantities to compute and interpret, because we are not comparing the cluster profiles of two individuals but only considering how many clusters a single individual is a member of.
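To see the pattern, the sketch below evaluates the VPC for a unit belonging to \(k\) clusters with equal weights of \(1/k\), using the same hypothetical variance components as above; the squared weights sum to \(1/k\), so the cluster-level share shrinks as \(k\) grows.

```python
sigma2_u, sigma2_e = 4.0, 12.0   # hypothetical variance components

# A unit belonging to k clusters with equal weights of 1/k has squared
# weights summing to k * (1/k)**2 = 1/k.
for k in (1, 2, 3, 4):
    vpc = (sigma2_u / k) / (sigma2_u / k + sigma2_e)
    print(k, round(vpc, 3))   # 0.25, 0.143, 0.1, 0.077 as k grows
```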

References

Beretvas, S. Natasha. 2011. “Cross-Classified and Multiple-Membership Models.” In Handbook of Advanced Multilevel Analysis, 313–34. Routledge/Taylor & Francis Group.
Chung, Hyewon, and S. Natasha Beretvas. 2012. “The Impact of Ignoring Multiple Membership Data Structures in Multilevel Models.” British Journal of Mathematical and Statistical Psychology 65: 185–200.
Enders, Craig K. 2013. “Centering Predictors and Contextual Effects.” In The SAGE Handbook of Multilevel Modeling, 89–108. SAGE Publications Inc. https://doi.org/10.4135/9781446247600.n6.
Gero, Krisztina, Hiroyuki Hikichi, Jun Aida, Katsunori Kondo, and Ichiro Kawachi. 2020. “Associations Between Community Social Capital and Preservation of Functional Capacity in the Aftermath of a Major Disaster.” American Journal of Epidemiology 189: 1369–78.
Heck, Ronald H., Tingting Reid, and George Leckie. 2022. “Incorporating Student Mobility in Studying Academic Growth in Math: Comparing Several Alternative Multilevel Formulations.” School Effectiveness and School Improvement 33: 516–43.
Leckie, George. 2013b. Multiple Membership Multilevel Models - Concepts. LEMMA VLE Module 13, 1–57.
Leroux, Audrey J. 2019. “Student Mobility in Multilevel Growth Modeling: A Multiple Membership Piecewise Growth Model.” The Journal of Experimental Education 87: 430–48.
Smith, Lindsey J. Wolff, and S. Natasha Beretvas. 2017. “A Comparison of Techniques for Handling and Assessing the Influence of Mobility on Student Achievement.” The Journal of Experimental Education 85: 3–23.
Snijders, T. A. B., and R. J. Bosker. 2012. “The Random Intercept Model.” In Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling, 41–73. SAGE Publications Inc.