Joss Ives: Originally produced April 20, 2020. Updated 2020 May 01, 16:50.
We look at some toy models to help with interpreting odds ratios and how they are calculated in this context.
We make a course with 3 four-student groups, 6 females and 6 males. Table 1.1 shows the resulting odds ratios (F-F to F-M and M-M to F-M). Groups made up of 4 students of the same binary gender produce a very large increase in the odds ratio. As expected, the smallest odds ratios are 0.5, and they occur when the students distribute themselves so as to minimize the number of same-binary-gender pairs within groups, that is, when each of the 3 groups has 2 females and 2 males.
Dist. | FF Same Group | MM Same Group | FM Same Group | FF Dif Group | MM Dif Group | FM Dif Group | Odds(FF) | Odds(MM) | Odds(FM) | OR(FF) | OR(MM) |
---|---|---|---|---|---|---|---|---|---|---|---|
FFFF FFMM MMMM | 7 | 7 | 4 | 8 | 8 | 32 | 0.875 | 0.875 | 0.125 | 7.000 | 7.000 |
FFFF FMMM FMMM | 6 | 6 | 6 | 9 | 9 | 30 | 0.667 | 0.667 | 0.200 | 3.335 | 3.335 |
FFFM FFFM MMMM | 6 | 6 | 6 | 9 | 9 | 30 | 0.667 | 0.667 | 0.200 | 3.335 | 3.335 |
FFFM FFMM FMMM | 4 | 4 | 10 | 11 | 11 | 26 | 0.364 | 0.364 | 0.385 | 0.945 | 0.945 |
FFMM FFMM FFMM | 3 | 3 | 12 | 12 | 12 | 24 | 0.250 | 0.250 | 0.500 | 0.500 | 0.500 |
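The counts and odds ratios in this table can be reproduced with a few lines of code. The following is a minimal sketch in Python (the original analysis was done in R); the function name `pair_odds_ratios` and the string encoding of groups are assumptions made for illustration. It counts same-group and different-group pairs for one distribution and forms the two odds ratios relative to F-M pairs.

```python
from math import comb


def pair_odds_ratios(distribution):
    """Return (OR(FF), OR(MM)) for a distribution such as 'FFFF FFMM MMMM'.

    Each whitespace-separated token is one group; 'F' and 'M' mark students.
    Assumes every pair type has at least one same-group and one
    different-group pair, so that no denominator is zero.
    """
    groups = distribution.split()
    n_f = sum(g.count("F") for g in groups)
    n_m = sum(g.count("M") for g in groups)

    # Pairs that sit in the same group, by gender composition.
    ff_same = sum(comb(g.count("F"), 2) for g in groups)
    mm_same = sum(comb(g.count("M"), 2) for g in groups)
    fm_same = sum(g.count("F") * g.count("M") for g in groups)

    # All remaining pairs of each type are split across different groups.
    ff_dif = comb(n_f, 2) - ff_same
    mm_dif = comb(n_m, 2) - mm_same
    fm_dif = n_f * n_m - fm_same

    odds_fm = fm_same / fm_dif
    return (ff_same / ff_dif) / odds_fm, (mm_same / mm_dif) / odds_fm


for dist in ["FFFF FFMM MMMM", "FFFF FMMM FMMM", "FFMM FFMM FFMM"]:
    or_ff, or_mm = pair_odds_ratios(dist)
    print(f"{dist}: OR(FF) = {or_ff:.3f}, OR(MM) = {or_mm:.3f}")
```

Working from exact fractions, the second and third rows give 10/3 ≈ 3.333. The tabulated 3.335 is what results from dividing the rounded odds, 0.667 by 0.200, so the tabulated odds ratios appear to have been computed from rounded odds.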
We make a course with 3 four-student groups, 7 females and 5 males. The table below shows the resulting odds ratios.
Dist. | FF Same Group | MM Same Group | FM Same Group | FF Dif Group | MM Dif Group | FM Dif Group | Odds(FF) | Odds(MM) | Odds(FM) | OR(FF) | OR(MM) |
---|---|---|---|---|---|---|---|---|---|---|---|
FFFF FFFM MMMM | 9 | 6 | 3 | 12 | 4 | 32 | 0.750 | 1.500 | 0.094 | 7.979 | 15.957 |
FFFF FFMM FMMM | 7 | 4 | 7 | 14 | 6 | 28 | 0.500 | 0.667 | 0.250 | 2.000 | 2.668 |
FFFM FFFM FMMM | 6 | 3 | 9 | 15 | 7 | 26 | 0.400 | 0.429 | 0.346 | 1.156 | 1.240 |
FFFM FFMM FFMM | 5 | 2 | 11 | 16 | 8 | 24 | 0.312 | 0.250 | 0.458 | 0.681 | 0.546 |
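The rows of this table correspond to the distinct ways of splitting the 7 females across three interchangeable groups of 4. A minimal Python sketch of that enumeration follows; the helper names `female_splits` and `as_string` are hypothetical, and treating groups of equal size as interchangeable is an assumption about how the rows were chosen.

```python
def female_splits(n_female, group_sizes):
    """List the distinct ways to place n_female females into the groups.

    Groups of equal size are treated as interchangeable, so each split is
    reported once, as a tuple of (group size, number of females) pairs.
    """
    sizes = sorted(group_sizes, reverse=True)
    results = set()

    def place(index, remaining, counts):
        if index == len(sizes):
            if remaining == 0:
                # Canonical form: sort so that permutations of equal-size
                # groups collapse to a single entry.
                results.add(tuple(sorted(zip(sizes, counts), reverse=True)))
            return
        for f in range(min(sizes[index], remaining) + 1):
            place(index + 1, remaining - f, counts + [f])

    place(0, n_female, [])
    return sorted(results, reverse=True)


def as_string(split):
    """Render a split as a distribution string, e.g. 'FFFF FFFM MMMM'."""
    return " ".join("F" * f + "M" * (size - f) for size, f in split)


for split in female_splits(7, [4, 4, 4]):
    print(as_string(split))
```

For 7 females in three groups of 4 this yields four distributions, and for 6 females it yields the five distributions of Table 1.1.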
We make a course with 2 four-student groups and 1 three-student group, with 6 females and 5 males. The table below shows the resulting odds ratios.
Dist. | FF Same Group | MM Same Group | FM Same Group | FF Dif Group | MM Dif Group | FM Dif Group | Odds(FF) | Odds(MM) | Odds(FM) | OR(FF) | OR(MM) |
---|---|---|---|---|---|---|---|---|---|---|---|
FFFF MMMM FFM | 7 | 6 | 2 | 8 | 4 | 28 | 0.875 | 1.500 | 0.071 | 12.324 | 21.127 |
FFFM MMMM FFF | 6 | 6 | 3 | 9 | 4 | 27 | 0.667 | 1.500 | 0.111 | 6.009 | 13.514 |
FFFF FFMM MMM | 7 | 4 | 4 | 8 | 6 | 26 | 0.875 | 0.667 | 0.154 | 5.682 | 4.331 |
FFFF FMMM FMM | 6 | 4 | 5 | 9 | 6 | 25 | 0.667 | 0.667 | 0.200 | 3.335 | 3.335 |
FFFM FFFM MMM | 6 | 3 | 6 | 9 | 7 | 24 | 0.667 | 0.429 | 0.250 | 2.668 | 1.716 |
FFFM FMMM FFM | 4 | 3 | 8 | 11 | 7 | 22 | 0.364 | 0.429 | 0.364 | 1.000 | 1.179 |
FFFM FFMM FMM | 4 | 2 | 9 | 11 | 8 | 21 | 0.364 | 0.250 | 0.429 | 0.848 | 0.583 |
FFMM FFMM FFM | 3 | 2 | 10 | 12 | 8 | 20 | 0.250 | 0.250 | 0.500 | 0.500 | 0.500 |
Freeman 2017 outlines a 700-student study that looked at how students self-selected into groups of 3 throughout the term according to various demographic and performance measures.
Freeman, S., Theobald, R., Crowe, A. J., & Wenderoth, M. P. (2017). Likes attract: Students self-sort in a classroom by gender, demography, and academic characteristics. Active Learning in Higher Education, 18(2), 115-126. https://doi.org/10.1177/1469787417707614
Their approach is to use a logistic regression in which the unit of analysis is the student pair: across their 700 students, they cycle through every possible pair of students. The baseline for each category/variable is pairs that do not share the characteristic. For example, the probability that two females will work together (“students share a covariate”) is compared to the probability that a male and a female will work together.
Of note, the demographics range from somewhat balanced (60.5% female vs 39.5% male) to Table I from Freeman 2017 – Paying attention to the
Figure 2.1 shows their odds ratios. Of note, the largest odds ratios (Both African American and Both International) are associated with groups that have very small representation in the course (3.3% and 6.0%, respectively).
This analysis method strikes me as a useful way to quantify how much students are grouping by similar demographic and performance measures. However, their approach raises the following concerns and questions
Exploratory data analyses were performed using final exam data from Physics 101, 2014-W2 (a.k.a. Jan, 2015). These data were used in a variety of ways to better understand concerns from the Freeman analysis and to establish how analyses were to be performed across the many data sets that I have available.
The data set has the following features:
The student covariates are
The Freeman data set and the Physics 101 2014-W2 data set are very similar in terms of their sizes (~700) and proportions of females (~60%) and males (~40%). Thus, this demographic variable is potentially quite valuable for making sense of the pairing preferences that arise from the logistic regression. It also allows simulated data to be used to explore how sensitive this analysis might be to extreme proportions in a (forced) binary category.
Freeman’s data set has measurements at 5 data points. From Figure 4.1, we see that the odds ratios for the final two data points (Days 18 and 38) are on the order of 1.5 for both females and males. They are slightly higher for the Physics 101 data set. For the Freeman data set, these odds ratios represent results when controlling for a number of other covariates. For the Physics 101 data, I did not include the other covariates here, but including them has very little effect on the odds ratios presented.
So the story here is that we are seeing similar (statistically significant) gender pairing preferences in both Freeman’s and the Physics 101 data.
Here we want to better understand the information that the logistic regression is providing in terms of the odds ratios.
We start with the Physics 101 data (see Table 4.1) and count the number of each type of gender pair present there. Remember that these data come from looking at every single possible pair of students in the course, and then classifying each pair according to the students’ genders and whether they were actually in the same group.
GENDERPAIR | shared.group.false | shared.group.true |
---|---|---|
DifGenders | 137019 | 375 |
BothFemale | 100107 | 469 |
BothMale | 46408 | 257 |
We are going to perform the simplest logistic regression, one that tries to predict whether students \(i\) and \(j\) will be in a group together (probability = \(p_{ij}\)) based on whether or not they share their binary gender,
\[\textrm{log}\left(\frac{p_{ij}}{1-p_{ij}}\right)= \alpha + \beta\cdot\mathit{GenderPair}+\varepsilon_{ij}.\]
This analysis uses non-shared quantities (i.e., DifGenders) as the baseline against which each possible shared pair (i.e., BothFemale and BothMale) is compared. Thus, when we run this logistic regression, we will get two odds ratios, OddsRatio(BothFemale/DifGenders) and OddsRatio(BothMale/DifGenders). If we look at the first odds ratio using the data from Table 4.1, it would be calculated as follows,
\[\begin{eqnarray}\mathit{OddsRatio(BothFemale)} &=& \frac{\mathit{Odds(BothFemale)}}{\mathit{Odds(DifGenders)}},\\ &=& \frac{\mathit{N(BothFemale,SameGroup)/N(BothFemale,DifGroup)}}{\mathit{N(DifGenders,SameGroup)/N(DifGenders,DifGroup)}},\\ &=& \frac{469/100107}{375/137019}, \end{eqnarray}\]
which gives a value of 1.71.
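The same arithmetic can be checked for both same-gender pair types with a short script. This is a minimal Python sketch rather than the R used for the analysis; the dictionary layout and the function names `odds` and `odds_ratio` are assumptions for illustration, while the counts are those of Table 4.1.

```python
# Pair counts from Table 4.1: (different group, same group).
counts = {
    "DifGenders": (137019, 375),
    "BothFemale": (100107, 469),
    "BothMale": (46408, 257),
}


def odds(pair_type):
    """Odds that a pair of this type shares a group."""
    dif_group, same_group = counts[pair_type]
    return same_group / dif_group


def odds_ratio(pair_type, baseline="DifGenders"):
    """Odds ratio of pair_type relative to the baseline pair type."""
    return odds(pair_type) / odds(baseline)


for pair_type in ("BothFemale", "BothMale"):
    print(f"OddsRatio({pair_type}) = {odds_ratio(pair_type):.2f}")
```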
Let’s look at the logistic regression now. We will focus on the coefficient GENDERPAIRBothFemale, which is the log of the odds ratio. It is highly significant in the model and has a value of 0.538, with a standard error of 0.069. More on this after the summary of the model.
##
## Call:
## glm(formula = SHAREDGROUP ~ GENDERPAIR, family = binomial, data = df.raw)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.1051 -0.0967 -0.0967 -0.0739 3.4362
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.90095 0.05171 -114.125 < 2e-16 ***
## GENDERPAIRBothFemale 0.53756 0.06940 7.746 9.46e-15 ***
## GENDERPAIRBothMale 0.70480 0.08115 8.685 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 14430 on 284634 degrees of freedom
## Residual deviance: 14334 on 284632 degrees of freedom
## AIC: 14340
##
## Number of Fisher Scoring iterations: 8
Emphasizing again that the coefficients from the logistic regression are reported by R as logs of odds ratios, let’s exponentiate them. Looking at the exponentiated results below and comparing to the BothFemale result calculated manually above (1.71), we see that they are the same. Good news!
## Odds Ratio 95%CI.LL 95%CI.UL
## BothFemale 1.71 1.49 1.96
## BothMale 2.02 1.72 2.37
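For a model with a single categorical predictor, the coefficient, its standard error, and an approximate interval can all be recovered directly from the pair counts. The Python sketch below assumes the interval is a Wald interval, exp(estimate ± 1.96 × SE); the report does not say how its interval was computed, so this is a consistency check rather than a description of the method used. The function name `wald_odds_ratio` is hypothetical.

```python
from math import exp, log, sqrt


def wald_odds_ratio(same_a, dif_a, same_b, dif_b, z=1.96):
    """Odds ratio of pair type a vs. baseline b, with a Wald interval.

    The standard error of the log odds ratio is the square root of the
    sum of reciprocal cell counts of the 2x2 table.
    Returns (odds ratio, lower limit, upper limit, log odds ratio, SE).
    """
    log_or = log((same_a / dif_a) / (same_b / dif_b))
    se = sqrt(1 / same_a + 1 / dif_a + 1 / same_b + 1 / dif_b)
    return exp(log_or), exp(log_or - z * se), exp(log_or + z * se), log_or, se


# BothFemale vs. DifGenders, counts from Table 4.1.
or_bf, lower, upper, log_or, se = wald_odds_ratio(469, 100107, 375, 137019)
print(f"log OR = {log_or:.3f}, SE = {se:.3f}")
print(f"OR = {or_bf:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

Under this assumption the log odds ratio and standard error match the GENDERPAIRBothFemale row of the model summary, and the limits agree with the tabulated ones to two decimal places.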
Previously, we used what is effectively a contingency table to calculate the odds ratio for BothFemale. Thus we should also be able to compare these results to those that come from Fisher’s exact test.
##
## DifGroup SameGroup
## DifGenders 137019 375
## BothFemale 100107 469
##
## Fisher's Exact Test for Count Data
##
## data: ctable
## p-value = 8.55e-15
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 1.490880 1.966506
## sample estimates:
## odds ratio
## 1.711778
These results, including the 95% Confidence Intervals provided, are also consistent with the results from the logistic regression.
This Physics 101 data set can be used as the basis for Monte-Carlo simulations to further explore multiple aspects of this analysis. These simulations will always have the same number of students, and of groups of 3 or 4 students, but may vary the number of Female/Male students according to the simulation’s purpose. These simulations can be used to investigate
Table 5.1 shows a comparison of the real Physics 101 2014-W2 final exam data with the aggregated results from one million Monte-Carlo classes that had no preference for pairing by gender. Based on this, we see that the real data are ~9.5 sigma away from completely random data for each of the gender-pairing counts.
 | Real | Simulation |
---|---|---|
Groups of 3 | 21 | 21 |
Groups of 4 | 173 | 173 |
N(Females) | 449 | 449 |
N(Males) | 306 | 306 |
F/M Percentages | 59.5%/40.5% | 59.5%/40.5% |
N-Pairs(Female-Female) | 469 (z=9.7) | 389.0 ± 8.3 |
N-Pairs(Male-Male) | 257 (z=9.4) | 180.5 ± 8.1 |
N-Pairs(Female-Male) | 375 (z=-9.8) | 531.5 ± 16.0 |
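The simulation column of Table 5.1 can be approximated with a small Monte-Carlo run. The sketch below is in Python rather than the R used for the original analysis. It assumes that "no preference" means students are dealt uniformly at random into the 21 groups of 3 and 173 groups of 4; the actual simulation code is not shown in this report, so `simulate_class` and the other names are illustrative only.

```python
import random
from statistics import mean, stdev

N_FEMALE, N_MALE = 449, 306
GROUP_SIZES = [3] * 21 + [4] * 173  # 755 seats in total


def simulate_class(rng):
    """Deal students into groups at random and count same-group pairs."""
    students = ["F"] * N_FEMALE + ["M"] * N_MALE
    rng.shuffle(students)
    ff = mm = fm = 0
    start = 0
    for size in GROUP_SIZES:
        group = students[start:start + size]
        start += size
        f = group.count("F")
        m = size - f
        ff += f * (f - 1) // 2  # Female-Female pairs in this group
        mm += m * (m - 1) // 2  # Male-Male pairs in this group
        fm += f * m             # Female-Male pairs in this group
    return ff, mm, fm


rng = random.Random(2020)  # fixed seed so the sketch is repeatable
n_classes = 2000           # far fewer than the one million used in the text
results = [simulate_class(rng) for _ in range(n_classes)]

observed = {"F-F": 469, "M-M": 257, "F-M": 375}  # real counts, Table 5.1
for label, sims in zip(observed, zip(*results)):
    mu, sigma = mean(sims), stdev(sims)
    z = (observed[label] - mu) / sigma
    print(f"{label}: {mu:.1f} +/- {sigma:.1f}, real data z = {z:.1f}")
```

With only 2000 simulated classes, the means and standard deviations should land close to, but not exactly on, the tabulated values.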
Figure 5.1 shows how the rate of Female-Female or Male-Male pairings depends on the rate of Female-Male pairings. The method used to find these was to
Overall, what we see is that these three pairing quantities change together, such that investigating a data set with Female-Male pairings at -3 sigma relative to random is the same as investigating a data set where both the Female-Female and Male-Male pairings are at +3 sigma relative to random.
This is a data set with the standard number of female and male students and with the following number of gender pairings: Female-Female = 389, Male-Male = 180, and Female-Male = 532. These are all of the average numbers from the no-pairing-preference simulations, rounded to the nearest integer. The following are the results from running the logistic regression analysis on these data.
##
## Call:
## glm(formula = SHAREDGROUP ~ GENDERPAIR, family = binomial, data = df.mc.sub)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.0881 -0.0881 -0.0880 -0.0880 3.3340
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.550085 0.043439 -127.766 <2e-16 ***
## GENDERPAIRBothFemale -0.001129 0.066840 -0.017 0.987
## GENDERPAIRBothMale -0.003843 0.086394 -0.044 0.965
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 14430 on 284634 degrees of freedom
## Residual deviance: 14430 on 284632 degrees of freedom
## AIC: 14436
##
## Number of Fisher Scoring iterations: 8
As fully expected, these data, built to have the smallest possible gender-pairing preference, show no gender-pairing preference.
If we vary the number of Female-Female and Male-Male pairs together, we can see how the odds ratios for each of these same-gender pairs depend on pairing preference (see Figure 5.2), in units of standard deviation from no pairing preference. Recall that this sample consists of 59.5% female students and 40.5% male students. We see from this graph that the odds ratios for Male-Male pairs are consistently larger and more significant than those for Female-Female pairs. Even with proportions only a bit different from 50/50, the impact of the proportions is clearly visible. Given that the preference for each type of same-gender pair was increased by the same amount, it appears that the odds ratios and their significance are overestimated for the underrepresented group.
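As a check, the GENDERPAIRBothFemale estimate can be recovered by hand from the pair counts. The totals are those implied by Table 4.1 (100107 + 469 = 100576 Female-Female pairs and 137019 + 375 = 137394 Female-Male pairs), and \(\beta_{\mathit{BothFemale}}\) is notation introduced here for that coefficient:
\[\begin{eqnarray}\beta_{\mathit{BothFemale}} &=& \textrm{log}\left(\frac{389/(100576-389)}{532/(137394-532)}\right),\\ &=& \textrm{log}\left(\frac{389/100187}{532/136862}\right) \approx -0.0011, \end{eqnarray}\]
which agrees with the fitted value of -0.001129.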
This raises an important question:
Table 5.2 shows how the means and standard deviations of the number of each type of gender pairing vary with the proportions of female/male students. As one would expect, the standard deviations decrease as the F/M proportions become more asymmetric. The quantity Delta.sd shows that the linear sum of the standard deviations for F-F and M-M pairings is consistently ~0.3-0.4 pairs higher than the standard deviation for the F-M pairings, but this difference decreases as the F/M proportions become more asymmetric. Similarly, the standard deviations of F-F and M-M are equal for equal proportions, but as the proportions become more asymmetric the standard deviation of the underrepresented group (M) decreases relative to that of the overrepresented group (F), with the latter being 1.4 times larger than the former when the representation is 95%/5%.
femalePercent | mean(F-F) | SD(F-F) | mean(M-M) | SD(M-M) | mean(F-M) | SD(F-M) | Delta.sd |
---|---|---|---|---|---|---|---|
50 | 275.6182 | 8.510556 | 274.17822 | 8.512445 | 551.2035 | 16.596480 | 0.4265212 |
60 | 395.9916 | 8.263068 | 175.82098 | 8.084393 | 529.1874 | 15.919553 | 0.4279074 |
70 | 540.2310 | 7.381399 | 98.37816 | 7.038603 | 462.3908 | 13.998964 | 0.4210379 |
80 | 704.4279 | 5.821906 | 43.82637 | 5.334700 | 352.7457 | 10.745110 | 0.4114960 |
90 | 893.0040 | 3.589878 | 10.73976 | 2.954279 | 197.2562 | 6.165935 | 0.3782213 |
95 | 992.8951 | 2.222097 | 2.71830 | 1.566541 | 105.3866 | 3.468439 | 0.3201990 |
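The means in this table can be compared against a closed-form expectation. If grouping ignores gender, every pair of students is equally likely to share a group, so each pair type receives its share of the 1101 same-group pairs. The Python sketch below assumes the number of females is the class size times the percentage, rounded half up; that rounding rule is a guess, although it appears to reproduce the tabulated means to within roughly 0.1 pairs. The name `expected_pairs` is hypothetical.

```python
from math import comb

N_STUDENTS = 755
SAME_GROUP_PAIRS = 21 * comb(3, 2) + 173 * comb(4, 2)  # 1101 pairs


def expected_pairs(female_percent):
    """Expected (F-F, M-M, F-M) same-group pairs under random grouping."""
    # Assumed rounding rule: nearest student, with halves rounded up.
    n_f = int(N_STUDENTS * female_percent / 100 + 0.5)
    n_m = N_STUDENTS - n_f
    # Probability that any given pair of students shares a group.
    scale = SAME_GROUP_PAIRS / comb(N_STUDENTS, 2)
    return comb(n_f, 2) * scale, comb(n_m, 2) * scale, n_f * n_m * scale


for pct in (50, 60, 70, 80, 90, 95):
    ff, mm, fm = expected_pairs(pct)
    print(f"{pct}% female: F-F {ff:.1f}, M-M {mm:.1f}, F-M {fm:.1f}")
```

This gives only the means; the standard deviations in the table still require simulation or a separate variance calculation.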
I varied the proportion of female and male students in the course from 50/50 to 95/5 while also varying the same-gender-pairing preference from 0 to 5 standard deviations. Because the results vary quite a bit, they are difficult to capture in a single graph, so a few different representations are used.
Figure 5.3 is similar to the graph from the previous section, varying pairing preference along the x-axis and representing each different proportion of students as a row. The y-axis for each proportion row was allowed to self-scale.
Figures 5.4 and 5.5 swap the proportions to the x-axis and the pairing preference to the rows. The difference between these two graphs is that the y-axis in Figure 5.4 was allowed to self-scale by row, whereas the y-axes in Figure 5.5 all share the same scale.
Some take-home messages from the graphs:
Instead of comparing BothFemale or BothMale to DifGenders (3 levels), we could compare BothFemale to BothMale+DifGenders or compare BothMale to BothFemale+DifGenders. When doing this, both odds ratios go down and become less significant. This does not help with the asymmetry that results from uneven proportions.