1 Toy models

We look at some toy models to help with interpreting odds ratios and how they are calculated in this context.

1.1 The 12-student model with equal F/M representation

We make a course with 3 four-student groups, 6 females and 6 males. Table 1.1 shows the resulting odds ratios (F-F to F-M and M-M to F-M). What we see is that the presence of groups of 4 same-binary-gender students produces a very large increase in the odds ratio. As expected, the smallest odds ratios are 0.5 and occur when the students distribute themselves so as to minimize the number of same-binary-gender pairs within groups, which means 3 groups each having 2 females and 2 males.

Table 1.1: Distributions of 6 students of each binary gender into 3 groups of 4. Odds are calculated as the odds of that type of pair being in the same group over the odds that they will be in different groups. Odds Ratios for same binary-gender pairs (FF or MM) are the odds of that same binary-gender pair over the odds of pairs of opposite binary-gender (FM).
Dist.	FF Same Group	MM Same Group	FM Same Group	FF Dif Group	MM Dif Group	FM Dif Group	Odds(FF)	Odds(MM)	Odds(FM)	OR(FF)	OR(MM)
FFFF FFMM MMMM	7	7	4	8	8	32	0.875	0.875	0.125	7.000	7.000
FFFF FMMM FMMM	6	6	6	9	9	30	0.667	0.667	0.200	3.333	3.333
FFFM FFFM MMMM	6	6	6	9	9	30	0.667	0.667	0.200	3.333	3.333
FFFM FFMM FMMM	4	4	10	11	11	26	0.364	0.364	0.385	0.945	0.945
FFMM FFMM FFMM	3	3	12	12	12	24	0.250	0.250	0.500	0.500	0.500
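The counting behind Table 1.1 can be sketched in a few lines. This is a minimal illustration in Python (not the R used for the analysis later in this report); the function names `pair_counts` and `odds_ratios` are hypothetical, and a distribution is written as a string of groups exactly as in the Dist. column.

```python
from itertools import combinations

def pair_counts(distribution):
    """Count same-group and different-group pairs by type (FF, MM, FM)."""
    groups = distribution.split()
    # Tag every student with the index of the group they sit in.
    students = [(g, s) for g, members in enumerate(groups) for s in members]
    counts = {kind: {"same": 0, "dif": 0} for kind in ("FF", "MM", "FM")}
    for (g1, s1), (g2, s2) in combinations(students, 2):
        kind = "FM" if s1 != s2 else s1 + s2
        counts[kind]["same" if g1 == g2 else "dif"] += 1
    return counts

def odds_ratios(distribution):
    """Return (OR(FF), OR(MM)), each relative to the FM odds."""
    c = pair_counts(distribution)
    odds = {k: v["same"] / v["dif"] for k, v in c.items()}
    return odds["FF"] / odds["FM"], odds["MM"] / odds["FM"]

for dist in ["FFFF FFMM MMMM", "FFFF FMMM FMMM", "FFMM FFMM FFMM"]:
    or_ff, or_mm = odds_ratios(dist)
    print(f"{dist}: OR(FF) = {or_ff:.3f}, OR(MM) = {or_mm:.3f}")
```

Because the odds ratios here are computed from unrounded odds, they can differ in the third decimal from values computed from odds that were first rounded to three places (for example 3.333 rather than 3.335).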

1.2 The 12-student model with unequal F/M representation

We make a course with 3 four-student groups, 7 females and 5 males. Table 1.2 shows the resulting odds ratios.

Table 1.2: Distributions of 7 female and 5 male students into 3 groups of 4. Odds are calculated as the odds of that type of pair being in the same group over the odds that they will be in different groups. Odds Ratios for same binary-gender pairs (FF or MM) are the odds of that same binary-gender pair over the odds of pairs of opposite binary-gender (FM).
Dist.	FF Same Group	MM Same Group	FM Same Group	FF Dif Group	MM Dif Group	FM Dif Group	Odds(FF)	Odds(MM)	Odds(FM)	OR(FF)	OR(MM)
FFFF FFFM MMMM	9	6	3	12	4	32	0.750	1.500	0.094	8.000	16.000
FFFF FFMM FMMM	7	4	7	14	6	28	0.500	0.667	0.250	2.000	2.667
FFFM FFFM FMMM	6	3	9	15	7	26	0.400	0.429	0.346	1.156	1.238
FFFM FFMM FFMM	5	2	11	16	8	24	0.312	0.250	0.458	0.682	0.545
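The rows of a table like this one can be generated rather than listed by hand. The following is a sketch under the assumption that two distributions count as the same when they differ only by relabelling students or by reordering groups of equal size; the function name `distributions` is hypothetical.

```python
from itertools import product

def distributions(n_female, n_male, group_sizes):
    """Enumerate distinct ways to split the class into groups, ignoring
    which student is which and the order of equal-sized groups."""
    if n_female + n_male != sum(group_sizes):
        raise ValueError("group sizes must add up to the class size")
    seen = set()
    # Try every possible number of females (0..size) in each group.
    for females in product(*(range(size + 1) for size in group_sizes)):
        if sum(females) != n_female:
            continue
        # Sort (size, females) pairs so relabelled groups are not double-counted.
        seen.add(tuple(sorted(zip(group_sizes, females), reverse=True)))
    return [" ".join("F" * f + "M" * (size - f) for size, f in key)
            for key in sorted(seen, reverse=True)]

for dist in distributions(7, 5, [4, 4, 4]):
    print(dist)
```

For 7 females and 5 males in three groups of 4 this yields four distributions, and for the balanced 6/6 case it yields five.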

1.3 The 11-student model with unequal F/M representation

We make a course with 2 four-student groups and 1 three-student group, with 6 females and 5 males. Table 1.3 shows the resulting odds ratios.

Table 1.3: Distributions of 6 female and 5 male students into two groups of 4 and one group of 3. Odds are calculated as the odds of that type of pair being in the same group over the odds that they will be in different groups. Odds Ratios for same binary-gender pairs (FF or MM) are the odds of that same binary-gender pair over the odds of pairs of opposite binary-gender (FM).
Dist.	FF Same Group	MM Same Group	FM Same Group	FF Dif Group	MM Dif Group	FM Dif Group	Odds(FF)	Odds(MM)	Odds(FM)	OR(FF)	OR(MM)
FFFF MMMM FFM	7	6	2	8	4	28	0.875	1.500	0.071	12.250	21.000
FFFM MMMM FFF	6	6	3	9	4	27	0.667	1.500	0.111	6.000	13.500
FFFF FFMM MMM	7	4	4	8	6	26	0.875	0.667	0.154	5.688	4.333
FFFF FMMM FMM	6	4	5	9	6	25	0.667	0.667	0.200	3.333	3.333
FFFM FFFM MMM	6	3	6	9	7	24	0.667	0.429	0.250	2.667	1.714
FFFM FMMM FFM	4	3	8	11	7	22	0.364	0.429	0.364	1.000	1.179
FFFM FFMM FMM	4	2	9	11	8	21	0.364	0.250	0.429	0.848	0.583
FFMM FFMM FFM	3	2	10	12	8	20	0.250	0.250	0.500	0.500	0.500

2 The inspiration for the analysis approach

Freeman et al. (2017) describe a 700-student study that examined how students self-selected into groups of 3 throughout the term, broken down by various demographic and performance measures.

Freeman, S., Theobald, R., Crowe, A. J., & Wenderoth, M. P. (2017). Likes attract: Students self-sort in a classroom by gender, demography, and academic characteristics. Active Learning in Higher Education, 18(2), 115-126. https://doi.org/10.1177/1469787417707614

2.1 Their course details

700 students in the course, working in self-selected groups of 3
Students were asked to sit in the same area of the classroom as their lab TA (16 TAs in total)
On days 2, 4, 14, 18, and 38 (of 40), they completed 30-minute worksheets and the group composition was recorded
They also answered clicker questions every class

2.2 Their analysis and results

Their approach is to use logistic regression where the unit of analysis is the student pair: across their 700 students, they cycle through every possible pair of students. The baseline for each category/variable is pairs that do not share the characteristic. For example, the probability that two females will work together (“students share a covariate”) is compared to the probability that a male and a female will work together.
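To make "the student pair as the unit of analysis" concrete, here is a minimal sketch that turns a roster into one row per pair. The six-student roster is invented purely for illustration; the column names `GENDERPAIR` and `SHAREDGROUP` and the level names follow the ones used in the Physics 101 analysis later in this report.

```python
from itertools import combinations

# Hypothetical roster: (student id, binary gender, group id).
roster = [
    ("s1", "F", "g1"), ("s2", "F", "g1"), ("s3", "M", "g1"),
    ("s4", "M", "g2"), ("s5", "F", "g2"), ("s6", "M", "g2"),
]

def gender_pair(a, b):
    """Classify a pair by whether, and how, the two students share a gender."""
    if a != b:
        return "DifGenders"
    return "BothFemale" if a == "F" else "BothMale"

# One row per unordered pair of students.
pairs = [
    {"pair": (id1, id2),
     "GENDERPAIR": gender_pair(sex1, sex2),
     "SHAREDGROUP": grp1 == grp2}
    for (id1, sex1, grp1), (id2, sex2, grp2) in combinations(roster, 2)
]
print(len(pairs))  # n students give n * (n - 1) / 2 pairs
```

Each student appears in many rows, so the rows are not independent observations; that is worth keeping in mind when reading the standard errors that a regression on these rows reports.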

Of note, the demographics range from somewhat balanced (60.5% female vs 39.5% male) to strongly unbalanced (for example, 3.3% African American students); see Table 1 from Freeman 2017.

Figure 2.1 shows their odds ratios. Of note, the largest odds ratios (Both African American and Both International) are associated with groups that have very small representation in the course (3.3% and 6.0%, respectively).

Figure 2.1: Odds Ratios across various demographics. Error bars are 95% confidence intervals. These are omitted on the bottom graph due to small sample sizes in some of the demographic categories, which would have caused some y-axis scaling issues due to very large confidence interval ranges.

2.3 Concerns and questions that arise from the Freeman paper

This analysis method strikes me as a useful way to quantify how much students are grouping by similar demographic and performance measures. However, their approach raises the following concerns and questions:

What is the impact of proportion within the sample on the significance of the results? All things being equal, is it easier to obtain significant results for the smaller populations than for the larger ones? For example, Figure 2.1 shows that the groups with small representation, such as African Americans at 3.3%, have very large odds ratios. From their Table 2, we also see that these odds ratios have very large 95% confidence intervals.
When creating Low/Medium/High quantiles for their performance measures, their bin sizes were 25%/50%/25% instead of 33.3%/33.3%/33.3%. Based on the previous bullet point, does this create a false sense of the impact on students in the Low and High bands?
Their baselines for a given covariate are potentially problematic when the covariate has more than 2 categories. This is because the baseline for the odds ratio of two African Americans being in the same group is any two students that do not share the same ethnic background. So this could be an African American and an Asian American student, or it could be a Hispanic and a Caucasian student. However, the comparison that makes more sense is between pairs of African American students and pairs that include exactly one African American student.
Further to the above point, what are the counting consequences of this analysis, and is this why trying to think of the analysis through the lens described above is problematic? For example, if we consider this from the point of view of an individual African American student, when they are paired with another African American student, that second student does not also participate in the counting.
What do they do for students coming from very small categories, such as Pacific Islanders, in their analysis?
- The answer to this is somewhat problematic. From Page 4/118: “Because so few Hawaiian/Pacific Islander (0.9%) and Native American (1.1%) students were registered, these two groups were excluded from the analysis.” It is unclear how they were excluded. Removing a student removes them from every potential pair that they were in. Also, they do not justify why 1.1% and below is a reasonable threshold for exclusion, but African American students (3.3%) are not excluded.

3 Introduction to my exploratory study

Exploratory data analyses were performed using final exam data from Physics 101, 2014-W2 (a.k.a. Jan, 2015). These data were used in a variety of ways to better understand concerns from the Freeman analysis and to establish how analyses were to be performed across the many data sets that I have available.

3.1 Overview of the exploratory data set

The data set has the following features:

Approximately 700 students, which is roughly the same size as the Freeman data set.
Students self-selected into groups of 3 or 4 for this final exam. Students were allowed to mix between sections if they wished.
This was the final exam. The students had also taken 2-phase exams on two midterms in the course, but were not required to work in the same groups across any of these exams.
This many students produces ~0.28 million data points for one exam (755 × 754 / 2 = 284,635) because each possible student pair is a data point.

The student covariates are

Grades on each test and in the course
Forced-binary gender (59% Female, 41% Male)
Year status (87% Y1, 9% Y2)
Program (77% BSc; 6% Applied Biology; 6% BA; 5% Foods)
Lecture Section
Lab section
Group size (89% groups of 4)

4 An illustrative model: Pairing by binary gender

The Freeman data set and the Physics 101 2014-W2 data set are very similar in size (~700 students) and in their proportions of females (~60%) and males (~40%). This makes binary gender a valuable demographic variable for making sense of the pairing preferences that arise from the logistic regression. Simulated data for this (forced) binary category can also be used to explore how sensitive the analysis is to extreme proportions.

4.1 Overview of results from Freeman and Physics 101

Freeman’s data set has measurements on 5 days. From Figure 4.1, we see that the odds ratios for the final two measurements (Days 18 and 38) are on the order of 1.5 for both females and males. They are slightly higher for the Physics 101 data set. For the Freeman data set, these odds ratios represent results when controlling for a number of other covariates. For the Physics 101 data, I did not include the other covariates here, but including them has very little effect on the odds ratios.

Figure 4.1: Odds Ratios for binary gender pairs for Freeman’s data set and for Physics 101 data set using a model that includes only gender as a covariate.

So the story here is that we are seeing similar (statistically significant) gender pairing preferences in both Freeman’s and the Physics 101 data.

4.2 Comparing the Physics 101 Odds Ratios from logistic regression results to contingency table Fisher Test

Here we want to better understand the information that the logistic regression is providing in terms of the odds ratios.

4.2.1 Manual calculation of odds ratios for Physics 101 data

The first thing we will do is look at the Physics 101 data (see Table 4.1) and look at the number of each type of gender pair that is present there. Remember that these data come from looking at every single possible pair of students in the course, and then classifying them according to their genders and if they were actually in the same group or not.

Table 4.1: Counts of the different binary-gender pairings within the real Physics 101 2014-W2 final exam data set
GENDERPAIR	shared.group.false	shared.group.true
DifGenders	137019	375
BothFemale	100107	469
BothMale	46408	257
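As a quick consistency check on Table 4.1, the row totals should equal the number of possible pairs of each type given the class composition. This sketch takes the 449 female and 306 male students from Table 5.1; nothing else is assumed.

```python
from math import comb

n_female, n_male = 449, 306   # class composition, from Table 5.1

table_4_1 = {                 # (different group, same group), from Table 4.1
    "DifGenders": (137019, 375),
    "BothFemale": (100107, 469),
    "BothMale": (46408, 257),
}

# Number of possible pairs of each type, from counting alone.
expected_totals = {
    "DifGenders": n_female * n_male,
    "BothFemale": comb(n_female, 2),
    "BothMale": comb(n_male, 2),
}

for kind, (dif, same) in table_4_1.items():
    print(kind, dif + same, expected_totals[kind])

total_pairs = sum(dif + same for dif, same in table_4_1.values())
print(total_pairs, comb(n_female + n_male, 2))
```

The totals agree row by row, and the grand total of 284,635 pairs is one more than the 284,634 null degrees of freedom in the regression output that follows.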

We are going to perform the simplest logistic regression that tries to predict if students $i$ and $j$ will be in a group together (probability = $p_{ij}$) based on whether or not they share their binary gender,

$\log\left(\frac{p_{ij}}{1-p_{ij}}\right)= \alpha + \beta_{FF}\cdot\mathit{BothFemale}_{ij} + \beta_{MM}\cdot\mathit{BothMale}_{ij},$

where $\mathit{BothFemale}_{ij}$ and $\mathit{BothMale}_{ij}$ are indicator variables (1 if the pair is of that type, 0 otherwise). This analysis uses the non-shared category (DifGenders) as the baseline against which each shared category (BothFemale and BothMale) is compared. Thus, when we run this logistic regression, we get two odds ratios, OddsRatio(BothFemale/DifGenders) and OddsRatio(BothMale/DifGenders). If we look at the first odds ratio using the data from Table 4.1, it is calculated as follows,

$\begin{eqnarray}\mathit{OddsRatio(BothFemale)} &=& \frac{\mathit{Odds(BothFemale)}}{\mathit{Odds(DifGenders)}},\\ &=& \frac{\mathit{N(BothFemale,SameGroup)/N(BothFemale,DifGroup)}}{\mathit{N(DifGenders,SameGroup)/N(DifGenders,DifGroup)}},\\ &=& \frac{469/100107}{375/137019}, \end{eqnarray}$

which gives a value of 1.71.

4.2.2 Logistic regression results for Physics 101 data

Let’s look at the logistic regression now. We will focus on the coefficient GENDERPAIRBothFemale, which is the log of the odds ratio. It is highly significant in the model and has a value of 0.538, with a standard error of 0.069. More on this after the summary of the model.

## 
## Call:
## glm(formula = SHAREDGROUP ~ GENDERPAIR, family = binomial, data = df.raw)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.1051  -0.0967  -0.0967  -0.0739   3.4362  
## 
## Coefficients:
##                      Estimate Std. Error  z value Pr(>|z|)    
## (Intercept)          -5.90095    0.05171 -114.125  < 2e-16 ***
## GENDERPAIRBothFemale  0.53756    0.06940    7.746 9.46e-15 ***
## GENDERPAIRBothMale    0.70480    0.08115    8.685  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 14430  on 284634  degrees of freedom
## Residual deviance: 14334  on 284632  degrees of freedom
## AIC: 14340
## 
## Number of Fisher Scoring iterations: 8

Emphasizing again that the coefficients from the logistic regression are communicated by R as logs of odds ratios, let’s exponentiate them. Looking at the exponentiated results below and comparing to the BothFemale results calculated manually above (1.71), we see that they are the same. Good news!

##            Odds Ratio 95%CI.LL 95%CI.UL
## BothFemale       1.71     1.49     1.96
## BothMale         2.02     1.72     2.37
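With a single categorical predictor, the coefficients and standard errors can also be written in closed form from the pair counts in Table 4.1. The sketch below is in Python rather than R and uses a Wald interval, which is my assumption about one reasonable way to get a confidence interval; the intervals reported above may have been computed differently (for example by profile likelihood), so the last digit can differ.

```python
from math import exp, log, sqrt
from statistics import NormalDist

# (different group, same group) pair counts from Table 4.1
counts = {
    "DifGenders": (137019, 375),
    "BothFemale": (100107, 469),
    "BothMale": (46408, 257),
}

def log_odds_ratio(kind, baseline="DifGenders", level=0.95):
    """Return (log OR, standard error, (CI lower, CI upper) on the OR scale)."""
    dif, same = counts[kind]
    base_dif, base_same = counts[baseline]
    beta = log((same / dif) / (base_same / base_dif))
    # Standard error of a log odds ratio from a 2x2 table.
    se = sqrt(1 / same + 1 / dif + 1 / base_same + 1 / base_dif)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return beta, se, (exp(beta - z * se), exp(beta + z * se))

for kind in ("BothFemale", "BothMale"):
    beta, se, (lo, hi) = log_odds_ratio(kind)
    print(f"{kind}: beta = {beta:.3f}, SE = {se:.4f}, "
          f"OR = {exp(beta):.2f} [{lo:.2f}, {hi:.2f}]")
```

The point of the exercise is that the regression adds nothing beyond the 2x2 counts in this one-covariate case: the estimate and its uncertainty are fully determined by the four cell counts, and the smallest cells (the same-group counts) dominate the standard error.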

4.2.3 Calculating Odds Ratios using Fisher’s Exact Test

We previously used what is effectively a contingency table to calculate the odds ratio for BothFemale. Thus we should also be able to compare these results to those that come from Fisher’s exact test.

##             
##              DifGroup SameGroup
##   DifGenders   137019       375
##   BothFemale   100107       469

## 
##  Fisher's Exact Test for Count Data
## 
## data:  ctable
## p-value = 8.55e-15
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.490880 1.966506
## sample estimates:
## odds ratio 
##   1.711778

These results, including the 95% Confidence Intervals provided, are also consistent with the results from the logistic regression.
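For readers who want to see what the test is doing, the two-sided p-value can be reproduced from the hypergeometric distribution using only the standard library. This is a sketch, not R's implementation: the function name `fisher_two_sided` is hypothetical, and the small tolerance used when comparing probabilities is my own choice.

```python
from math import exp, lgamma

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def fisher_two_sided(same1, dif1, same2, dif2):
    """Two-sided Fisher exact p-value for a 2x2 table with fixed margins."""
    row1, row2 = same1 + dif1, same2 + dif2
    n_same = same1 + same2

    def log_pmf(x):
        # Hypergeometric probability that x of the same-group pairs fall in row 1.
        return (log_comb(row1, x) + log_comb(row2, n_same - x)
                - log_comb(row1 + row2, n_same))

    lo, hi = max(0, n_same - row2), min(row1, n_same)
    observed = log_pmf(same1)
    # Sum every table that is no more probable than the observed one.
    return sum(exp(lp) for lp in map(log_pmf, range(lo, hi + 1))
               if lp <= observed + 1e-7)

p = fisher_two_sided(469, 100107, 375, 137019)   # BothFemale vs DifGenders
print(f"p = {p:.2e}")
```

The p-value should come out at the same order of magnitude as the one reported above. Note that the odds ratio reported by R's Fisher test is a conditional maximum-likelihood estimate, which is why it can differ very slightly from the simple cross-product ratio.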

5 Monte-Carlo simulations based on the Physics 101 data set

This Physics 101 data set can be used as the basis for Monte-Carlo simulations to further explore multiple aspects of this analysis. These simulations will always have the same number of students, and of groups of 3 or 4 students, but may vary the number of Female/Male students according to the simulation’s purpose. These simulations can be used to investigate the following questions:

If there were no preference for pairing by gender, how would the pairings in this class have looked?
Does the average class in this paradigm show no pairing preferences?
If we use the number of standard deviations away from the mean for the number of F-F and/or M-M pairings, how far from “no pairing preference” do the real Physics 101 data lie?
How far away from these means do the results need to be to find significant results? Is there a further dependence based on the proportions of F/M in the course?

5.1 Simulating the Physics 101 course with no pairing by gender

Table 5.1 shows a comparison of the real Physics 101 2014-W2 final exam data with the aggregated results from one million Monte-Carlo classes that had no preference for pairing by gender. Based on this, we see that the real data are roughly 9.4 to 9.8 standard deviations away from the no-preference expectation for each of the three gender-pair counts.

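Table 5.1: Comparison of the real Physics 101 2014-W2 final exam data with aggregated results (mean ± standard deviation) from one million Monte-Carlo classes with no preference for pairing by gender.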
	Real	Simulation
Groups of 3	21	21
Groups of 4	173	173
N(Females)	449	449
N(Males)	306	306
F/M Percentages	59.5%/40.5%	59.5%/40.5%
N-Pairs(Female-Female)	469 (z=9.7)	389.0 ± 8.3
N-Pairs(Male-Male)	257 (z=9.4)	180.5 ± 8.1
N-Pairs(Female-Male)	375 (z=-9.8)	531.5 ± 16.0
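A no-preference class can be simulated by shuffling the students and cutting the shuffled list into groups of the observed sizes. The sketch below is my reading of that procedure, in Python, with far fewer runs than the one million used for Table 5.1; the seed and the function name `simulate_class` are arbitrary.

```python
import random
from math import comb

N_FEMALE, N_MALE = 449, 306
GROUP_SIZES = [3] * 21 + [4] * 173      # from Table 5.1

def simulate_class(rng):
    """Assign students to groups at random; return within-group (FF, MM, FM) pair counts."""
    students = ["F"] * N_FEMALE + ["M"] * N_MALE
    rng.shuffle(students)
    ff = mm = fm = 0
    start = 0
    for size in GROUP_SIZES:
        f = students[start:start + size].count("F")
        m = size - f
        ff += comb(f, 2)    # female-female pairs inside this group
        mm += comb(m, 2)    # male-male pairs inside this group
        fm += f * m         # female-male pairs inside this group
        start += size
    return ff, mm, fm

rng = random.Random(2015)               # arbitrary seed
runs = [simulate_class(rng) for _ in range(2000)]
for name, values in zip(("FF", "MM", "FM"), zip(*runs)):
    mean = sum(values) / len(values)
    sd = (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5
    print(f"{name}: {mean:.1f} ± {sd:.1f}")
```

Every simulated class contains exactly 21 × 3 + 173 × 6 = 1101 within-group pairs, so the three counts are not free to vary independently.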

5.1.1 Comparison of how gender pairings behave with respect to each other within the simulation data

Figure 5.1 shows how the rate of Female-Female or Male-Male pairings depends on the rate of Female-Male pairings. The method used to find these was to

Use the mean and standard deviation to find a target number of Female-Male pairings, which would then be rounded to an integer. For example, the -1 sigma value for the number of Female-Male pairings would be 531.5 - 16.0 ≈ 515.
For only the subset of the data with a number of Female-Male pairings = 515, calculate the mean number of Female-Female and Male-Male pairings and then convert those to z-values based on the original means and standard deviations for those quantities.

Overall, what we see is that these three pairing quantities change together such that the investigation of a data set with Female-Male pairings at -3 sigma relative to random would be the same as investigation of a data set where both the Female-Female and Male-Male pairings were at +3 sigma relative to random.
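Part of this coupling follows directly from the bookkeeping. With the group sizes fixed, the total number of within-group pairs is fixed, so any shift in one count must be balanced by the other two:

$N_{FF} + N_{MM} + N_{FM} = 21\binom{3}{2} + 173\binom{4}{2} = 1101 \quad\Rightarrow\quad \Delta N_{FF} + \Delta N_{MM} = -\Delta N_{FM}.$

Using the values in Table 5.1 as a worked example, a class at -1 sigma in Female-Male pairings has $\Delta N_{FM} = -16.0$, so the two same-gender counts must together sit 16.0 pairs above their means. That is close to, but slightly less than, what +1 sigma in each would require ($8.3 + 8.1 = 16.4$). The constraint only fixes the sum; how the extra pairs are shared between Female-Female and Male-Male is an empirical result of the simulation.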

Figure 5.1: Dependence of same gender pairing rates on opposite gender pairing rates, in units of standard deviation.

5.1.2 Does the average no-gender-pairing-preference simulated class show no pairing preferences?

This is a data set with the standard number of female and male students and with the following number of gender pairings: Female-Female = 389, Male-Male = 180, and Female-Male = 532. These are all of the average numbers from the no-pairing-preference simulations, rounded to the nearest integer. The following are the results from running the logistic regression analysis on these data.

## 
## Call:
## glm(formula = SHAREDGROUP ~ GENDERPAIR, family = binomial, data = df.mc.sub)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -0.0881  -0.0881  -0.0880  -0.0880   3.3340  
## 
## Coefficients:
##                       Estimate Std. Error  z value Pr(>|z|)    
## (Intercept)          -5.550085   0.043439 -127.766   <2e-16 ***
## GENDERPAIRBothFemale -0.001129   0.066840   -0.017    0.987    
## GENDERPAIRBothMale   -0.003843   0.086394   -0.044    0.965    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 14430  on 284634  degrees of freedom
## Residual deviance: 14430  on 284632  degrees of freedom
## AIC: 14436
## 
## Number of Fisher Scoring iterations: 8

As fully expected, these data, built to have the smallest possible gender-pairing preference, show no gender pairing preference.

5.1.3 How do gender-pairing odds ratios depend on distance from no-pairing-preference?

If we vary the number of Female-Female and Male-Male pairs together, we can see how the odds ratios for each of these same-gender pairs depend on pairing preference (see Figure 5.2), in units of standard deviation from no pairing preference. Recall that this sample consists of 59.5% female students and 40.5% male students. We see from this graph that the odds ratios for Male-Male pairs are consistently larger and more significant than those for Female-Female pairs. Even in a situation where the proportions are only a bit different from 50/50, the impact of the proportions is clearly visible. Given that the preference for each type of same-gender pair was increased by the same number of standard deviations, it appears that the odds ratios and their significance are overestimated for the underrepresented group.

This raises an important question:

Is the method of varying gender pairs according to their standard deviation convincing? Recall that these were determined from the large sample of no-gender-pairing-preference Monte-Carlo simulations.

Figure 5.2: Odds Ratios, determined by logistic regression as a function of same-gender pairing preference as measured in units of standard deviation. These preferences were held equal to each other at each value of z. The numbers under the data points are effect sizes, calculated using Cohen’s d.
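The asymmetry between Female-Female and Male-Male odds ratios can be reproduced directly from the summary numbers in Table 5.1, without running a regression. The sketch below assumes that the Female-Male count is whatever is left over once the two same-gender counts are set (so the total stays at 1101); that bookkeeping is my assumption about how the shifted data sets were built.

```python
from math import comb

N_FEMALE, N_MALE = 449, 306
TOTALS = {"FF": comb(N_FEMALE, 2), "MM": comb(N_MALE, 2), "FM": N_FEMALE * N_MALE}
MEAN = {"FF": 389.0, "MM": 180.5}    # no-preference means from Table 5.1
SD = {"FF": 8.3, "MM": 8.1}          # no-preference SDs from Table 5.1
WITHIN_GROUP_PAIRS = 1101            # 21 groups of 3 and 173 groups of 4

def odds_ratios_at(z):
    """Odds ratios when FF and MM same-group counts both sit z SDs above their means."""
    same = {k: MEAN[k] + z * SD[k] for k in ("FF", "MM")}
    same["FM"] = WITHIN_GROUP_PAIRS - same["FF"] - same["MM"]   # assumed bookkeeping
    odds = {k: same[k] / (TOTALS[k] - same[k]) for k in same}
    return odds["FF"] / odds["FM"], odds["MM"] / odds["FM"]

for z in range(6):
    or_ff, or_mm = odds_ratios_at(z)
    print(f"z = {z}: OR(FF) = {or_ff:.3f}, OR(MM) = {or_mm:.3f}")
```

The reason for the gap is visible in the arithmetic: a shift of about 8 pairs is a larger fraction of the roughly 180 expected Male-Male pairs than of the roughly 389 expected Female-Female pairs, so the same z produces a larger relative change in the Male-Male odds.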

5.1.4 How do standard deviations of gender-pairing numbers vary with proportions of female and male students?

Table 5.2 shows how the means and standard deviations of the number of each type of gender pairing vary with the proportions of female/male students. As one would expect, the standard deviations decrease as the F/M proportions become more asymmetric. The quantity Delta.SD shows that the sum of the standard deviations for F-F and M-M pairings is consistently ~0.3-0.4 pairs higher than the standard deviation for F-M pairings, and this difference decreases as the F/M proportions become more asymmetric. Similarly, the standard deviations of F-F and M-M are equal for equal proportions, but as the proportions become more asymmetric the standard deviation for the underrepresented group (M) decreases relative to that for the overrepresented group (F), with the latter being 1.4 times larger than the former when the F/M representation is 95%/5%.

Table 5.2: Means and standard deviations of gender pairing counts by proportion of female and male students. Delta.SD = SD(F-F) + SD(M-M) - SD(F-M).
femalePercent	mean(F-F)	SD(F-F)	mean(M-M)	SD(M-M)	mean(F-M)	SD(F-M)	Delta.SD
50	275.6182	8.510556	274.17822	8.512445	551.2035	16.596480	0.4265212
60	395.9916	8.263068	175.82098	8.084393	529.1874	15.919553	0.4279074
70	540.2310	7.381399	98.37816	7.038603	462.3908	13.998964	0.4210379
80	704.4279	5.821906	43.82637	5.334700	352.7457	10.745110	0.4114960
90	893.0040	3.589878	10.73976	2.954279	197.2562	6.165935	0.3782213
95	992.8951	2.222097	2.71830	1.566541	105.3866	3.468439	0.3201990
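The mean columns of Table 5.2 do not need simulation: under random grouping every pair of students is equally likely to share a group, so each expected count is the number of possible pairs of that type times 1101/284635. The sketch below assumes 755 students and that the number of female students is the stated percentage rounded half up; the rounding rule is my guess, and the simulated means differ from these values only by Monte-Carlo noise.

```python
from math import comb

N_STUDENTS = 755
WITHIN_GROUP_PAIRS = 21 * comb(3, 2) + 173 * comb(4, 2)   # 1101

def expected_pairs(female_percent):
    """Expected same-group pair counts (FF, MM, FM) when groups form at random."""
    # Assumption: the number of female students is rounded half up.
    n_female = (female_percent * N_STUDENTS + 50) // 100
    n_male = N_STUDENTS - n_female
    # Probability that any given pair of students ends up in the same group.
    share = WITHIN_GROUP_PAIRS / comb(N_STUDENTS, 2)
    return (comb(n_female, 2) * share,
            comb(n_male, 2) * share,
            n_female * n_male * share)

for pct in (50, 60, 70, 80, 90, 95):
    ff, mm, fm = expected_pairs(pct)
    print(f"{pct}% female: FF = {ff:.1f}, MM = {mm:.1f}, FM = {fm:.1f}")
```

The standard deviations are harder to write down because the groups are not independent draws (the total number of females is fixed), which is where the simulation earns its keep.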

5.1.5 How do gender-pairing odds ratios depend on proportions of female and male students?

I varied the proportion of female and male students in the course from 50/50 to 95/5 while also varying the same-gender-pairing preference from 0 to 5 standard deviations. Because the results vary quite a bit, they are difficult to capture in a single graph, so a few different representations are used.

Figure 5.3 is similar to the graph from the previous section, varying pairing preference along the x-axis and representing each different proportion of students as a row. The y-axis for each proportion row was allowed to self-scale.

Figures 5.4 and 5.5 swap the proportions to the x-axis and the pairing preference to the rows. The difference between these two graphs is that the y-axis in Figure 5.4 was allowed to self-scale by row, whereas the y-axes in Figure 5.5 all share the same scale.

Figure 5.3: Odds Ratios, determined by logistic regression as a function of pairing preference. Preferences were held equal to each other at each value. The rows are stratified by the percentage of female students in the simulated course.

Figure 5.4: Odds Ratios, determined by logistic regression as a function of percentage female students in the simulated course. Rows are stratified by same-gender pairing preference, measured in units of standard deviation and held equal to each other at each value. The y-axes for each row were allowed to self-scale.

Figure 5.5: Odds Ratios, determined by logistic regression as a function of percentage female students in the simulated course. Rows are stratified by same-gender pairing preference, measured in units of standard deviation and held equal to each other at each value. The y-axes for each row were held fixed relative to each other.

Some take-home messages from the graphs:

When close to gender-balanced (50-60% female), we start to see significance in pairing at 3 sigma and highly significant at 4 sigma.
For highly skewed courses (90-95% female), we see significance for MM pairing start at 2 sigma, but almost no significance for FF pairing (one star for 5 sigma at 90% female). It is important to remember that we are out at 5 sigma for the number of FF pairings here so this really shows how sensitive and asymmetric these are.
Even for the most highly skewed courses, we don’t see significance coming out of nowhere; rather, the odds ratios are much more sensitive to small changes in pairing counts for much smaller minorities.

5.1.6 What does switching to two-level comparators do?

Instead of comparing BothFemale or BothMale to DifGenders (3 levels), we could compare BothFemale to BothMale+DifGenders or compare BothMale to BothFemale+DifGenders (2 levels each). When doing this, both odds ratios go down and become less significant. This does not help with the asymmetry that results from uneven proportions.
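The effect of pooling the baseline can be checked against the real counts in Table 4.1. This sketch uses the closed-form odds ratio and a Wald z-statistic as a stand-in for the regression output; the helper name `compare` is hypothetical.

```python
from math import log, sqrt

counts = {  # (different group, same group), from Table 4.1
    "DifGenders": (137019, 375),
    "BothFemale": (100107, 469),
    "BothMale": (46408, 257),
}

def compare(target, baseline_kinds):
    """Odds ratio and Wald z for `target` against the pooled baseline categories."""
    dif, same = counts[target]
    base_dif = sum(counts[k][0] for k in baseline_kinds)
    base_same = sum(counts[k][1] for k in baseline_kinds)
    odds_ratio = (same / dif) / (base_same / base_dif)
    se = sqrt(1 / same + 1 / dif + 1 / base_same + 1 / base_dif)
    return odds_ratio, log(odds_ratio) / se

for target, other in (("BothFemale", "BothMale"), ("BothMale", "BothFemale")):
    or3, z3 = compare(target, ["DifGenders"])
    or2, z2 = compare(target, ["DifGenders", other])
    print(f"{target}: 3-level OR = {or3:.2f} (z = {z3:.1f}), "
          f"2-level OR = {or2:.2f} (z = {z2:.1f})")
```

Pooling the other same-gender category into the baseline raises the baseline odds, because that category is itself over-represented in same-group pairs, which is why both odds ratios shrink.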

6 Questions and action items

6.1 To-do

Change instances of fraction female to always include fraction female and fraction male

6.2 My open questions

What are the effects of varying the number of only one type of same-gender pairing relative to the other? The total number of gender pairings is a fixed quantity because the size of the course and the numbers of groups of each size are fixed. For example, increasing Female-Female pairs while holding Male-Male pairs constant will decrease the number of Female-Male pairs. In terms of the odds ratios, this means that the Female-Male odds go down, which increases the Male-Male odds ratio despite the Male-Male odds remaining the same.
Is this method of increasing the same gender pairs together compelling?

6.3 Thoughts and questions from the Apr 29, 2020 PHASER meeting

Based on the closed form method of calculating Odds Ratios, can I make a functional equation for calculating odds ratios? Is there a way to bring in the confidence intervals from Fisher’s Exact Test? (Jared)
Can I use an affinity model built into the simulation to verify if my affinity model makes sense?
[Added Section 5.1.4] How do the SDs vary within a given proportion? What happens if I use pooled SD?
Table 5.2 shows that for the 95%/5% F/M proportions, the 1 stdev shift is ~0.2% shift for BothF and ~50% shift for BothM. These seem like very different amounts of “group affinity”. Wouldn’t increasing by the same relative % shift make more sense? (Jonathan, building on a previous comment from Jared)
Can you de-couple F-F affinity from M-M affinity? (Jonathan)

Student grouping using students pairs as the unit of analysis - Initial Report - April, 2020

Joss Ives

20/04/2020