## 31.3 Expected values: Comparing odds

Assuming that the odds of having most meals off-campus
is the same for both groups
(that is, the population OR is one),
how would the sample OR be **expected** to vary
from sample to sample
just because of *sampling variation*?

If the population OR was one,
the odds are the same in both groups;
equivalently,
the percentages are the same in both groups.
That is,
the percentage
of students eating most meals off-campus
is the same for students *living with*
and *not living with* their parents.

Let’s consider the implication. From Table 31.1, 157 students out of 183 ate most meals off-campus; that is,

\[ \frac{157}{183} \times 100 = 85.79\% \] of the students in the entire sample ate most of their meals off-campus.

If the percentage of students
who eat most of their meals off-campus
is the *same* for those who live with their parents
and those who don’t,
then we’d **expect** 85.79% of students
in *both* groups to eat most of their meals off-campus.
That is, we would expect

- 85.79% of the 54 students (that is, 46.33)
  who *live with their parents*
  to eat most meals off-campus; and
- 85.79% of the 129 students (that is, 110.67)
  who *don’t live with their parents*
  to eat most meals off-campus.

That is,
the percentage (and hence the odds)
is the same in each group.
These are the counts *expected* to appear
if the percentage were exactly the same in each group
(Table 31.3);
that is, if the null hypothesis (the assumption) were true.
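The arithmetic above can be sketched in a few lines of Python. Each expected count is the overall proportion (157/183) multiplied by the size of the group, which is the same as (row total × column total) / grand total. This is a minimal sketch: the helper name `expected_counts` is hypothetical (not from any library), and the on-campus row total of 26 is simply 183 − 157.

```python
def expected_counts(row_totals, col_totals):
    """Expected count for each cell = row total * column total / grand total."""
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Row totals: most meals off-campus, most meals on-campus (26 = 183 - 157)
row_totals = [157, 26]
# Column totals: lives with parents, doesn't live with parents
col_totals = [54, 129]

expected = expected_counts(row_totals, col_totals)
for label, row in zip(["Most off-campus", "Most on-campus"], expected):
    print(label, [round(x, 2) for x in row])
```

Running this prints 46.33 and 110.67 for the off-campus row, and 7.67 and 18.33 for the on-campus row, matching the expected counts in Table 31.3.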

**Think 31.1 (OR for expected counts).** Consider the expected counts in
Table 31.3.
What is the OR computed from these expected counts?

The OR is exactly one:
the *odds* of having most meals off-campus are the same for students living with their parents and for students not living with their parents.
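To see why the OR is *exactly* one, write (these symbols are introduced here for illustration) \(R_1 = 157\) and \(R_2 = 26\) for the row totals (most meals off-campus, most meals on-campus), \(C_1 = 54\) and \(C_2 = 129\) for the column totals (lives with parents, doesn't live with parents), and \(n = 183\). Each expected count is \(E_{ij} = R_i C_j / n\), so the column total and \(n\) cancel inside each group's odds:

\[
\text{OR}
= \frac{E_{11}/E_{21}}{E_{12}/E_{22}}
= \frac{\dfrac{R_1 C_1 / n}{R_2 C_1 / n}}{\dfrac{R_1 C_2 / n}{R_2 C_2 / n}}
= \frac{R_1 / R_2}{R_1 / R_2}
= 1.
\]

Both groups therefore have expected odds of \(157/26 \approx 6.04\); computing \(46.33/7.67\) and \(110.67/18.33\) gives values that differ only in the third decimal place, because the table entries are rounded.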

How do those *expected values* compare to what was *observed*?
For example:

- 46.33 of the 54 students
  who *live with their parents*
  are **expected** to eat most meals off-campus, yet we observed 52; and
- 110.67 of the 129 students
  who *don’t live with their parents*
  are **expected** to eat most meals off-campus, yet we observed 105.
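The comparison can be sketched in Python by placing the observed and expected tables side by side. This is a sketch under stated assumptions: the on-campus observed counts (2 and 24) are inferred as 54 − 52 and 129 − 105, `odds_ratio` is a hypothetical helper, and the OR is taken as the odds for students living with their parents divided by the odds for those who don't (if the comparison is made the other way round, the observed OR is the reciprocal, about 0.17).

```python
def odds_ratio(table):
    """OR = odds(off-campus | lives with parents) / odds(off-campus | doesn't)."""
    (a, b), (c, d) = table  # a, b: off-campus counts; c, d: on-campus counts
    return (a / c) / (b / d)

# Observed counts; columns: lives with parents, doesn't live with parents.
# On-campus counts are inferred: 54 - 52 = 2 and 129 - 105 = 24.
observed = [[52, 105], [2, 24]]

# Expected counts if the percentage is the same in both groups
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, obs_row, exp_row in zip(["Most off-campus", "Most on-campus"], observed, expected):
    diffs = [o - e for o, e in zip(obs_row, exp_row)]
    print(label, "observed - expected:", [round(x, 2) for x in diffs])

print("OR from observed counts:", round(odds_ratio(observed), 2))  # 5.94
print("OR from expected counts:", round(odds_ratio(expected), 2))  # 1.0
```

Every observed-minus-expected difference is 5.67 in size (positive or negative), while the sample OR of about 5.94 sits some way from the expected OR of one.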

The observed and expected counts are similar,
but not exactly the same.
This is no surprise:
each sample will produce slightly different observed counts
(*sampling variation*).
The difference between the observed and expected counts
may be explained by sampling variation
(that is, the null hypothesis explanation).
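The question posed at the start of this section (how much the sample OR varies from sample to sample when the population OR is one) can be explored by simulation. This minimal Python sketch assumes that every student independently eats most meals off-campus with the same probability, 157/183, whichever group they are in (so the population OR is one), and that the group sizes stay fixed at 54 and 129. The function names are hypothetical, and samples containing a zero cell are discarded because the sample OR is then undefined or infinite.

```python
import random
import statistics

P_OFF = 157 / 183            # same proportion off-campus in both groups (OR = 1)
N_WITH, N_WITHOUT = 54, 129  # group sizes

def simulate_sample_or(rng):
    """Simulate one sample under the null; return its sample OR (None if a cell is zero)."""
    a = sum(rng.random() < P_OFF for _ in range(N_WITH))     # with parents, off-campus
    b = sum(rng.random() < P_OFF for _ in range(N_WITHOUT))  # without parents, off-campus
    c, d = N_WITH - a, N_WITHOUT - b                          # on-campus counts
    if 0 in (a, b, c, d):
        return None  # OR undefined or infinite; skip this sample
    return (a / c) / (b / d)

def simulate_ors(reps=5000, seed=1):
    """Return the sample ORs from `reps` simulated samples (seeded for reproducibility)."""
    rng = random.Random(seed)
    ors = (simulate_sample_or(rng) for _ in range(reps))
    return [x for x in ors if x is not None]

ors = simulate_ors()
q = statistics.quantiles(ors, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
print("Samples used:", len(ors))
print("Median sample OR:", round(statistics.median(ors), 2))
print("Middle 95% of sample ORs:", round(q[0], 2), "to", round(q[-1], 2))
```

Even though the population OR is exactly one in every simulated sample, the sample ORs spread out on both sides of one, purely because of sampling variation.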

When discussing previous hypothesis tests,
the *sampling distribution* of the sample statistic
was described;
this sampling distribution was approximately normal
(and its standard deviation is called the *standard error*).
However,
the sampling distribution of the sample odds ratio
is more involved^{13},
so it will not be presented.

|  | Lives with parents | Doesn’t live with parents | Total |
|---|---|---|---|
| Most off-campus | 46.33 | 110.67 | 157 |
| Most on-campus | 7.67 | 18.33 | 26 |
| Total | 54.00 | 129.00 | 183 |

^{13} For those who wish to know: the *logarithm* of the sample OR has an approximately normal distribution, with a *standard error*.