# C Symbols, formulas, statistics and parameters

## C.1 Symbols and standard errors

| Parameter | Symbol | Statistic | Standard error | Ref. |
|---|---|---|---|---|
| Proportion | \(p\) | \(\hat{p}\) | Confidence interval: \(\displaystyle\text{s.e.}(\hat{p}) = \sqrt{\frac{\hat{p} \times (1 - \hat{p})}{n}}\) | Def. 24.2 |
| | | | Hypothesis test: \(\displaystyle\text{s.e.}(\hat{p}) = \sqrt{\frac{p \times (1 - p)}{n}}\) | Def. 24.1 |
| Mean | \(\mu\) | \(\bar{x}\) | \(\displaystyle\text{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}}\) | Def. 25.1 |
| Standard deviation | \(\sigma\) | \(s\) | | |
| Mean difference | \(\mu_d\) | \(\bar{d}\) | \(\displaystyle\text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}}\) | Def. 27.2 |
| Diff. between means | \(\mu_1 - \mu_2\) | \(\bar{x}_1 - \bar{x}_2\) | \(\text{s.e.}(\bar{x}_1 - \bar{x}_2)\) | Def. 28.1 |
| Odds ratio | Pop. OR | Sample OR | | |
| Correlation | \(\rho\) | \(r\) | | |
| Regression line: slope | \(\beta_1\) | \(b_1\) | \(\text{s.e.}(b_1)\) | Value given |
| Regression line: intercept | \(\beta_0\) | \(b_0\) | \(\text{s.e.}(b_0)\) | Value given |
| R-squared | | \(R^2\) | | |
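As a quick numerical check of the standard-error formulas in the table, here is a minimal Python sketch; all sample values (\(n\), \(\hat{p}\), \(p\), \(s\)) are invented for illustration:

```python
import math

# Invented example: 40 successes out of n = 100 trials.
n = 100
p_hat = 40 / 100

# s.e. of a sample proportion for a confidence interval (Def. 24.2),
# using the sample proportion p-hat:
se_ci = math.sqrt(p_hat * (1 - p_hat) / n)

# s.e. of a sample proportion for a hypothesis test (Def. 24.1),
# using the hypothesised population value p (invented: p = 0.5):
p0 = 0.5
se_test = math.sqrt(p0 * (1 - p0) / n)

# s.e. of a sample mean (Def. 25.1), with invented s = 12 and n = 50:
s = 12.0
se_mean = s / math.sqrt(50)
```

Note that the two proportion formulas differ only in which proportion (sample or hypothesised) appears under the square root.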

## C.2 Confidence intervals

**Confidence intervals** have the form

\[
\text{statistic} \pm ( \text{multiplier} \times \text{s.e.}(\text{statistic}))
\]
when the sampling distribution of the statistic has an approximate normal distribution.

**Notes:**

- The multiplier is *approximately* 2 for an *approximate* \(95\)% CI (based on the \(68\)--\(95\)--\(99.7\) rule).
- The \(\text{multiplier} \times \text{s.e.}(\text{statistic})\) is called the *margin of error*.
- When the sampling distribution for the statistic does not have an approximate normal distribution (e.g., for *odds ratios* and *correlation coefficients*), **this formula does not apply**.
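The interval formula can be sketched in Python for the case of a mean; the sample summary below (\(\bar{x}\), \(s\), \(n\)) is invented for illustration:

```python
import math

# Invented sample summary: x-bar = 50, s = 10, n = 25.
xbar, s, n = 50.0, 10.0, 25

se = s / math.sqrt(n)        # s.e. of the sample mean (Def. 25.1)
multiplier = 2               # approximate 95% multiplier (68-95-99.7 rule)
margin = multiplier * se     # the margin of error

lo, hi = xbar - margin, xbar + margin
print(lo, hi)  # 46.0 54.0
```

The approximate \(95\)% CI is then \(50 \pm 4\), i.e., from \(46\) to \(54\).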

## C.3 Hypothesis testing

For **hypothesis tests**, the *test statistic* is a \(t\)-score, which has the form:

\[
t = \frac{\text{statistic} - \text{parameter}}{\text{s.e.}(\text{statistic})}
\]
when the sampling distribution of the statistic has an approximate normal distribution.

**Notes:**

- Since \(t\)-scores are a little like \(z\)-scores (Sect. 33.4), the \(68\)--\(95\)--\(99.7\) rule can be used to *approximate* \(P\)-values.
- When the sampling distribution for the statistic does not have an approximate normal distribution (e.g., for *odds ratios* and *correlation coefficients*), **this formula does not apply**.
- A hypothesis test about **odds ratios** uses a \(\chi^2\) test statistic, whose value is approximately like a \(z\)-score with a value of \(\displaystyle\sqrt{\frac{\chi^2}{\text{df}}}\), where \(\text{df}\) is the 'degrees of freedom' given in the software output.
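Both calculations above can be sketched in Python; the test about a mean and the \(\chi^2\) value are invented for illustration:

```python
import math

# Invented test about a mean: H0: mu = 100, with
# x-bar = 104, s = 15, n = 36.
mu0, xbar, s, n = 100.0, 104.0, 15.0, 36

se = s / math.sqrt(n)      # s.e.(x-bar) = s / sqrt(n)
t = (xbar - mu0) / se      # (statistic - parameter) / s.e.(statistic)

# |t| = 1.6 lies inside +/- 2, so by the 68-95-99.7 rule the
# approximate (two-tailed) P-value is larger than 0.05.

# Converting an invented chi-squared statistic (chi2 = 6.25, df = 1)
# to an approximate z-score:
z_like = math.sqrt(6.25 / 1)
```

Here \(t = 4/2.5 = 1.6\) and the \(\chi^2\) value converts to an approximate \(z\)-score of \(2.5\).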

## C.4 Sample size estimation

- To estimate the sample size needed (Sect. 30.3) for **estimating a proportion**: \(\displaystyle n = \frac{1}{(\text{Margin of error})^2}\).
- To estimate the sample size needed (Sect. 30.4) for **estimating a mean**: \(\displaystyle n = \left( \frac{2\times s}{\text{Margin of error}}\right)^2\).
- To estimate the sample size needed (Sect. 30.5) for **estimating a mean difference**: \(\displaystyle n = \left( \frac{2 \times s_d}{\text{Margin of error}}\right)^2\).

**Notes:**

- In **sample size calculations**, always **round up** the sample size found from the above formulas.
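The three formulas, with the round-up rule applied via `math.ceil`, can be sketched as follows; the margins of error and standard deviations are invented for illustration:

```python
import math

# Estimating a proportion to within a margin of error of 0.05:
n_prop = math.ceil(1 / 0.05**2)

# Estimating a mean to within 2 units, with s = 11 (invented):
n_mean = math.ceil((2 * 11 / 2) ** 2)

# Estimating a mean difference to within 1.5 units, with
# s_d = 4 (invented); the raw value 28.44... rounds UP to 29:
n_diff = math.ceil((2 * 4 / 1.5) ** 2)

print(n_prop, n_mean, n_diff)  # 400 121 29
```

The third case shows why rounding up matters: a sample of 28 would give a slightly larger margin of error than requested.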

## C.5 Other formulas

- The **interquartile range** (IQR): \(Q_3 - Q_1\), where \(Q_1\) and \(Q_3\) are the first and third quartiles respectively.
- To **calculate \(z\)-scores** (Sect. 22.4): \(\displaystyle z = \frac{x - \mu}{\sigma}\) or, more generally, \[ z = \frac{\text{value of variable} - \text{mean of distribution}}{\text{standard deviation of distribution}}. \]
- The **unstandardizing formula** (Sect. 22.8): \(x = \mu + (z\times \sigma)\).
- The **regression equation** in the *sample*: \(\hat{y} = b_0 + b_1 x\), where \(b_0\) is the sample intercept and \(b_1\) is the sample slope.
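The \(z\)-score and unstandardizing formulas are inverses of each other, which a short Python sketch makes concrete (the distribution values \(\mu = 70\), \(\sigma = 8\) are invented):

```python
# Invented distribution: mu = 70, sigma = 8.
mu, sigma = 70.0, 8.0

def z_score(x):
    """Standardize: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

def unstandardize(z):
    """Invert the z-score: recover the original value x = mu + z*sigma."""
    return mu + z * sigma

z = z_score(82.0)      # (82 - 70) / 8 = 1.5
x = unstandardize(z)   # back to 82.0
```

Applying one formula and then the other returns the original value, since the second simply undoes the first.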

**Notes:**

- \(t\)-scores are like \(z\)-scores, except that the standard deviation of the distribution includes values estimated from the sample.