# C Symbols, formulas, statistics and parameters

## C.1 Symbols and standard errors

TABLE C.1: Some sample statistics used to estimate population parameters. Empty table cells mean that these quantities are not studied in this textbook. For statistics with standard errors given, the sampling distribution is approximately normal under certain (statistical validity) conditions.

| | Parameter | Statistic | Standard error | S.E. formula reference |
|---|---|---|---|---|
| Proportion (CI) | $$p$$ | $$\hat{p}$$ | $$\displaystyle\text{s.e.}(\hat{p}) = \sqrt{\frac{ \hat{p} \times (1 - \hat{p})}{n}}$$ | Def. 20.2 |
| Proportion (Test) | $$p$$ | $$\hat{p}$$ | $$\displaystyle\text{s.e.}(\hat{p}) = \sqrt{\frac{ p \times (1 - p)}{n}}$$ | Def. 20.1 |
| Mean | $$\mu$$ | $$\bar{x}$$ | $$\displaystyle\text{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}}$$ | Def. 22.1 |
| Standard deviation | $$\sigma$$ | $$s$$ | | |
| Mean difference | $$\mu_d$$ | $$\bar{d}$$ | $$\displaystyle\text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}}$$ | Def. 23.2 |
| Diff. between means | $$\mu_1 - \mu_2$$ | $$\bar{x}_1 - \bar{x}_2$$ | $$\displaystyle\text{s.e.}(\bar{x}_1 - \bar{x}_2)$$ | Value given |
| Odds ratio | Pop. OR | Sample OR | | Value given |
| Correlation | $$\rho$$ | $$r$$ | | |
| Slope of regression line | $$\beta_1$$ | $$b_1$$ | $$\text{s.e.}(b_1)$$ | Value given |
| Intercept of regression line | $$\beta_0$$ | $$b_0$$ | $$\text{s.e.}(b_0)$$ | Value given |
| R-squared | $$R^2$$ | | | |
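
The standard-error formulas in Table C.1 translate directly into code. The sketch below is illustrative only: the function names and the example values are hypothetical, not from the textbook. Note the difference between the two proportion formulas: the CI version uses the sample proportion $$\hat{p}$$, while the test version uses the value of $$p$$ stated in the null hypothesis.

```python
import math

def se_proportion_ci(p_hat: float, n: int) -> float:
    """s.e.(p-hat) for a confidence interval: uses the sample proportion p-hat."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def se_proportion_test(p: float, n: int) -> float:
    """s.e.(p-hat) for a hypothesis test: uses the value of p stated in H0."""
    return math.sqrt(p * (1 - p) / n)

def se_mean(s: float, n: int) -> float:
    """s.e.(x-bar) = s / sqrt(n); passing s_d instead of s gives s.e.(d-bar)."""
    return s / math.sqrt(n)

# Hypothetical example values, for illustration only
print(round(se_proportion_ci(0.4, 100), 4))  # 0.049
print(se_mean(12.0, 36))                     # 2.0
```

The same `se_mean` function covers both the mean and the mean difference, because the two formulas have the same form.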

## C.2 Confidence intervals

When the sampling distribution of the statistic is approximately normal, confidence intervals have the form
$\text{statistic} \pm ( \text{multiplier} \times \text{s.e.}(\text{statistic})).$

Notes:

• The multiplier is approximately 2 for an approximate 95% CI (based on the 68--95--99.7 rule).
• $$\text{multiplier} \times \text{s.e.}(\text{statistic})$$ is called the margin of error.
• When the sampling distribution for the statistic does not have an approximate normal distribution (e.g., for odds ratios and correlation coefficients), this formula does not apply.
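
As a minimal sketch, assuming an approximate 95% CI with multiplier 2, the CI formula can be computed as below. The function name `approx_ci` and the sample values (mean 25.0, $$s = 12.0$$, $$n = 36$$) are hypothetical.

```python
import math

def approx_ci(statistic: float, se: float, multiplier: float = 2.0) -> tuple[float, float]:
    """statistic +/- (multiplier x s.e.); multiplier 2 gives an approximate 95% CI."""
    margin_of_error = multiplier * se
    return statistic - margin_of_error, statistic + margin_of_error

# Hypothetical sample: mean 25.0, s = 12.0, n = 36, so s.e. = 12 / 6 = 2.0
se = 12.0 / math.sqrt(36)
lower, upper = approx_ci(25.0, se)
print(lower, upper)  # 21.0 29.0
```

Here the margin of error is $$2 \times 2.0 = 4.0$$.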

## C.3 Hypothesis testing

When the sampling distribution of the statistic is approximately normal, the test statistic for a hypothesis test is a $$t$$-score, which has the form
$t = \frac{\text{statistic} - \text{parameter}}{\text{s.e.}(\text{statistic})},$ where the parameter takes the value stated in the null hypothesis.
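
A short sketch of the $$t$$-score calculation for a test about a proportion is given below. The function name and the numbers are hypothetical: the null hypothesis is $$p = 0.5$$, and a sample of $$n = 400$$ gives $$\hat{p} = 0.56$$. The standard error uses the null-hypothesis value of $$p$$, as in the "Proportion (Test)" row of Table C.1.

```python
import math

def t_score(statistic: float, parameter: float, se: float) -> float:
    """t = (statistic - parameter) / s.e.(statistic)."""
    return (statistic - parameter) / se

# Hypothetical example: H0 says p = 0.5; a sample of n = 400 gives p-hat = 0.56
p0, p_hat, n = 0.5, 0.56, 400
se = math.sqrt(p0 * (1 - p0) / n)  # test s.e. uses the H0 value of p
t = t_score(p_hat, p0, se)
print(round(t, 2))  # 2.4
```

A $$t$$-score of about 2.4 is more than 2 standard errors from the null-hypothesis value. By the 68--95--99.7 rule, the two-tailed $$P$$-value is therefore roughly smaller than 0.05.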

Notes:

• Since $$t$$-scores are a little like $$z$$-scores, the 68--95--99.7 rule can be used to approximate $$P$$-values.
• When the sampling distribution for the statistic does not have an approximate normal distribution (e.g., for odds ratios and correlation coefficients), this formula does not apply.
• Tests involving odds ratios do not use $$t$$-scores. Instead, they use a $$\chi^2$$ test statistic, whose value is approximately like a $$z$$-score with a value of
$\sqrt{\frac{\chi^2}{\text{df}}},$ where $$\text{df}$$ is the 'degrees of freedom' given in the software output.
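
The rough conversion of a $$\chi^2$$ statistic to a $$z$$-score-like value can be sketched as follows. This is only the approximation described above, not an exact $$P$$-value calculation. The function name and the software-output values ($$\chi^2 = 9.0$$, $$\text{df} = 1$$) are hypothetical.

```python
import math

def chi2_as_z_like(chi2: float, df: int) -> float:
    """Rough z-score-like value sqrt(chi^2 / df) for a chi-squared test statistic."""
    return math.sqrt(chi2 / df)

# Hypothetical software output: chi-squared = 9.0 with df = 1
print(chi2_as_z_like(9.0, 1))  # 3.0
```

A value of about 3 is treated like a $$z$$-score of 3. By the 68--95--99.7 rule, this suggests a very small $$P$$-value.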

## C.4 Other formulas

• To calculate $$z$$-scores (Sect. 17.4): $$\displaystyle z = \frac{x - \mu}{\sigma}$$ or, more generally, $z = \frac{\text{value of variable} - \text{mean of distribution}}{\text{standard deviation of distribution}}.$

• The unstandardizing formula (Sect. 17.9): $$x = \mu + (z\times \sigma)$$.

• To estimate the sample size needed (Sect. 26.2) for estimating a proportion:
$n = \frac{1}{(\text{Margin of error})^2}.$

• To estimate the sample size needed (Sect. 26.3) for estimating a mean:
$n = \left( \frac{2\times s}{\text{Margin of error}}\right)^2.$
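
The $$z$$-score and unstandardizing formulas are inverses of each other. The short sketch below shows this using a hypothetical normal distribution with mean 170 and standard deviation 8. The function names are illustrative.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Standardize: z = (x - mu) / sigma."""
    return (x - mu) / sigma

def unstandardize(z: float, mu: float, sigma: float) -> float:
    """Unstandardize: x = mu + (z * sigma)."""
    return mu + z * sigma

# Hypothetical distribution: mean 170, standard deviation 8
print(z_score(186, 170, 8))         # 2.0
print(unstandardize(-1.5, 170, 8))  # 158.0
```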

Notes:

• In sample size calculations, always round up the sample size found from the above formulas.
• $$t$$-scores are like $$z$$-scores, except that the standard deviation of the distribution (the standard error) is computed partly from values estimated from the sample, such as the sample standard deviation $$s$$.
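
The two sample-size formulas in Sect. C.4 correspond to an approximate 95% CI, since both use a multiplier of 2. They are sketched below with the rounding-up rule applied through `math.ceil`. The function names and example values are hypothetical. In the mean formula, `s` stands for an estimate of the population standard deviation.

```python
import math

def n_for_proportion(margin_of_error: float) -> int:
    """n = 1 / (margin of error)^2, always rounded up."""
    return math.ceil(1 / margin_of_error ** 2)

def n_for_mean(s: float, margin_of_error: float) -> int:
    """n = (2 * s / margin of error)^2, always rounded up."""
    return math.ceil((2 * s / margin_of_error) ** 2)

# Hypothetical targets: proportion within 0.03; mean within 4.0 when s is about 15.0
print(n_for_proportion(0.03))  # 1112
print(n_for_mean(15.0, 4.0))   # 57
```

Rounding up, never to the nearest integer, makes sure the achieved margin of error is no larger than the target. For example, $$(2 \times 15.0 / 4.0)^2 = 56.25$$ becomes 57.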

## C.5 Other symbols used

| Symbol | Meaning | Reference |
|---|---|---|
| $$H_0$$ | Null hypothesis | Sect. 30.2 |
| $$H_1$$ | Alternative hypothesis | Sect. 30.2 |
| df | Degrees of freedom | Sect. 33.4 |
| CI | Confidence interval | Chap. 21 |
| s.e. | Standard error | Def. 18.3 |
| $$n$$ | Sample size | |
| $$\chi^2$$ | The chi-squared test statistic | Sect. 33.4 |