C Symbols, formulas, statistics and parameters

C.1 Symbols and standard errors

TABLE C.1: Sample statistics used to estimate population parameters. An empty table cell means that the quantity is not studied in this book. For statistics with a standard error given, the sampling distribution is approximately normal under certain (statistical validity) conditions.
| Quantity | Parameter | Statistic | Standard error | Ref. |
|---|---|---|---|---|
| Proportion (confidence interval) | \(p\) | \(\hat{p}\) | \(\displaystyle\text{s.e.}(\hat{p}) = \sqrt{\frac{ \hat{p} \times (1 - \hat{p})}{n}}\) | Def. 24.2 |
| Proportion (hypothesis test) | \(p\) | \(\hat{p}\) | \(\displaystyle\text{s.e.}(\hat{p}) = \sqrt{\frac{ p \times (1 - p)}{n}}\) | Def. 24.1 |
| Mean | \(\mu\) | \(\bar{x}\) | \(\displaystyle\text{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}}\) | Def. 25.1 |
| Standard deviation | \(\sigma\) | \(s\) | | |
| Mean difference | \(\mu_d\) | \(\bar{d}\) | \(\displaystyle\text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}}\) | Def. 27.2 |
| Diff. between means | \(\mu_1 - \mu_2\) | \(\bar{x}_1 - \bar{x}_2\) | \(\displaystyle\text{s.e.}(\bar{x}_1 - \bar{x}_2)\) | Def. 28.1 |
| Odds ratio | Pop. OR | Sample OR | | |
| Correlation | \(\rho\) | \(r\) | | |
| Regression line: slope | \(\beta_1\) | \(b_1\) | \(\text{s.e.}(b_1)\) | Value given |
| Regression line: intercept | \(\beta_0\) | \(b_0\) | \(\text{s.e.}(b_0)\) | Value given |
| R-squared | | \(R^2\) | | |
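
As a rough illustration, the standard-error formulas in Table C.1 that are given explicitly can be computed directly. The Python sketch below is not from the book: the function names and the numerical inputs are hypothetical, chosen only to show how the formulas are applied.

```python
import math

def se_proportion_ci(p_hat, n):
    """s.e. of a sample proportion for a confidence interval (uses p-hat)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def se_proportion_test(p, n):
    """s.e. of a sample proportion for a hypothesis test (uses the hypothesised p)."""
    return math.sqrt(p * (1 - p) / n)

def se_mean(s, n):
    """s.e. of a sample mean; pass s_d and the number of differences for a mean difference."""
    return s / math.sqrt(n)

# Hypothetical inputs, for illustration only
print(se_proportion_ci(0.4, 100))    # sample proportion 0.4 from n = 100
print(se_proportion_test(0.5, 100))  # hypothesised proportion 0.5, n = 100
print(se_mean(12.0, 36))             # s = 12 and n = 36 give 2.0
```

The two proportion functions differ only in which proportion is used: the sample value \(\hat{p}\) for a confidence interval, and the hypothesised value \(p\) for a hypothesis test.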

C.2 Confidence intervals

Confidence intervals have the form
\[ \text{statistic} \pm ( \text{multiplier} \times \text{s.e.}(\text{statistic})) \] when the sampling distribution is approximately normal.

Notes:

  • The multiplier is approximately 2 for an approximate \(95\)% CI (based on the \(68\)--\(95\)--\(99.7\) rule).
  • The quantity \(\text{multiplier} \times \text{s.e.}(\text{statistic})\) is called the margin of error.
  • When the sampling distribution for the statistic does not have an approximate normal distribution (e.g., for odds ratios and correlation coefficients), this formula does not apply.
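
The general form above can be sketched in a few lines of Python. This is a minimal illustration, assuming a multiplier of 2 for an approximate \(95\)% CI; the function name and the example values are hypothetical.

```python
import math

def approx_ci(statistic, standard_error, multiplier=2.0):
    """Return (lower, upper) limits: statistic -/+ multiplier x s.e."""
    margin_of_error = multiplier * standard_error
    return statistic - margin_of_error, statistic + margin_of_error

# Hypothetical sample: mean 50, standard deviation 12, sample size 36
x_bar, s, n = 50.0, 12.0, 36
standard_error = s / math.sqrt(n)        # s.e. of the mean: 12 / 6 = 2.0
print(approx_ci(x_bar, standard_error))  # (46.0, 54.0)
```

The sketch applies only when the sampling distribution is approximately normal; it should not be used for odds ratios or correlation coefficients.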

C.3 Hypothesis testing

For hypothesis tests, the test statistic is a \(t\)-score, which has the form:
\[ t = \frac{\text{statistic} - \text{parameter}}{\text{s.e.}(\text{statistic})} \] when the sampling distribution is approximately normal.

Notes:

  • Since \(t\)-scores are a little like \(z\)-scores (Sect. 33.4), the \(68\)--\(95\)--\(99.7\) rule can be used to approximate \(P\)-values.
  • When the sampling distribution for the statistic does not have an approximate normal distribution (e.g., for odds ratios and correlation coefficients), this formula does not apply.
  • A hypothesis test about odds ratios uses a \(\chi^2\) test statistic, which is approximately equivalent to a \(z\)-score of
    \[ \sqrt{\frac{\chi^2}{\text{df}}}, \] where \(\text{df}\) is the 'degrees of freedom' given in the software output.
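
The calculations in this section can be sketched in Python as follows. The function names and example values are hypothetical, and the \(P\)-value helper assumes a two-tailed test; it only brackets the \(P\)-value using the \(68\)--\(95\)--\(99.7\) rule, so it is an approximation rather than the exact value that software reports.

```python
import math

def t_score(statistic, parameter, standard_error):
    """Test statistic: (statistic - parameter) / s.e.(statistic)."""
    return (statistic - parameter) / standard_error

def chi2_as_z(chi2, df):
    """Approximately equivalent z-score for a chi-squared statistic."""
    return math.sqrt(chi2 / df)

def rough_two_tailed_p(score):
    """Bracket a two-tailed P-value with the 68-95-99.7 rule (assumption: two-tailed)."""
    size = abs(score)
    if size < 1:
        return "larger than 0.32"
    if size < 2:
        return "between 0.05 and 0.32"
    if size < 3:
        return "between 0.003 and 0.05"
    return "smaller than 0.003"

# Hypothetical values, for illustration only
t = t_score(statistic=54.0, parameter=50.0, standard_error=2.0)
print(t, rough_two_tailed_p(t))
print(chi2_as_z(chi2=8.0, df=2))
```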

C.4 Sample size estimation

  • To estimate the sample size needed (Sect. 30.3) for estimating a proportion:
    \[ n = \frac{1}{(\text{Margin of error})^2}. \]
  • To estimate the sample size needed (Sect. 30.4) for estimating a mean:
    \[ n = \left( \frac{2\times s}{\text{Margin of error}}\right)^2. \]
  • To estimate the sample size needed (Sect. 30.5) for estimating a mean difference:
    \[ n = \left( \frac{2 \times s_d}{\text{Margin of error}}\right)^2. \]

Notes:

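  • In sample size calculations, always round the value found from the above formulas up to the next whole number.
  • These formulas use a multiplier of 2, so they apply to an approximate \(95\)% CI.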
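
A minimal Python sketch of these sample size formulas, including the rounding-up step, is given below. The function names and the inputs are hypothetical; `math.ceil` is used to round up to the next whole number.

```python
import math

def n_for_proportion(margin_of_error):
    """Sample size for estimating a proportion: 1 / (margin of error)^2, rounded up."""
    return math.ceil(1 / margin_of_error ** 2)

def n_for_mean(s, margin_of_error):
    """Sample size for estimating a mean: (2 x s / margin of error)^2, rounded up.

    Pass s_d instead of s when estimating a mean difference.
    """
    return math.ceil((2 * s / margin_of_error) ** 2)

# Hypothetical inputs, for illustration only
print(n_for_proportion(0.03))  # 1 / 0.0009 = 1111.1..., rounded up to 1112
print(n_for_mean(12, 5))       # (24 / 5)^2 = 23.04, rounded up to 24
```

Rounding up rather than to the nearest whole number ensures that the margin of error is no larger than the value requested.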

C.5 Other formulas

  • The interquartile range (IQR): \(Q_3 - Q_1\), where \(Q_1\) and \(Q_3\) are the first and third quartiles respectively.
  • To calculate \(z\)-scores (Sect. 22.4): \(\displaystyle z = \frac{x - \mu}{\sigma}\) or, more generally,
    \[ z = \frac{\text{value of variable} - \text{mean of distribution}}{\text{standard deviation of distribution}}. \]
  • The unstandardizing formula (Sect. 22.8): \(x = \mu + (z\times \sigma)\).
  • The regression equation in the sample: \(\hat{y} = b_0 + b_1 x\), where \(b_0\) is the sample intercept and \(b_1\) is the sample slope.

Notes:

  • \(t\)-scores are like \(z\)-scores, except that the standard deviation of the distribution includes values estimated from the sample.
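
The formulas in this section translate directly into code. The Python sketch below uses hypothetical function names and values, purely to illustrate the arithmetic.

```python
def iqr(q1, q3):
    """Interquartile range: Q3 - Q1."""
    return q3 - q1

def z_score(x, mu, sigma):
    """Standardise a value: (x - mean) / standard deviation."""
    return (x - mu) / sigma

def unstandardize(z, mu, sigma):
    """Recover the original value from a z-score: mu + z x sigma."""
    return mu + z * sigma

def predict(b0, b1, x):
    """Predicted value from the sample regression equation: b0 + b1 x."""
    return b0 + b1 * x

# Hypothetical values, for illustration only
print(iqr(10, 25))                  # 15
print(z_score(130, 100, 15))        # 2.0
print(unstandardize(2.0, 100, 15))  # 130.0
print(predict(3.0, 0.5, 10))        # 8.0
```

Note that `unstandardize` reverses `z_score`: standardising 130 gives a \(z\)-score of 2, and unstandardising that \(z\)-score returns 130.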

C.6 Other symbols used

| Symbol | Meaning | Reference |
|---|---|---|
| \(H_0\) | Null hypothesis | Sect. 33.2 |
| \(H_1\) | Alternative hypothesis | Sect. 33.2 |
| df | Degrees of freedom | Sect. 36.4 |
| CI | Confidence interval | Chap. 26 |
| s.e. | Standard error | Def. 21.3 |
| \(n\) | Sample size | |
| \(\chi^2\) | The chi-squared test statistic | Sect. 36.4 |