# C Symbols, formulas, statistics and parameters

## C.1 Symbols and standard errors

| Parameter | Symbol | Statistic | Standard error | Ref. |
|---|---|---|---|---|
| Proportion | \(p\) | \(\hat{p}\) | Confidence interval: \(\displaystyle\text{s.e.}(\hat{p}) = \sqrt{\frac{\hat{p} \times (1 - \hat{p})}{n}}\) | Def. 24.2 |
| | | | Hypothesis test: \(\displaystyle\text{s.e.}(\hat{p}) = \sqrt{\frac{p \times (1 - p)}{n}}\) | Def. 24.1 |
| Mean | \(\mu\) | \(\bar{x}\) | \(\displaystyle\text{s.e.}(\bar{x}) = \frac{s}{\sqrt{n}}\) | Def. 25.1 |
| Standard deviation | \(\sigma\) | \(s\) | | |
| Mean difference | \(\mu_d\) | \(\bar{d}\) | \(\displaystyle\text{s.e.}(\bar{d}) = \frac{s_d}{\sqrt{n}}\) | Def. 27.2 |
| Diff. between means | \(\mu_1 - \mu_2\) | \(\bar{x}_1 - \bar{x}_2\) | \(\text{s.e.}(\bar{x}_1 - \bar{x}_2)\) | Def. 28.1 |
| Odds ratio | Pop. OR | Sample OR | | |
| Correlation | \(\rho\) | \(r\) | | |
| Regression line: slope | \(\beta_1\) | \(b_1\) | \(\text{s.e.}(b_1)\) | Value given |
| Regression line: intercept | \(\beta_0\) | \(b_0\) | \(\text{s.e.}(b_0)\) | Value given |
| R-squared | | \(R^2\) | | |
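As a quick numerical check of the standard-error formulas in the table, here is a minimal Python sketch; all sample values (\(n\), \(\hat{p}\), \(p\), \(s\)) are invented for illustration:

```python
import math

# Invented example: 40 successes out of n = 100 trials.
n = 100
p_hat = 40 / 100

# s.e. of a sample proportion for a confidence interval (Def. 24.2),
# using the sample proportion p-hat:
se_ci = math.sqrt(p_hat * (1 - p_hat) / n)

# s.e. of a sample proportion for a hypothesis test (Def. 24.1),
# using the hypothesised population value p (invented: p = 0.5):
p0 = 0.5
se_test = math.sqrt(p0 * (1 - p0) / n)

# s.e. of a sample mean (Def. 25.1), with invented s = 12 and n = 50:
s = 12.0
se_mean = s / math.sqrt(50)
```

Note that the two proportion formulas differ only in which proportion (sample or hypothesised) appears under the square root.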

## C.2 Confidence intervals

**Confidence intervals** have the form

\[
\text{statistic} \pm ( \text{multiplier} \times \text{s.e.}(\text{statistic}))
\]
when the sampling distribution of the statistic has an approximate normal distribution.

**Notes:**

- The multiplier is *approximately* 2 for an *approximate* \(95\)% CI (based on the \(68\)--\(95\)--\(99.7\) rule).
- The \(\text{multiplier} \times \text{s.e.}(\text{statistic})\) is called the *margin of error*.
- When the sampling distribution for the statistic does not have an approximate normal distribution (e.g., for *odds ratios* and *correlation coefficients*), **this formula does not apply**.
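The interval formula can be sketched in Python for the case of a mean; the sample summary below (\(\bar{x}\), \(s\), \(n\)) is invented for illustration:

```python
import math

# Invented sample summary: x-bar = 50, s = 10, n = 25.
xbar, s, n = 50.0, 10.0, 25

se = s / math.sqrt(n)        # s.e. of the sample mean (Def. 25.1)
multiplier = 2               # approximate 95% multiplier (68-95-99.7 rule)
margin = multiplier * se     # the margin of error

lo, hi = xbar - margin, xbar + margin
print(lo, hi)  # 46.0 54.0
```

The approximate \(95\)% CI is then \(50 \pm 4\), i.e., from \(46\) to \(54\).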

## C.3 Hypothesis testing

For **hypothesis tests**, the *test statistic* is a \(t\)-score, which has the form:

\[
t = \frac{\text{statistic} - \text{parameter}}{\text{s.e.}(\text{statistic})}
\]
when the sampling distribution of the statistic has an approximate normal distribution.

**Notes:**

- Since \(t\)-scores are a little like \(z\)-scores (Sect. 33.4), the \(68\)--\(95\)--\(99.7\) rule can be used to *approximate* \(P\)-values.
- When the sampling distribution for the statistic does not have an approximate normal distribution (e.g., for *odds ratios* and *correlation coefficients*), **this formula does not apply**.
- A hypothesis test about **odds ratios** uses a \(\chi^2\) test statistic, whose value is approximately like a \(z\)-score with a value of \(\displaystyle\sqrt{\frac{\chi^2}{\text{df}}}\), where \(\text{df}\) is the 'degrees of freedom' given in the software output.
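Both calculations above can be sketched in Python; the test about a mean and the \(\chi^2\) value are invented for illustration:

```python
import math

# Invented test about a mean: H0: mu = 100, with
# x-bar = 104, s = 15, n = 36.
mu0, xbar, s, n = 100.0, 104.0, 15.0, 36

se = s / math.sqrt(n)      # s.e.(x-bar) = s / sqrt(n)
t = (xbar - mu0) / se      # (statistic - parameter) / s.e.(statistic)

# |t| = 1.6 lies inside +/- 2, so by the 68-95-99.7 rule the
# approximate (two-tailed) P-value is larger than 0.05.

# Converting an invented chi-squared statistic (chi2 = 6.25, df = 1)
# to an approximate z-score:
z_like = math.sqrt(6.25 / 1)
```

Here \(t = 4/2.5 = 1.6\) and the \(\chi^2\) value converts to an approximate \(z\)-score of \(2.5\).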

## C.4 Sample size estimation

- To estimate the sample size needed (Sect. 30.3) for **estimating a proportion**: \(\displaystyle n = \frac{1}{(\text{Margin of error})^2}\).
- To estimate the sample size needed (Sect. 30.4) for **estimating a mean**: \(\displaystyle n = \left( \frac{2\times s}{\text{Margin of error}}\right)^2\).
- To estimate the sample size needed (Sect. 30.5) for **estimating a mean difference**: \(\displaystyle n = \left( \frac{2 \times s_d}{\text{Margin of error}}\right)^2\).

**Notes:**

- In **sample size calculations**, always **round up** the sample size found from the above formulas.
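The three formulas, with the round-up rule applied via `math.ceil`, can be sketched as follows; the margins of error and standard deviations are invented for illustration:

```python
import math

# Estimating a proportion to within a margin of error of 0.05:
n_prop = math.ceil(1 / 0.05**2)

# Estimating a mean to within 2 units, with s = 11 (invented):
n_mean = math.ceil((2 * 11 / 2) ** 2)

# Estimating a mean difference to within 1.5 units, with
# s_d = 4 (invented); the raw value 28.44... rounds UP to 29:
n_diff = math.ceil((2 * 4 / 1.5) ** 2)

print(n_prop, n_mean, n_diff)  # 400 121 29
```

The third case shows why rounding up matters: a sample of 28 would give a slightly larger margin of error than requested.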

## C.5 Other formulas

- The **interquartile range** (IQR): \(Q_3 - Q_1\), where \(Q_1\) and \(Q_3\) are the first and third quartiles respectively.
- To **calculate \(z\)-scores** (Sect. 22.4): \(\displaystyle z = \frac{x - \mu}{\sigma}\) or, more generally, \[ z = \frac{\text{value of variable} - \text{mean of distribution}}{\text{standard deviation of distribution}}. \]
- The **unstandardizing formula** (Sect. 22.8): \(x = \mu + (z\times \sigma)\).
- The **regression equation** in the *sample*: \(\hat{y} = b_0 + b_1 x\), where \(b_0\) is the sample intercept and \(b_1\) is the sample slope.
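The \(z\)-score and unstandardizing formulas are inverses of each other, which a short Python sketch makes concrete (the distribution values \(\mu = 70\), \(\sigma = 8\) are invented):

```python
# Invented distribution: mu = 70, sigma = 8.
mu, sigma = 70.0, 8.0

def z_score(x):
    """Standardize: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

def unstandardize(z):
    """Invert the z-score: recover the original value x = mu + z*sigma."""
    return mu + z * sigma

z = z_score(82.0)      # (82 - 70) / 8 = 1.5
x = unstandardize(z)   # back to 82.0
```

Applying one formula and then the other returns the original value, since the second simply undoes the first.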

**Notes:**

- \(t\)-scores are like \(z\)-scores, except that the standard deviation of the distribution includes values estimated from the sample.