C Symbols, formulas, statistics and parameters
C.1 Symbols and standard errors
The following table lists the statistics used to estimate unknown population parameters.
The sampling distribution is given for each statistic, where possible.
When the sampling distribution is approximately normally distributed, under appropriate statistical validity conditions, this is indicated by ✔ in the table.
The value of the mean of the sampling distribution (the sampling mean) is:
- unknown, for confidence intervals.
- assumed to be the value given in the null hypothesis, for hypothesis tests.
Statistic | Sample estimate | Sampling mean | Normal distn? | Standard error | Ref. |
---|---|---|---|---|---|
Proportion | p̂ | p | ✔ | CI: √( p̂ × (1 − p̂) / n ) | Ch. 22 |
 | | | ✔ | HT: √( p × (1 − p) / n ) | Ch. 26 |
Mean | x̄ | μ | ✔ | s / √n | Chs. 23, 27 |
Mean difference | d̄ | μd | ✔ | sd / √n | Ch. 29 |
Difference between means | x̄₁ − x̄₂ | μ₁ − μ₂ | ✔ | √( s.e.(x̄₁)² + s.e.(x̄₂)² ) | Ch. 30 |
Difference between proportions | p̂₁ − p̂₂ | p₁ − p₂ | ✔ | CI: √( s.e.(p̂₁)² + s.e.(p̂₂)² ) | Ch. 31 |
 | | | ✔ | HT: (see Ch. 31) | Ch. 31 |
Odds ratio | Sample OR | Pop. OR | ✘ | (Not given) | Ch. 31 |
Correlation | r | ρ | ✘ | (Not given) | Ch. 33 |
Regression: slope | b₁ | β₁ | ✔ | s.e.(b₁) (value from software) | Ch. 33 |
Regression: intercept | b₀ | β₀ | ✔ | s.e.(b₀) (value from software) | Ch. 33 |
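The standard-error formulas in the table can be sketched in a few lines of Python. All the sample values below (the sample size, proportion, standard deviation, and the two component standard errors) are hypothetical, chosen only to illustrate the arithmetic.

```python
import math

# Hypothetical sample values, for illustration only
n = 50        # sample size
p_hat = 0.40  # sample proportion p̂
s = 12.0      # sample standard deviation

# s.e. of a sample proportion, CI form (Ch. 22): √( p̂ × (1 − p̂) / n )
se_phat = math.sqrt(p_hat * (1 - p_hat) / n)

# s.e. of a sample mean (Chs. 23, 27): s / √n
se_xbar = s / math.sqrt(n)

# s.e. of a difference between two means (Ch. 30):
# combine the two component standard errors in quadrature
se1, se2 = 1.5, 2.0  # hypothetical s.e.(x̄₁) and s.e.(x̄₂)
se_diff = math.sqrt(se1**2 + se2**2)

print(round(se_phat, 4))  # 0.0693
print(round(se_xbar, 4))  # 1.6971
print(se_diff)            # 2.5
```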
C.2 Confidence intervals
For statistics whose sampling distribution has an approximate normal distribution, confidence intervals have the form statistic ± (multiplier × s.e.(statistic)).
Notes:
- The multiplier is approximately 2 to create an approximate 95% CI (based on the 68--95--99.7 rule).
- The quantity 'multiplier×s.e.(statistic)' is called the margin of error.
- Software uses exact multipliers to form exact confidence intervals.
- When the sampling distribution for the statistic does not have an approximate normal distribution (e.g., for odds ratios and correlation coefficients), this formula does not apply and the CIs are taken directly from software output when available.
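As a sketch of the recipe above, the following Python forms an approximate 95% CI for a proportion using the multiplier of 2; the survey counts (87 successes out of 200) are hypothetical.

```python
import math

# Hypothetical survey: 87 successes out of n = 200
n = 200
p_hat = 87 / n                           # sample proportion, 0.435
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p̂

multiplier = 2                     # approximate 95% multiplier
margin_of_error = multiplier * se  # the 'give-or-take' amount
lo = p_hat - margin_of_error
hi = p_hat + margin_of_error

print(round(lo, 3), round(hi, 3))  # 0.365 0.505
```

Software would use the exact multiplier (about 1.96 here) rather than 2, giving a slightly narrower interval.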
C.3 Hypothesis testing
For statistics that have an approximate normal distribution, the test statistic has the form: test statistic = (statistic − parameter) / s.e.(statistic), where s.e.(statistic) is the standard error of the statistic. The test statistic is a t-score for most hypothesis tests in this book when the sampling distribution is described by a normal distribution, but is a z-score for a hypothesis test involving one or two proportions.
Notes:
- If the test-statistic is a z-score, the P-value can be found using tables (Appendix B.1), or approximated using the 68--95--99.7 rule.
- If the test-statistic is a t-score, the P-value can be approximated using tables (Appendix B.1) or using the 68--95--99.7 rule (since t-scores are similar to z-scores; Sect. 28.4).
- When the sampling distribution for the statistic does not have an approximate normal distribution (e.g., for odds ratios and correlation coefficients), this formula does not apply and P-values are taken from software when available.
- A hypothesis test about odds ratios uses a χ² test statistic. For 2×2 tables only, the χ²-value is equivalent to a z-score with a value of √χ².
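The test-statistic recipe can be illustrated for a single proportion. The sample values and null hypothesis below are hypothetical, and the standard normal CDF is computed from math.erf rather than read from tables.

```python
import math

# Hypothetical test of H0: p = 0.5, with 58 successes in n = 100
n = 100
p_hat = 0.58
p0 = 0.50  # value assumed under the null hypothesis

# The HT form of the standard error uses p from H0 (see table in C.1)
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se  # a z-score, since this test involves a proportion
print(round(z, 2))     # 1.6

# Two-tailed P-value from the standard normal CDF (via math.erf)
phi = 0.5 * (1 + math.erf(z / math.sqrt(2)))
p_value = 2 * (1 - phi)
print(round(p_value, 2))  # 0.11
```

By the 68--95--99.7 rule, a z-score between 1 and 2 gives a two-tailed P-value somewhere between 0.05 and 0.32, consistent with the computed value.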
C.4 Sample size estimation
All the following formulas compute the approximate minimum (i.e., conservative) sample size needed to produce a 95% confidence interval with a specified margin of error (i.e., the 'give-or-take' amount).
To estimate the sample size needed for estimating a proportion (Sect. 32.3): n = 1 / (Margin of error)².
To estimate the sample size needed for estimating a mean (Sect. 32.4): n = ( (2 × s) / (Margin of error) )² for some estimate s of the standard deviation of the data.
To estimate the sample size needed for estimating a mean difference (Sect. 32.5): n = ( (2 × sd) / (Margin of error) )² for some estimate sd of the standard deviation of the differences.
To estimate the sample size needed for estimating the difference between two means (Sect. 32.6): n = 2 × ( (2 × s) / (Margin of error) )² for each group being compared, where s is an estimate of the common standard deviation in the population for both groups. This formula assumes:
- the sample size in each group is the same; and
- the standard deviation in each group is the same.
To estimate the sample size needed for estimating the difference between two proportions (Sect. 32.7): n = 2 / (Margin of error)² for each group being compared. This formula assumes the sample size in each group is the same.
Notes:
- In sample size calculations, round up the sample size found from the above formulas.
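The sample-size formulas above, including the rounding-up rule, can be sketched as follows; the target margins of error and the guessed standard deviation are hypothetical.

```python
import math

# Proportion (Sect. 32.3): want a margin of error of 0.05 (hypothetical)
moe = 0.05
n_prop = math.ceil(1 / moe**2)

# Mean (Sect. 32.4): guessed s = 10, want a margin of error of 2
s = 10.0
moe_mean = 2.0
n_mean = math.ceil((2 * s / moe_mean)**2)

# Difference between two means (Sect. 32.6): same s, sample size per group
n_per_group = math.ceil(2 * (2 * s / moe_mean)**2)

# Always round UP to a whole number, hence math.ceil above
print(n_prop, n_mean, n_per_group)  # 400 100 200
```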
C.5 Other formulas
- To calculate z-scores (Sect. 20.4), use z = (value of variable − mean of the distribution of the variable) / (standard deviation of the distribution of the variable). t-scores are like z-scores. When the 'variable' is a sample estimate (such as x̄), the 'standard deviation of the distribution' is a standard error (such as s.e.(x̄)).
- The unstandardising formula (Sect. 20.8) is x = μ + (z × σ).
- The interquartile range (IQR) is Q3−Q1, where Q1 and Q3 are the first and third quartiles respectively (or, equivalently, the 25th and 75th percentiles).
- The smallest expected value (for assessing statistical validity when forming CIs and conducting hypothesis tests with proportions or odds ratios) is (Smallest row total × Smallest column total) / (Overall total).
- The regression equation in the sample is ŷ = b₀ + b₁x, where b₀ is the sample intercept and b₁ is the sample slope.
- The regression equation in the population is ŷ = β₀ + β₁x, where β₀ is the population intercept and β₁ is the population slope.
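A short Python sketch of the formulas in this section, using hypothetical values throughout:

```python
# z-score (Sect. 20.4), with hypothetical IQ-style values
x, mu, sigma = 115, 100, 15
z = (x - mu) / sigma  # (value − mean) / standard deviation

# Unstandardising (Sect. 20.8): recover the value from its z-score
x_back = mu + z * sigma

# Interquartile range from the quartiles (hypothetical quartiles)
q1, q3 = 24.0, 37.5
iqr = q3 - q1

# Smallest expected value for a two-way table (hypothetical totals)
smallest_row, smallest_col, total = 30, 45, 120
smallest_expected = smallest_row * smallest_col / total

print(z, x_back, iqr, smallest_expected)  # 1.0 115.0 13.5 11.25
```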
C.6 Other symbols and abbreviations used
Symbol | Meaning | Reference |
---|---|---|
RQ | Research question | Chap. 2 |
s | Sample standard deviation | Sect. 11.7.2 |
σ | Population standard deviation | Sect. 11.7.2 |
sd | Sample standard deviation of differences | Sect. 11.7.2 |
σd | Population standard deviation of differences | Sect. 11.7.2 |
R² | R-squared | Sect. 16.4.2 |
H₀ | Null hypothesis | Sect. 28.2 |
H₁ | Alternative hypothesis | Sect. 28.2 |
CI | Confidence interval | Chap. 24 |
s.e. | Standard error | Def. 19.4 |
n | Sample size | |
χ² | The chi-squared test statistic | Sect. 31.6.3 |
± | Plus-or-minus (give-or-take) | Sect. 22.3 |