C Symbols, formulas, statistics and parameters
C.1 Symbols and standard errors
The following table lists the statistics used to estimate unknown population parameters.
The sampling distribution is given for each statistic.
When the sampling distribution is approximately normally distributed (under the appropriate statistical validity conditions), this is indicated by ✔ in the table; ✘ indicates that it is not.

The value of the mean of the sampling distribution (the sampling mean) is:
- unknown, for confidence intervals;
- assumed to be the value given in the null hypothesis, for hypothesis tests.
| Statistic | Estimate | Sampling mean | Normal distn? | Standard error | Ref. |
|---|---|---|---|---|---|
| Proportion | \(\hat{p}\) | \(p\) | ✔ | CI: \(\displaystyle \sqrt{\frac{\hat{p} \times (1 - \hat{p})}{n}}\) | CI: Ch. 23 |
| | | | ✔ | HT: \(\displaystyle \sqrt{\frac{p \times (1 - p)}{n}}\) | HT: Ch. 30 |
| Mean | \(\bar{x}\) | \(\mu\) | ✔ | \(\displaystyle \frac{s}{\sqrt{n}}\) | CI: Ch. 24 |
| | | | ✔ | \(\displaystyle \frac{s}{\sqrt{n}}\) | HT: Ch. 31 |
| Mean difference | \(\bar{d}\) | \(\mu_d\) | ✔ | \(\displaystyle \frac{s_d}{\sqrt{n}}\) | CI: Ch. 26 |
| | | | ✔ | \(\displaystyle \frac{s_d}{\sqrt{n}}\) | HT: Ch. 33 |
| Difference between means | \(\bar{x}_1 - \bar{x}_2\) | \(\mu_1 - \mu_2\) | ✔ | \(\displaystyle \sqrt{\text{s.e.}(\bar{x}_1)^2 + \text{s.e.}(\bar{x}_2)^2}\) | CI: Ch. 27 |
| | | | ✔ | \(\displaystyle \sqrt{\text{s.e.}(\bar{x}_1)^2 + \text{s.e.}(\bar{x}_2)^2}\) | HT: Ch. 34 |
| Difference between proportions | \(\hat{p}_1 - \hat{p}_2\) | \(p_1 - p_2\) | ✔ | CI: \(\displaystyle \sqrt{\text{s.e.}(\hat{p}_1)^2 + \text{s.e.}(\hat{p}_2)^2}\) | CI: Ch. 28 |
| | | | ✔ | HT: \(\displaystyle \sqrt{\text{s.e.}(\hat{p})^2 + \text{s.e.}(\hat{p})^2}\) for common proportion \(\hat{p}\) | HT: Ch. 35 |
| Odds ratio | Sample OR | Pop. OR | ✘ | (Not given) | CI: Ch. 28 |
| | | | ✘ | (Not given) | HT: Ch. 35 |
| Correlation | \(r\) | \(\rho\) | ✘ | (Not given) | HT: Ch. 37 |
| Regression: slope | \(b_1\) | \(\beta_1\) | ✔ | \(\text{s.e.}(b_1)\) (value from software) | CI & HT: Ch. 38 |
| Regression: intercept | \(b_0\) | \(\beta_0\) | ✔ | \(\text{s.e.}(b_0)\) (value from software) | CI & HT: Ch. 38 |
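The standard-error formulas in the table can be evaluated directly. As a rough sketch, with all data values invented for illustration:

```python
import math

# Standard errors from the table, evaluated for hypothetical data.
p_hat, n = 0.2, 100          # sample proportion and sample size (invented)

# Proportion, CI form: uses the sample proportion p-hat.
se_ci = math.sqrt(p_hat * (1 - p_hat) / n)

# Proportion, HT form: uses the proportion p from the null hypothesis
# (a hypothetical null value p = 0.25).
p_null = 0.25
se_ht = math.sqrt(p_null * (1 - p_null) / n)

# Mean: s / sqrt(n), with a hypothetical sample standard deviation s = 12.
s = 12
se_mean = s / math.sqrt(n)

print(round(se_ci, 4), round(se_ht, 4), round(se_mean, 2))
```

Note that the CI and HT standard errors for a proportion differ only in which proportion (sample or null) is used.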
C.2 Hypothesis testing
For statistics that have an approximate normal distribution, the test statistic has the form: \[ \text{test statistic} = \frac{\text{statistic} - \text{parameter}}{\text{s.e.}(\text{statistic})}. \] The test statistic is a \(t\)-score for most hypothesis tests in this book, but is a \(z\)-score for hypothesis tests involving one or two proportions.
Notes:
- If the test statistic is a \(z\)-score, the \(P\)-value can be found using tables (Appendix B.1), or approximated using the \(68\)–\(95\)–\(99.7\) rule.
- If the test statistic is a \(t\)-score, the \(P\)-value can be approximated using tables (Appendix B.1), or approximated using the \(68\)–\(95\)–\(99.7\) rule (since \(t\)-scores are similar to \(z\)-scores; Sect. 32.4).
- When the sampling distribution for the statistic does not have an approximate normal distribution (e.g., for odds ratios and correlation coefficients), this formula does not apply.
- A hypothesis test about odds ratios uses a \(\chi^2\) test statistic. For \(2\times 2\) tables only, the \(\chi^2\) value is equivalent to a \(z\)-score with a value of \(\sqrt{\chi^2}\).
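As an illustration of the test-statistic formula above, consider a hypothetical one-sample test about a mean (all numbers invented):

```python
import math

# Hypothetical one-sample test about a mean: H0 states mu = 50;
# the sample gives x-bar = 52.4, s = 6, n = 36 (all values invented).
x_bar, mu0, s, n = 52.4, 50, 6, 36

se = s / math.sqrt(n)        # s.e.(x-bar) = s / sqrt(n) = 1.0
t = (x_bar - mu0) / se       # test statistic (a t-score here)

# By the 68-95-99.7 rule, |t| > 2 suggests a fairly small two-tailed P-value.
print(round(t, 2))
```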
C.3 Confidence intervals
For statistics with an approximate normal distribution, confidence intervals have the form \[ \text{statistic} \pm ( \text{multiplier} \times \text{s.e.}(\text{statistic})). \]
Notes:
- The multiplier is approximately \(2\) to create an approximate \(95\)% CI (based on the \(68\)–\(95\)–\(99.7\) rule).
- The quantity '\(\text{multiplier} \times \text{s.e.}(\text{statistic})\)' is called the margin of error.
- When the sampling distribution for the statistic does not have an approximate normal distribution (e.g., for odds ratios and correlation coefficients), this formula does not apply and the CIs are taken from software output when available.
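For example, an approximate \(95\)% CI for a proportion, using a multiplier of \(2\) (the data values are hypothetical):

```python
import math

# Approximate 95% CI: statistic +/- (multiplier x s.e.), multiplier approx. 2.
# Hypothetical data: sample proportion 0.2 from n = 100 observations.
p_hat, n = 0.2, 100

se = math.sqrt(p_hat * (1 - p_hat) / n)   # s.e.(p-hat) = 0.04
margin_of_error = 2 * se                  # multiplier x s.e.
lo, hi = p_hat - margin_of_error, p_hat + margin_of_error

print(round(lo, 2), round(hi, 2))
```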
C.4 Sample size estimation
All of the following formulas compute the approximate minimum (i.e., conservative) sample size needed to produce a \(95\)% confidence interval with a specified margin of error.
To estimate the sample size needed for estimating a proportion (Sect. 29.3): \[ n = \frac{1}{(\text{Margin of error})^2}. \]
To estimate the sample size needed for estimating a mean (Sect. 29.4): \[ n = \left( \frac{2\times s}{\text{Margin of error}}\right)^2. \]
To estimate the sample size needed for estimating a mean difference (Sect. 29.5): \[ n = \left( \frac{2 \times s_d}{\text{Margin of error}}\right)^2. \]

To estimate the sample size needed for estimating the difference between two means (Sect. 29.6): \[ n = 2\times \left( \frac{2 \times s}{\text{Margin of error}}\right)^2 \] for each sample, where \(s\) is an estimate of the common standard deviation in the population for both groups. This formula assumes:
- the sample size in each group is the same; and
- the standard deviation in each group is the same.
Notes:
- In sample size calculations, round up the sample size found from the above formulas.
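The sample-size formulas above, with the results rounded up, can be sketched as follows (the target margins of error and the value of \(s\) are hypothetical):

```python
import math

# Sample-size formulas, with results rounded up (conservative).
# Target margins of error and standard-deviation estimate are hypothetical.
me_p = 0.05                  # target margin of error for a proportion
me_mean = 2                  # target margin of error for a mean
s = 10                       # estimate of the population standard deviation

n_prop = math.ceil(1 / me_p ** 2)                # proportion
n_mean = math.ceil((2 * s / me_mean) ** 2)       # mean
n_each = math.ceil(2 * (2 * s / me_mean) ** 2)   # each of two groups

print(n_prop, n_mean, n_each)
```

Using `math.ceil` implements the 'round up' rule from the notes above.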
C.5 Other formulas
- To calculate \(z\)-scores (Sect. 21.4), use \[ z = \frac{\text{value of variable} - \text{mean of the distribution of the variable}}{\text{standard deviation of the distribution of the variable}}. \] \(t\)-scores are like \(z\)-scores. When the 'variable' is a sample estimate (such as \(\bar{x}\)), the 'standard deviation of the distribution' is a standard error (such as \(\text{s.e.}(\bar{x})\)).
- The unstandardizing formula (Sect. 21.8) is \(x = \mu + (z\times \sigma)\).
- The interquartile range (IQR) is \(Q_3 - Q_1\), where \(Q_1\) and \(Q_3\) are the first and third quartiles respectively (or, equivalently, the \(25\)th and \(75\)th percentiles).
- The smallest expected value (for assessing statistical validity when forming CIs and conducting hypothesis tests with proportions or odds ratios) is \[ \frac{(\text{Smallest row total})\times(\text{Smallest column total})}{\text{Overall total}}. \]
- The regression equation in the sample is \(\hat{y} = b_0 + b_1 x\), where \(b_0\) is the sample intercept and \(b_1\) is the sample slope.
- The regression equation in the population is \(\hat{y} = \beta_0 + \beta_1 x\), where \(\beta_0\) is the intercept and \(\beta_1\) is the slope.
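The smallest-expected-value formula above can be checked on a small, made-up \(2\times 2\) table of counts:

```python
# Smallest expected value for a hypothetical 2x2 table of counts:
#              Col 1   Col 2   (row totals: 40, 60; column totals: 30, 70)
#   Row 1        10      30
#   Row 2        20      40
table = [[10, 30], [20, 40]]

row_totals = [sum(row) for row in table]          # [40, 60]
col_totals = [sum(col) for col in zip(*table)]    # [30, 70]
overall = sum(row_totals)                         # 100

# (smallest row total) x (smallest column total) / (overall total)
smallest_expected = min(row_totals) * min(col_totals) / overall
print(smallest_expected)   # 40 * 30 / 100 = 12.0
```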
C.6 Other symbols and abbreviations used
| Symbol | Meaning | Reference |
|---|---|---|
| RQ | Research question | Chap. 2 |
| \(s\) | Sample standard deviation | Sect. 11.7.2 |
| \(\sigma\) | Population standard deviation | Sect. 11.7.2 |
| \(s_d\) | Sample standard deviation of differences | Sect. 11.7.2 |
| \(\sigma_d\) | Population standard deviation of differences | Sect. 11.7.2 |
| \(R^2\) | \(R\)-squared | Sect. 16.4.2 |
| \(H_0\) | Null hypothesis | Sect. 32.2 |
| \(H_1\) | Alternative hypothesis | Sect. 32.2 |
| CI | Confidence interval | Chap. 25 |
| s.e. | Standard error | Def. 20.3 |
| \(n\) | Sample size | |
| \(\chi^2\) | The chi-squared test statistic | Sect. 35.3.2 |