3.1 Single population mean
- Hypothesis testing about the population mean \(\mu\), whose value we do not know, consists of four steps:
  - Setting the null and alternative hypotheses
  - Computing the test statistic
  - Choosing a level of significance
  - Making a conclusion according to the decision rule
- The null hypothesis is a statement that is assumed to be true unless there is strong evidence against it:
\[\begin{equation} H_0:~~\mu=\mu_0 \tag{3.1} \end{equation}\]
The assumed population mean \(\mu_0\) is the value that we believe to be true. If the sample data provide strong evidence against it, we reject the null hypothesis; otherwise, we do not reject the null hypothesis.
We never accept the null hypothesis; instead, we fail to reject it if there is not enough evidence against it.
Before testing the null hypothesis, we must specify the alternative hypothesis, which depends on what type of action will be taken as a result of the hypothesis test.
There are three ways to express the alternative hypothesis:
\[\begin{align} H_1:&~~\mu \ne \mu_0 &\text{two-tailed test} \\ \\ H_1:&~~\mu < \mu_0 &\text{left-tailed test} \\ \\ H_1:&~~\mu > \mu_0 &\text{right-tailed test} \\ \tag{3.2} \end{align}\]
A right-tailed test states that the actual population mean is greater than the assumed population mean, a left-tailed test states that it is less than the assumed population mean, and a two-tailed test states that it is different from the assumed population mean.
Although the null and alternative hypotheses are opposite statements, by mathematical convention \(H_0\) always contains the equality sign \("="\), while \(H_1\) never does.
A test statistic is a numerical measure computed from the sample data to determine whether the null hypothesis should be rejected:
\[\begin{align} \frac{\bar{x}-\mu_0}{\sigma_{\bar{x}}}=\frac{\text{sample mean}-\text{assumed population mean}}{\text{standard error of the sample mean}} \tag{3.3} \end{align}\]
Test statistic (3.3) measures how far the sample mean is from the assumed population mean in units of the standard error, that is, the standard deviation of the sampling distribution of \(\bar{x}\).
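Formula (3.3) translates directly into a few lines of code. The sketch below is illustrative: the function name `mean_test_statistic` is hypothetical, and the same arithmetic serves both forms in (3.4), depending on whether \(\sigma\) or \(S\) is passed as the standard deviation. The numbers are those of the worked example at the end of this section.

```python
import math


def mean_test_statistic(x_bar: float, mu_0: float, sd: float, n: int) -> float:
    """Test statistic (3.3): distance of x_bar from mu_0 in standard errors.

    Pass sigma as `sd` for the Z form, or the sample standard deviation S
    for the t form; the arithmetic is the same.
    """
    standard_error = sd / math.sqrt(n)  # standard error of the sample mean
    return (x_bar - mu_0) / standard_error


# Figures from the worked example: x_bar = 108, mu_0 = 100, S = 12, n = 16
print(round(mean_test_statistic(108, 100, 12, 16), 2))  # 2.67
```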
The form of the test statistic depends on three key details: whether the population is normally distributed, the sample size, and whether the population standard deviation \(\sigma\) is known:
\[\begin{align} Z&=\frac{\bar{x}-\mu_0}{\frac{\sigma}{\sqrt{n}}} &\text{known "sigma"} \\ \\ t&=\frac{\bar{x}-\mu_0}{\frac{S}{\sqrt{n}}} &\text{unknown "sigma" replaced with }S \\ \tag{3.4} \end{align}\]
In practice, \(\sigma\) is usually unknown, just like the parameter \(\mu\), so it is replaced by the sample standard deviation \(S\). The resulting test statistic follows Student's t-distribution with \(df=n-1\) degrees of freedom, \(t(df)\), but only if the population is normally distributed.
However, if \(\sigma\) is unknown and the population is not normally distributed, the test statistic approximately follows the standard normal distribution \(Z\sim N(0,~1)\), but only if the sample size is large enough (as a rule of thumb, \(n>30\)).
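The choice of reference distribution described above can be summarized as a small decision function. This is a sketch under stated assumptions: the function name and its return labels are hypothetical, the \(n>30\) cutoff is only the rule of thumb quoted in the text, and the known-\(\sigma\) branch (Z exact for a normal population, approximate for large samples by the central limit theorem) is an assumption added for completeness.

```python
def reference_distribution(sigma_known: bool, population_normal: bool, n: int):
    """Return 'Z', 't(df)' or None (no reliable choice) for a test about mu."""
    large_sample = n > 30  # rule-of-thumb cutoff, not a sharp boundary
    if sigma_known:
        # Assumption: Z is exact for a normal population and approximate
        # for large samples (central limit theorem).
        return "Z" if population_normal or large_sample else None
    if population_normal:
        return f"t({n - 1})"  # sigma replaced by S, df = n - 1
    if large_sample:
        return "Z"  # approximately standard normal
    return None  # small sample from a non-normal population


print(reference_distribution(False, True, 16))   # t(15)
print(reference_distribution(False, False, 50))  # Z
print(reference_distribution(False, False, 20))  # None
```

Returning `None` for a small sample from a non-normal population reflects the text: neither the t-distribution nor the normal approximation is justified in that case.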
The most common decision rule is based on the p-value. This approach computes the probability of obtaining a test statistic at least as extreme as the observed one (computed from the sample data), under the assumption that the null hypothesis is true:
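\[\begin{align} \text{p-value}&=2P\left(Z>|\text{test statistic}|~\middle|~\mu=\mu_0\right) &\text{two-tailed test} \\ \\ \text{p-value}&=P\left(Z>|\text{test statistic}|~\middle|~\mu=\mu_0\right) &\text{one-tailed test} \\ \\ \text{p-value}&=2P\left(t(df)>|\text{test statistic}|~\middle|~\mu=\mu_0\right) &\text{two-tailed test} \\ \\ \text{p-value}&=P\left(t(df)>|\text{test statistic}|~\middle|~\mu=\mu_0\right) &\text{one-tailed test} \\ \\ \tag{3.5} \end{align}\]
The one-tailed formulas assume that the test statistic lies on the side specified by \(H_1\) (positive for a right-tailed test, negative for a left-tailed test); otherwise the p-value exceeds \(0.5\) and \(H_0\) is not rejected.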
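For the Z case, the p-values in (3.5) can be computed with the Python standard library alone, because the upper-tail area of \(N(0,~1)\) beyond \(x\) equals \(0.5\,\text{erfc}(x/\sqrt{2})\). The sketch below is not a reference implementation: the function name `z_p_value` and the labels `'two-sided'`, `'greater'` and `'less'` are hypothetical choices. It computes each one-tailed p-value in the direction of \(H_1\), so a statistic on the wrong side gives a p-value above \(0.5\).

```python
import math


def z_p_value(z: float, alternative: str) -> float:
    """p-value of a Z test statistic, assuming H0: mu = mu_0 is true.

    alternative: 'two-sided' (H1: mu != mu_0), 'greater' (H1: mu > mu_0)
    or 'less' (H1: mu < mu_0); these labels are chosen for this sketch.
    """
    root2 = math.sqrt(2.0)
    if alternative == "two-sided":
        return math.erfc(abs(z) / root2)  # 2 * P(Z > |z|)
    if alternative == "greater":
        return 0.5 * math.erfc(z / root2)  # P(Z > z), right-tailed
    if alternative == "less":
        return 0.5 * math.erfc(-z / root2)  # P(Z < z), left-tailed
    raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")


print(round(z_p_value(1.96, "two-sided"), 3))  # 0.05
print(round(z_p_value(-1.2, "greater"), 3))    # 0.885 (statistic on the wrong side)
```

Using `erfc` rather than `1 - erf` keeps small tail probabilities accurate for large \(|z|\).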
The p-value is the smallest level of significance at which the null hypothesis can be rejected.
The level of significance, denoted by \(\alpha\), is the probability of rejecting \(H_0\) when it is true; it defines the rejection region under the sampling distribution of the test statistic.
The level of significance is chosen by the researcher and is typically set to \(5\%\) for small samples and \(1\%\) for large samples:
\[\begin{align} \text{p-value}&<\alpha & \text{reject }H_0 \\ \\ \text{p-value}& \geq \alpha & \text{do not reject }H_0 \\ \tag{2.3} \end{align}\]
- For example, if the p-value of a test is \(0.038\), the null hypothesis cannot be rejected at \(\alpha=0.01\). However, it can be rejected at \(\alpha=0.05\).
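The decision rule is a single comparison, shown here as a minimal sketch (the function name `decide` is hypothetical) and applied to the p-value of \(0.038\) from the example above:

```python
def decide(p_value: float, alpha: float) -> str:
    """Decision rule: reject H0 when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


for alpha in (0.01, 0.05):
    print(alpha, decide(0.038, alpha))
# 0.01 do not reject H0
# 0.05 reject H0
```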
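In Excel, the p-value can be computed with =NORM.S.DIST() (standard normal distribution) or =T.DIST.RT() (right tail of the t-distribution).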
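For example, to test whether the population mean exceeds \(100\) using a sample of \(n=16\) observations with \(\bar{x}=108\) and \(S=12\) (assuming a normally distributed population, since the sample is small):
\[\begin{align} H_0&:~\mu =100 \\ H_1&:~\mu > 100 \\ \\ t&=\frac{\bar{x}-\mu_0}{\frac{S}{\sqrt{n}}}=\frac{108-100}{\frac{12}{\sqrt{16}}}=2.67 \\ \\ \text{p-value}&=P(t(15)>2.67)=0.008739 \end{align}\]
Because the p-value is smaller than both \(0.05\) and \(0.01\), the null hypothesis is rejected at either level of significance.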
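For a two-tailed t-test, =T.DIST.2T() returns the two-tailed p-value directly. It does not accept negative values, so the absolute value of the test statistic should be passed.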
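Outside Excel, the right-tailed and two-tailed t probabilities can be reproduced with the standard library by using the identity \(P(|t(df)|>|t|)=I_{df/(df+t^2)}(df/2,~1/2)\), where \(I\) is the regularized incomplete beta function. The sketch below evaluates \(I\) with the classic continued-fraction (modified Lentz) method. The function names are hypothetical stand-ins for =T.DIST.RT() and =T.DIST.2T(), not an Excel or library API; in practice a statistics library would normally be used instead.

```python
import math


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny, eps = 1e-300, 1e-15
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_dist_rt(t: float, df: int) -> float:
    """Right-tail probability P(t(df) > t), analogous to =T.DIST.RT(t, df)."""
    tail = 0.5 * _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))
    return tail if t >= 0 else 1.0 - tail


def t_dist_2t(t: float, df: int) -> float:
    """Two-tailed probability P(|t(df)| > |t|), analogous to =T.DIST.2T(|t|, df)."""
    return 2.0 * t_dist_rt(abs(t), df)


# Worked example: H0: mu = 100 vs H1: mu > 100, n = 16, x_bar = 108, S = 12
n, x_bar, s, mu_0 = 16, 108.0, 12.0, 100.0
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))
print(round(t_stat, 2))                  # 2.67
print(round(t_dist_rt(2.67, n - 1), 6))  # compare with 0.008739 in the example
print(round(t_dist_2t(2.67, n - 1), 6))  # two-tailed version of the same test
```

The example rounds \(t\) to \(2.67\) before computing the tail area. Passing the unrounded statistic `t_stat` (\(8/3\)) gives a marginally different p-value, which does not change the decision.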