3.1 Single population mean

  • Hypothesis testing about the population mean \(\mu\), whose value we do not know, includes 4 steps:
  1. Setting the null and alternative hypotheses
  2. Computing the test statistic
  3. Choosing a level of significance
  4. Drawing a conclusion according to the decision rule
  • The null hypothesis is a statement that is assumed to be true unless strong evidence to reject the null hypothesis exists

\[\begin{equation} H_0:~~\mu=\mu_0 \tag{3.1} \end{equation}\]

  • The assumed population mean \(\mu_0\) is the value that we think is true. If the sample data provide strong evidence against this value, we reject the null hypothesis. Otherwise, we do not reject the null hypothesis.

  • We never accept the null hypothesis; instead, we fail to reject it if there is not enough evidence against it.

  • Prior to testing the null hypothesis, we must specify the alternative, which depends on what type of action is taken as a result of the hypothesis test

  • There are actually three ways to express the alternative hypothesis:

\[\begin{align} H_1:&~~\mu \ne \mu_0 &\text{two-tailed test} \\ \\ H_1:&~~\mu < \mu_0 &\text{left-tailed test} \\ \\ H_1:&~~\mu > \mu_0 &\text{right-tailed test} \\ \tag{3.2} \end{align}\]

  • In a right-tailed test the alternative states that the actual population mean is greater than the assumed mean. In a left-tailed test the alternative states that the actual population mean is less than the assumed mean, while in a two-tailed test the alternative states only that the actual population mean differs from the assumed mean.

  • Although the null and alternative hypotheses represent opposite statements, \(H_0\) always has a symbol \("="\) as a mathematical convention, while \(H_1\) never has a symbol \("="\).

  • A test statistic is a numerical measure we construct to determine whether we should reject the null hypothesis or not

\[\begin{align} \frac{\bar{x}-\mu_0}{\sigma_{\bar{x}}}=\frac{\text{sample mean}-\text{assumed population mean}}{\text{standard error of the sample mean}} \tag{3.3} \end{align}\]

  • Test statistic (3.3) measures how far the sample mean is from the assumed population mean in units of the standard error, i.e. the standard deviation of the sampling distribution of the sample mean

  • The form of the test statistic depends on three key details: normality assumption, the size of the sample and whether the population standard deviation is known

\[\begin{align} Z&=\frac{\bar{x}-\mu_0}{\frac{\sigma}{\sqrt{n}}} &\text{known "sigma"} \\ \\ t&=\frac{\bar{x}-\mu_0}{\frac{S}{\sqrt{n}}} &\text{unknown "sigma" replaced with }S \\ \tag{3.4} \end{align}\]

  • In practical applications \(\sigma\) is unknown just like parameter \(\mu\), and thus a test statistic follows a Student’s t-distribution with \((n-1)\) degrees of freedom \(t(df)\), but only if normality assumption holds in the population.

  • However, if \(\sigma\) is unknown and population is not normally distributed, a test statistic approximately follows a standard normal distribution \(Z\sim N(0,~1)\), but only if the sample size is large enough (\(n>30\)).

  • The most common decision rule is based on the p-value: the probability of obtaining a test statistic at least as extreme as the observed test statistic (computed from the sample data), under the assumption that the null hypothesis is true

\[\begin{align} \text{p-value}&=2P(Z>|\text{test statistic}| \mid \mu=\mu_0) &\text{two-tailed test} \\ \\ \text{p-value}&=P(Z>|\text{test statistic}| \mid \mu=\mu_0) &\text{one-tailed test} \\ \\ \text{p-value}&=2P(t(df)>|\text{test statistic}| \mid \mu=\mu_0) &\text{two-tailed test} \\ \\ \text{p-value}&=P(t(df)>|\text{test statistic}| \mid \mu=\mu_0) &\text{one-tailed test} \\ \\ \tag{3.5} \end{align}\]
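The Z rows of (3.5) can be sketched in Python with the standard library's `statistics.NormalDist`. The function name `z_p_value` and its `tail` argument are assumptions made for this illustration. The one-tailed branch uses \(|z|\) exactly as written in (3.5), which presumes the observed statistic falls on the side named by the alternative hypothesis.

```python
from statistics import NormalDist

def z_p_value(z, tail="two"):
    """p-value for a Z test statistic, following the Z rows of (3.5)."""
    upper_tail = 1 - NormalDist().cdf(abs(z))  # P(Z > |z|)
    if tail == "two":
        return 2 * upper_tail
    if tail == "one":
        # Assumes z lies in the direction stated by H1.
        return upper_tail
    raise ValueError("tail must be 'two' or 'one'")

# Hypothetical test statistic
print(round(z_p_value(1.96, tail="two"), 3))  # about 0.05
```

The t rows work the same way, with the \(t(df)\) distribution function in place of the standard normal one.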

  • The p-value defines the smallest level of significance for which the null hypothesis can be rejected

  • The level of significance, denoted by \(\alpha\), defines the rejection area under the sampling distribution

  • The level of significance is selected by the researcher, and typically is set to \(5\%\) for small samples, and \(1\%\) for large samples

\[\begin{align} \text{p-value}&<\alpha & \text{reject }H_0 \\ \\ \text{p-value}& \geq \alpha & \text{do not reject }H_0 \\ \tag{3.6} \end{align}\]

  • For example, if the p-value of a test is \(0.038\), the null hypothesis cannot be rejected at \(\alpha=0.01\). However, the null hypothesis can be rejected at \(\alpha=0.05\)
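The decision rule amounts to a single comparison. A small sketch, using the p-value \(0.038\) from the bullet above (the function name `decide` is an assumption for illustration):

```python
def decide(p_value, alpha):
    """Apply the rule: reject H0 only when the p-value is below alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"

print(decide(0.038, alpha=0.01))  # do not reject H0
print(decide(0.038, alpha=0.05))  # reject H0
```
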

Example 3.1 Jane has just begun her new job at a very competitive company. In a sample of \(16\) sales calls she closed contracts with an average value of \(108\) USD and a standard deviation of \(12\) USD. Company policy requires that new members of the sales department exceed an average of \(100\) USD per contract during the trial employment period. Test at the \(5\%\) significance level whether Jane has met the company requirement. Obtain the p-value in Excel using function =NORM.S.DIST() or =T.DIST.RT().

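\[\begin{align} H_0&:~\mu =100 \\ H_1&:~\mu > 100 \\ \\ t&=\frac{\bar{x}-\mu_0}{\frac{S}{\sqrt{n}}}=\frac{108-100}{\frac{12}{\sqrt{16}}}=2.67 \\ \\ \text{p-value}&=P(t(15)>2.67)=0.008739 \end{align}\]

Since the p-value \(0.008739\) is less than \(\alpha=0.05\), the null hypothesis is rejected: the sample supports the claim that Jane's average contract value exceeds \(100\) USD.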
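The numbers in Example 3.1 can be checked outside Excel. The sketch below uses only the Python standard library and approximates the right-tail probability of the t-distribution by integrating its density with Simpson's rule. This numerical shortcut and the helper names `t_pdf` and `t_right_tail` are assumptions made for illustration, not the method used by Excel's =T.DIST.RT().

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_right_tail(t, df, steps=10_000):
    """Approximate P(T > t) using Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area

# Data from Example 3.1
n, sample_mean, s, mu_0, alpha = 16, 108, 12, 100, 0.05

t_stat = (sample_mean - mu_0) / (s / math.sqrt(n))  # 8 / 3
p_value = t_right_tail(t_stat, df=n - 1)

print(round(t_stat, 2))  # 2.67
print(p_value < alpha)   # True, so H0 is rejected at the 5% level
```

The computed p-value should be close to the \(0.008739\) reported in the example; small differences can arise from rounding the test statistic to \(2.67\) before looking up the tail probability.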

Example 3.2 Sample data on \(50\) companies are available in the Excel file at this link. Test the hypothesis that the average number of employees in the population is equal to the sample median. Prior to hypothesis testing, compute descriptive statistics of the number of employees. The level of significance is \(1\%\). Obtain the p-value in Excel using function =T.DIST.2T().