35.11 Summary

In this chapter, we have learnt about regression, which mathematically describes the relationship between two quantitative variables. The response variable is denoted by $y$, and the explanatory variable by $x$. The linear relationship between them (the regression equation), in the sample, is

$\hat{y} = b_0 + b_1 x,$ where $b_0$ is a number (the intercept), $b_1$ is a number (the slope), and the ‘hat’ above the $y$ indicates that the equation gives a predicted mean value of $y$ for the given $x$ value.

The intercept is the predicted mean value of $y$ when the value of $x$ is zero. The slope is how much the predicted mean value of $y$ changes, on average, when the value of $x$ increases by 1.
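
As a worked illustration with made-up numbers (not taken from any data in this chapter), suppose the regression equation were

$\hat{y} = 10 + 2x.$

The intercept $b_0 = 10$ is the predicted mean value of $y$ when $x = 0$. The slope $b_1 = 2$ means that increasing $x$ from, say, 3 to 4 changes the predicted mean value of $y$ from $10 + (2\times 3) = 16$ to $10 + (2\times 4) = 18$: an increase of 2.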

The regression equation can be used to make predictions or to understand the relationship between the two variables. Predictions made for values of $x$ outside the range of $x$ values used to create the regression equation (called extrapolation) may not be reliable.
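
The following is a minimal sketch of how a sample regression equation might be computed and used for prediction. It assumes the line is fitted by ordinary least squares (the usual approach, though this summary does not name the method). The data values and the function names `fit_line` and `predict` are hypothetical, invented only for illustration.

```python
# Hypothetical data, invented purely for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.0]


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


def predict(b0, b1, x_new, x_observed):
    """Return (predicted mean of y, True if x_new is an extrapolation)."""
    y_hat = b0 + b1 * x_new
    # Extrapolation: x_new lies outside the range of x used to fit the line.
    is_extrapolation = not (min(x_observed) <= x_new <= max(x_observed))
    return y_hat, is_extrapolation


b0, b1 = fit_line(x, y)
print(f"y-hat = {b0:.2f} + {b1:.2f} x")

for x_new in (3.0, 10.0):
    y_hat, extrapolating = predict(b0, b1, x_new, x)
    note = " (extrapolation: may not be reliable)" if extrapolating else ""
    print(f"x = {x_new}: predicted mean y = {y_hat:.2f}{note}")
```

Here $x = 3$ lies within the observed $x$ values, while $x = 10$ lies outside them, so the second prediction is flagged as an extrapolation.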

In the population, the regression equation is

$\hat{y} = \beta_0 + \beta_1 x.$

To test a hypothesis about a population slope $\beta_1$, based on the value of the sample slope $b_1$, assume the value of $\beta_1$ in the null hypothesis (usually zero) to be true. Then, the sample slope varies from sample to sample and, under certain statistical validity conditions, varies with an approximately normal distribution centred on the hypothesised value of $\beta_1$, with a standard deviation (called the standard error) of $\text{s.e.}(b_1)$. This distribution describes what values of the sample slope could be expected in the sample if the value of $\beta_1$ in the null hypothesis were true. The test statistic is

$t = \frac{b_1 - \beta_1}{\text{s.e.}(b_1)},$ where $\beta_1$ is the hypothesised value given in the null hypothesis (usually zero). The $t$-value is like a $z$-score, and so an approximate $P$-value can be estimated using the 68–95–99.7 rule.
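
As a rough sketch of this calculation, the code below computes the $t$-value and then brackets the $P$-value using the 68–95–99.7 rule. It assumes a two-tailed test, and the sample slope and standard error are hypothetical numbers chosen only for illustration. The function names `t_statistic` and `approx_p_value` are likewise invented here.

```python
def t_statistic(b1, se_b1, beta1_null=0.0):
    """Return t = (b1 - beta1_null) / s.e.(b1)."""
    return (b1 - beta1_null) / se_b1


def approx_p_value(t):
    """Bracket a two-tailed P-value using the 68-95-99.7 rule.

    About 68%, 95% and 99.7% of values lie within 1, 2 and 3 standard
    deviations of the mean, leaving about 32%, 5% and 0.3% in the tails.
    """
    size = abs(t)
    if size < 1:
        return "larger than 0.32"
    if size < 2:
        return "between 0.05 and 0.32"
    if size < 3:
        return "between 0.003 and 0.05"
    return "smaller than 0.003"


# Hypothetical sample slope and standard error (assumed values).
b1 = 0.8
se_b1 = 0.32

t = t_statistic(b1, se_b1)  # null hypothesis: beta1 = 0
print(f"t = {t:.2f}; two-tailed P-value is {approx_p_value(t)}")
```

With these assumed values, $t = 0.8/0.32 = 2.5$, which lies between 2 and 3, so the approximate two-tailed $P$-value is between 0.003 and 0.05. Statistical software would report a more precise $P$-value.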
