35.11 Summary
In this chapter, we have learnt about regression, which mathematically describes the relationship between two quantitative variables. The response variable is denoted by $y$, and the explanatory variable by $x$. The linear relationship between them (the regression equation), in the sample, is
$$\hat{y} = b_0 + b_1 x,$$
where $b_0$ is a number (the intercept), $b_1$ is a number (the slope), and the ‘hat’ above the $y$ indicates that the equation gives a predicted mean value of $y$ for the given $x$ value.
The intercept is the predicted mean value of $y$ when the value of $x$ is zero. The slope is how much the predicted mean value of $y$ changes, on average, when the value of $x$ increases by 1.
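As a rough illustration of where an intercept and a slope come from, the sketch below fits a line to a small data set. It assumes the usual least-squares formulas, which this summary does not state, and the data values and the function name `fit_line` are hypothetical, chosen only for the example.

```python
# Hypothetical illustrative data (not taken from the chapter).
x = [1.0, 2.0, 3.0, 4.0, 5.0]   # explanatory variable
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # response variable


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y-hat = b0 + b1 * x.

    Assumption: the sample regression line is the least-squares line.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


b0, b1 = fit_line(x, y)
print(f"y-hat = {b0:.2f} + {b1:.2f} x")
```

For these made-up values, the slope says that the predicted mean value of the response increases by about 1.99, on average, for each increase of 1 in the explanatory variable.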
The regression equation can be used to make predictions or to understand the relationship between the two variables. Predictions made with values of $x$ outside the values of $x$ used to create the regression equation (called extrapolation) may not be reliable.
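A minimal sketch of making a prediction while flagging extrapolation is given below. The intercept, slope and range of observed values are hypothetical numbers, and the function name `predict` is an assumption made for this example only.

```python
def predict(intercept, slope, x_new, x_min, x_max):
    """Return the predicted mean response and whether it is an extrapolation.

    x_min and x_max are the smallest and largest values of the explanatory
    variable used to create the regression equation.
    """
    y_hat = intercept + slope * x_new
    is_extrapolation = not (x_min <= x_new <= x_max)
    return y_hat, is_extrapolation


# Hypothetical regression equation: y-hat = 0.05 + 1.99 x, fitted for x from 1 to 5.
inside = predict(0.05, 1.99, 3.0, x_min=1.0, x_max=5.0)
outside = predict(0.05, 1.99, 10.0, x_min=1.0, x_max=5.0)

print(inside)   # prediction within the observed range
print(outside)  # flagged as extrapolation: may not be reliable
```

The flag does not make the prediction wrong; it only signals that the equation is being used where no data were available to support it.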
In the population, the regression equation is
$$\hat{y} = \beta_0 + \beta_1 x.$$
To test a hypothesis about a population slope $\beta_1$, based on the value of the sample slope $b_1$, assume the value of $\beta_1$ in the null hypothesis (usually zero) to be true. Then, the sample slope varies from sample to sample and, under certain statistical validity conditions, varies with an approximate normal distribution centered around the hypothesised value of $\beta_1$, with a standard deviation of $\text{s.e.}(b_1)$ (the standard error of the sample slope). This distribution describes what values of the sample slope could be expected in the sample if the value of $\beta_1$ in the null hypothesis were true. The test statistic is
$$t = \frac{b_1 - \beta_1}{\text{s.e.}(b_1)},$$
where $\beta_1$ is the hypothesised value given in the null hypothesis (usually zero). The $t$-value is like a $z$-score, and so an approximate $P$-value can be estimated using the 68–95–99.7 rule.
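The calculation of the test statistic, and the rough use of the 68–95–99.7 rule, can be sketched as follows. The sample slope and standard error are hypothetical numbers, the function names are assumptions for this example, and the $P$-value ranges assume a two-sided alternative hypothesis.

```python
def t_statistic(sample_slope, se_slope, null_slope=0.0):
    """Test statistic: (sample slope - hypothesised slope) / standard error."""
    return (sample_slope - null_slope) / se_slope


def rough_two_sided_p(t):
    """Approximate two-sided P-value range from the 68-95-99.7 rule.

    Assumption: the t-value is treated like a z-score, so about 68%, 95% and
    99.7% of values lie within 1, 2 and 3 standard deviations of the mean.
    """
    size = abs(t)
    if size < 1:
        return "larger than 0.32"
    if size < 2:
        return "between 0.05 and 0.32"
    if size < 3:
        return "between 0.003 and 0.05"
    return "smaller than 0.003"


# Hypothetical values: sample slope 1.99 with standard error 0.5.
t = t_statistic(1.99, 0.5)
print(f"t = {t:.2f}; two-sided P-value is {rough_two_sided_p(t)}")
```

Software reports a more precise $P$-value; the ranges here only reproduce the approximate reasoning described above.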