35.10 Summary

In this chapter, we have learnt about regression, which mathematically describes the relationship between two quantitative variables. The response variable is denoted by $y$, and the explanatory variable by $x$. The linear relationship between them (the regression equation), in the sample, is

$$\hat{y} = b_0 + b_1 x,$$
where $b_0$ is a number (the intercept), $b_1$ is a number (the slope), and the 'hat' above the $y$ indicates that the equation gives a predicted mean value of $y$ for the given $x$ value.

The intercept is the predicted mean value of $y$ when the value of $x$ is zero. The slope is how much the predicted mean value of $y$ changes, on average, when the value of $x$ increases by 1.
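As an illustration of how an intercept and slope might be computed, here is a minimal sketch. It assumes the line is fitted by least squares (this summary does not name the fitting method), and the data values and the function name `fit_line` are made up for the example.

```python
from statistics import mean


def fit_line(x, y):
    """Return (b0, b1) for the line y-hat = b0 + b1 * x.

    Assumption: the line is fitted by ordinary least squares.
    """
    x_bar, y_bar = mean(x), mean(y)
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# Hypothetical data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
print(f"intercept b0 = {b0:.2f}, slope b1 = {b1:.2f}")
```

For these made-up values, the slope is about 1.99: the predicted mean value of $y$ increases by about 1.99, on average, when $x$ increases by 1. The intercept is about 0.05: the predicted mean value of $y$ when $x$ is zero.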

The regression equation can be used to make predictions or to understand the relationship between the two variables. Predictions made with values of $x$ outside the values of $x$ used to create the regression equation (called extrapolation) may not be reliable.
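A prediction, together with a check for extrapolation, could be sketched as follows. The function name `predict`, the intercept and slope values, and the observed range of $x$ are all hypothetical, chosen only to show the idea.

```python
def predict(b0, b1, x_new, x_min, x_max):
    """Return (predicted mean y, extrapolation flag).

    x_min and x_max are the smallest and largest x values that were
    used to create the regression equation.
    """
    y_hat = b0 + b1 * x_new
    is_extrapolation = not (x_min <= x_new <= x_max)
    return y_hat, is_extrapolation


# Hypothetical regression equation: y-hat = 0.05 + 1.99 x,
# created from x values between 1 and 5
inside = predict(0.05, 1.99, x_new=3, x_min=1, x_max=5)
outside = predict(0.05, 1.99, x_new=10, x_min=1, x_max=5)

print(inside)   # x = 3 lies within the observed range
print(outside)  # x = 10 is an extrapolation, so the prediction may not be reliable
```

The equation still returns a number when $x$ lies outside the observed range; the flag is only a reminder that the linear relationship has not been observed there.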

In the population, the regression equation is

$$\hat{y} = \beta_0 + \beta_1 x.$$
To test a hypothesis about a population slope $\beta_1$, based on the value of the sample slope $b_1$, assume the value of $\beta_1$ in the null hypothesis (usually zero) to be true. Then, the sample slope varies from sample to sample and, under certain statistical validity conditions, varies with an approximate normal distribution centered around the hypothesised value of $\beta_1$, with a standard deviation of $\text{s.e.}(b_1)$. This distribution describes what values of the sample slope could be expected in the sample if the value of $\beta_1$ in the null hypothesis were true. The test statistic is

$$t = \frac{b_1 - \beta_1}{\text{s.e.}(b_1)},$$
where $\beta_1$ is the hypothesised value given in the null hypothesis (usually zero). The $t$-value is like a $z$-score, and so an approximate $P$-value can be estimated using the 68–95–99.7 rule.
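The test statistic and the rough $P$-value from the 68–95–99.7 rule could be computed as in the sketch below. It assumes a two-tailed test, and the sample slope, standard error and function names are hypothetical values chosen for illustration.

```python
def t_statistic(b1, se_b1, beta1_null=0.0):
    """Return t = (b1 - beta1) / s.e.(b1) for the hypothesised slope beta1."""
    return (b1 - beta1_null) / se_b1


def approx_two_tailed_p(t):
    """Describe the two-tailed P-value using the 68-95-99.7 rule.

    Treats the t-value like a z-score: about 68% of values lie within
    1 standard deviation of the mean, 95% within 2, and 99.7% within 3.
    """
    size = abs(t)
    if size < 1:
        return "larger than 0.32"
    if size < 2:
        return "between 0.05 and 0.32"
    if size < 3:
        return "between 0.003 and 0.05"
    return "smaller than 0.003"


# Hypothetical sample slope and standard error; null hypothesis: beta1 = 0
t = t_statistic(b1=0.6, se_b1=0.25)
print(f"t = {t:.2f}; two-tailed P is {approx_two_tailed_p(t)}")
```

Here the $t$-value lies between 2 and 3, so the rule places the two-tailed $P$-value between 0.003 and 0.05. Software would give a more precise value.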
