35.11 Summary

In this chapter, we have learnt about regression, which mathematically describes the relationship between two quantitative variables. The response variable is denoted by y, and the explanatory variable by x. The linear relationship between them (the regression equation), in the sample, is

$\hat{y} = b_0 + b_1 x$, where $b_0$ is a number (the intercept), $b_1$ is a number (the slope), and the ‘hat’ above the $y$ indicates that the equation gives a predicted mean value of $y$ for the given $x$ value.

The intercept is the predicted mean value of y when the value of x is zero. The slope is how much the predicted mean value of y changes, on average, when the value of x increases by 1.
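As a rough illustration of how the intercept and slope arise from a sample, the sketch below fits a line to a small data set. It assumes the line is fitted by ordinary least squares; the data values and the helper name `fit_line` are invented for illustration only.

```python
import numpy as np

# Hypothetical sample, invented purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: how x and y vary together, relative to the variation in x
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: forces the line through the point (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = fit_line(x, y)
print(f"y-hat = {b0:.2f} + {b1:.2f} x")
```

For these made-up data the slope is about 1.99, so the predicted mean value of y increases by about 1.99 when x increases by 1; the intercept of about 0.05 is the predicted mean value of y when x is zero.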

The regression equation can be used to make predictions or to understand the relationship between the two variables. Predictions made with values of x outside the values of x used to create the regression equation (called extrapolation) may not be reliable.
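One way to guard against accidental extrapolation is to check each new x value against the range of x values used to fit the equation. The sketch below is only an illustration: the function name `predict`, the coefficient values, and the observed range are all hypothetical.

```python
import warnings

def predict(b0, b1, x_new, x_min, x_max):
    """Predicted mean y at x_new, warning if x_new is outside [x_min, x_max]."""
    if x_new < x_min or x_new > x_max:
        # The fitted relationship has not been observed out here
        warnings.warn("x_new is outside the observed x values (extrapolation)")
    return b0 + b1 * x_new

# Hypothetical fitted equation y-hat = 0.05 + 1.99 x, from x values between 1 and 5
inside = predict(0.05, 1.99, x_new=3, x_min=1, x_max=5)    # no warning
outside = predict(0.05, 1.99, x_new=10, x_min=1, x_max=5)  # issues a warning
```

The second call still returns a number, but the warning flags that the prediction may not be reliable.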

In the population, the regression equation is

$\hat{y} = \beta_0 + \beta_1 x$.

To test a hypothesis about a population slope $\beta_1$, based on the value of the sample slope $b_1$, assume the value of $\beta_1$ in the null hypothesis (usually zero) to be true. Then, the sample slope varies from sample to sample and, under certain statistical validity conditions, varies with an approximate normal distribution centered around the hypothesised value of $\beta_1$, with a standard deviation of $\text{s.e.}(b_1)$. This distribution describes what values of the sample slope could be expected if the value of $\beta_1$ in the null hypothesis were true. The test statistic is

$t = \dfrac{b_1 - \beta_1}{\text{s.e.}(b_1)}$, where $\beta_1$ is the hypothesised value given in the null hypothesis (usually zero). The $t$-value is like a $z$-score, and so an approximate $P$-value can be estimated using the 68–95–99.7 rule.
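The calculation can be sketched in a few lines. The example below assumes a two-tailed test and takes the sample slope and its standard error as given (for example, from software output); the function names and the numerical values are hypothetical.

```python
def t_statistic(b1, se_b1, beta1_null=0.0):
    """t = (sample slope - hypothesised slope) / standard error of the slope."""
    return (b1 - beta1_null) / se_b1

def approx_p_value(t):
    """Rough two-tailed P-value range from the 68-95-99.7 rule."""
    size = abs(t)
    if size < 1:
        return "larger than 0.32"        # about 68% of values lie within 1 s.e.
    if size < 2:
        return "between 0.05 and 0.32"   # about 95% lie within 2 s.e.
    if size < 3:
        return "between 0.003 and 0.05"  # about 99.7% lie within 3 s.e.
    return "smaller than 0.003"

# Hypothetical values: sample slope 1.99 with standard error 0.5
t = t_statistic(b1=1.99, se_b1=0.5)
print(f"t = {t:.2f}; two-tailed P-value is {approx_p_value(t)}")
```

Here the sample slope is almost four standard errors from the hypothesised value of zero, so the approximate P-value is very small. The rule gives only a range; software reports a more precise P-value.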
