Examples

Air Temperature

The maximum and minimum daily air temperatures were recorded at Paisley, Glasgow, over the last 50 years. These temperatures are displayed on the scatterplot below.

You can see that as the minimum temperature increases, the maximum temperature also increases. This suggests there could be a linear relationship between the maximum and minimum temperatures. Notice that at this stage we are only suggesting that a linear relationship seems sensible; we are not saying that the relationship is definitely linear. Under the assumption that there could be a linear relationship between these two variables, we could draw a straight line through these data points that captures the nature of this relationship.

We can write out a statistical model to describe this relationship as follows.

Data: \((y_i,x_i), \quad i=1, \dots, n; \quad n=50.\)

\(y_i\) = maximum temperature in year \(i\) (vertical axis)

\(x_i\) = minimum temperature in year \(i\) (horizontal axis)

Possible model: \[y_i= \alpha + \beta x_i + \epsilon_i \quad \quad \mathrm{for} \, i=1, \dots, n\]

  • \(\alpha\) and \(\beta\) are the intercept and slope of the line, respectively.

  • \(y_i\) is the response, regarded as a random variable.

  • \(\alpha + \beta x_i\) is the deterministic part of the model; \(\beta x_i\) is the term through which the explanatory variable enters.

  • \(\epsilon_i\) is the random part: an additive, unpredictable quantity.

In this example, \(\alpha\) and \(\beta\) are called model parameters. We assume that these quantities are unknown, and we generally want to estimate them from the data.

Please note that you can use whatever notation you want. This example provides one way of describing this model. We will try to use similar notation in most of the examples in this course.
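
One common way to estimate \(\alpha\) and \(\beta\) is least squares, which picks the line that minimises the sum of squared vertical distances to the data points. The sketch below is a minimal illustration under that assumption, not necessarily the method used later in this course. The function name `fit_line` is hypothetical, and the temperature values are invented for illustration; they are not the Paisley data.

```python
def fit_line(x, y):
    """Least-squares estimates (alpha_hat, beta_hat) for y_i = alpha + beta * x_i + eps_i."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squares of x about its mean, and cross-products of x and y
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if s_xx == 0:
        raise ValueError("x values must not all be identical")
    beta_hat = s_xy / s_xx                # estimated slope
    alpha_hat = y_bar - beta_hat * x_bar  # estimated intercept
    return alpha_hat, beta_hat


# Invented illustrative values (assumption): NOT the recorded Paisley temperatures
min_temp = [3.0, 4.0, 5.0, 6.0, 7.0]
max_temp = [10.1, 10.9, 12.2, 12.8, 14.0]

alpha_hat, beta_hat = fit_line(min_temp, max_temp)
print(f"intercept estimate: {alpha_hat:.2f}, slope estimate: {beta_hat:.2f}")
```

Here `beta_hat` is the slope of the fitted line and `alpha_hat` is its intercept, so `alpha_hat + beta_hat * x` gives the fitted maximum temperature for a minimum temperature `x`.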

Power

A sports scientist studied the relationship between an individual’s power (measured by a vertical jump and converted to power using the Lewis formula) and their weight. A random sample of 38 users of the Stevenson Building facilities was selected, and their power and weight were measured.

You can see that as weight increases, power also increases. This suggests there could be a linear relationship between weight and power, and we could draw a straight line through these data points that captures the nature of this relationship.

Data: \((y_i,x_i), \quad i=1, \dots, n; \quad n=38.\)

\(y_i\) = power of individual \(i\) (vertical axis)

\(x_i\) = weight of individual \(i\) (horizontal axis)

Possible model: \[y_i= \alpha + \beta x_i + \epsilon_i \quad \quad \mathrm{for} \, i=1, \dots, n\]

  • \(\alpha + \beta x_i\) is the deterministic part of the model; \(\beta x_i\) is the term through which the explanatory variable enters.

  • \(\epsilon_i\) is the random part.

  • \(\alpha\) and \(\beta\) are model parameters to be estimated.
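
To see how the deterministic and random parts combine, the following sketch simulates responses from the model. It assumes, purely for illustration, that the \(\epsilon_i\) are independent normal errors with mean zero and standard deviation `sigma`; the text above does not specify a distribution. The function name `simulate_responses`, the weights and all parameter values are hypothetical.

```python
import random


def simulate_responses(x, alpha, beta, sigma, seed=0):
    """Simulate y_i = alpha + beta * x_i + eps_i, with eps_i ~ Normal(0, sigma^2) (assumed)."""
    rng = random.Random(seed)  # local generator so results are reproducible
    deterministic = [alpha + beta * xi for xi in x]  # alpha + beta * x_i
    errors = [rng.gauss(0.0, sigma) for _ in x]      # eps_i, the random part
    responses = [d + e for d, e in zip(deterministic, errors)]
    return deterministic, errors, responses


# Hypothetical weights and parameter values, chosen only for illustration
weights = [60.0, 70.0, 80.0, 90.0]
det, eps, power = simulate_responses(weights, alpha=200.0, beta=10.0, sigma=50.0, seed=1)

for w, d, e, p in zip(weights, det, eps, power):
    print(f"weight={w:5.1f}  deterministic={d:7.1f}  error={e:7.1f}  response={p:7.1f}")
```

Running the simulation with a different `seed` changes the errors and responses but leaves the deterministic part unchanged, which is the distinction the bullet points describe.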

Alcohol consumed and blood alcohol content

In a study of alcohol consumption and related blood alcohol content, 16 student volunteers at Ohio State University drank a randomly assigned number of cans of beer. Thirty minutes later, a police officer measured their percent blood alcohol content (BAC). The researcher is interested in finding out whether the number of cans of beer influences the BAC measurement.

Data: \((y_i,x_i), \quad i=1, \dots, n; \quad n=16.\)

\(y_i\) = BAC of student \(i\) (vertical axis)

\(x_i\) = number of cans of beer consumed by student \(i\) (horizontal axis)

Possible model: \[y_i= \alpha + \beta x_i + \epsilon_i \quad \quad \mathrm{for} \, i=1, \dots, n.\]

Crime

Fifty states in America were investigated in terms of their crime rates and percentage of high school dropouts. The crime rate per 100,000 people included murder, forcible rape, robbery, aggravated assault, burglary, larceny-theft and motor vehicle theft. The state high school dropout rate was the percentage of 16 to 19 year olds who were not in school and had not finished the 12th grade.

Data: \((y_i,x_i), \quad i=1, \dots, n; \quad n=50.\)

\(y_i\) = crime rate per 100,000 people in state \(i\) (vertical axis)

\(x_i\) = High school dropout rate in state \(i\) (horizontal axis)

Possible model: \[y_i= \alpha + \beta x_i + \epsilon_i \quad \quad \mathrm{for} \, i=1, \dots, n.\]

Potatoes

The glucose level in potatoes is dependent on the length of time for which they have been stored. We have data detailing the glucose level in potatoes over time.

The scatterplot of glucose against storage time in weeks suggests that a quadratic curve may describe this relationship adequately.

Data: \((y_i,x_i), \quad i=1, \dots, n; \quad n=14.\)

\(y_i\) = glucose level for observation \(i\) (vertical axis)

\(x_i\) = storage time in weeks for observation \(i\) (horizontal axis)

Possible model: \[y_i= \alpha + \beta x_i + \gamma x_i^2 + \epsilon_i \quad \quad \mathrm{for} \, i=1, \dots, n.\]
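
The quadratic model is still linear in the parameters \(\alpha\), \(\beta\) and \(\gamma\), so least squares can again serve as one possible estimation method. The sketch below, in plain Python, solves the normal equations directly; this is adequate for a small illustration, although numerical libraries typically use more stable methods. The function name `fit_quadratic` is hypothetical, and the storage times and glucose values are invented; they are not the potato data.

```python
def fit_quadratic(x, y):
    """Least-squares estimates (alpha, beta, gamma) for
    y_i = alpha + beta * x_i + gamma * x_i**2 + eps_i."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("x and y must have the same length (at least 3)")
    # Each row of the design matrix is (1, x_i, x_i^2)
    rows = [(1.0, xi, xi * xi) for xi in x]
    # Normal equations (X'X) theta = X'y, stored as an augmented 3x4 matrix
    a = [[sum(r[j] * r[k] for r in rows) for k in range(3)]
         + [sum(r[j] * yi for r, yi in zip(rows, y))]
         for j in range(3)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("need at least three distinct x values")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    # The matrix is now diagonal, so each parameter is read off directly
    return tuple(a[i][3] / a[i][i] for i in range(3))


# Invented illustrative values (assumption): NOT the potato glucose data
weeks = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0]
glucose = [12.0, 9.5, 8.1, 7.9, 9.0, 11.2, 14.8]

alpha_hat, beta_hat, gamma_hat = fit_quadratic(weeks, glucose)
print(f"alpha={alpha_hat:.3f}, beta={beta_hat:.3f}, gamma={gamma_hat:.3f}")
```

A positive `gamma_hat` corresponds to a curve that opens upwards and a negative one to a curve that opens downwards, which can be compared with the shape seen in the scatterplot.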