7.6 Prediction: Mean

  • Model = Mathematical equation(s)
  • Underlying a model is always a (joint) distribution
  • Model summarizes (joint) distribution with fewer parameters
    • e.g. intercept/coefficients in linear model
  • Simple model: Mean of the distribution of a variable
\(\bar{y} = \frac{y_{1}+y_{2}+\cdots +y_{n}}{n}=\frac{\sum_{i=1}^{n} trust2006_{i}}{n} = \frac{40668}{6633} = 6.13\)



\(y_{i} = \underbrace{\color{blue}{\overline{y}}}_{\color{green}{\widehat{y}}_{i}} \pm \color{red}{\varepsilon}_{i}\)

\[ \begin{aligned} trust2006_{Anna} = 3 = \underbrace{\color{blue}{\overline{y}}}_{\color{green}{\widehat{y}}_{Anna}} \pm \color{red}{\varepsilon}_{Anna} = \color{blue}{6.13} - \color{red}{3.13} \end{aligned} \]
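
The calculation above can be reproduced in a few lines. This is a minimal Python sketch, not the course's own code: it uses only the sum (40668) and the number of respondents (6633) reported in the text, and the names `total_trust`, `n_respondents` and `prediction_error` are made up for illustration.

```python
# Sum and count of trust2006 as reported in the text
total_trust = 40668
n_respondents = 6633

# The "model" is a single number: the sample mean
y_bar = total_trust / n_respondents


def prediction_error(observed, prediction=y_bar):
    """Signed error: observed value minus predicted value."""
    return observed - prediction


# Anna's observed trust value from the example above
anna_observed = 3
anna_error = prediction_error(anna_observed)

print(f"mean (prediction for everyone): {y_bar:.2f}")
print(f"Anna: observed = {anna_observed}, error = {anna_error:.2f}")
```

The signed error for Anna is about -3.13, which matches 6.13 - 3.13 = 3 in the equation above. The same function can be called with any other observed value to see how far the mean is from it.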

  • Mean (= model) predicts Anna’s value with a certain error
  • Q: How well does the model (mean = 6.13) predict people who have values of 0, of 3 or of 4?
  • Important: We could use this model – this mean – to predict trust values of another group of people
    • First train the model (= calculate the mean) on this data (training dataset), then use it to predict the outcome in other data (validation dataset)
    • Sometimes this is called out-of-sample prediction
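
The train-then-predict idea can be sketched as follows. This is an illustrative Python example under stated assumptions: both lists of trust scores are invented for the demonstration (they are not the survey data), and mean absolute error is just one possible way to summarize how far the predictions are off.

```python
from statistics import mean

# Hypothetical trust scores, invented for illustration only
train_scores = [3, 7, 5, 8, 6, 4, 9, 7]
validation_scores = [2, 6, 7, 10, 5]

# "Training" the mean model = computing the mean on the training data
trained_mean = mean(train_scores)

# The model predicts the same value for every person in the new data
predictions = [trained_mean for _ in validation_scores]

# Signed errors (observed minus predicted) on the validation data
errors = [y - y_hat for y, y_hat in zip(validation_scores, predictions)]

# Mean absolute error: average size of the errors, ignoring their sign
mae = mean([abs(e) for e in errors])

print(f"trained mean: {trained_mean}")
print(f"validation errors: {errors}")
print(f"mean absolute error: {mae:.3f}")
```

The mean is computed once on the training data and then reused unchanged. The validation data are only used to measure the errors, never to recompute the mean.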