7.6 Prediction: Mean

Model = Mathematical equation(s)
Underlying a model is always a (joint) distribution
Model summarizes (joint) distribution with fewer parameters
- e.g. intercept/coefficients in linear model
Simple model: Mean of the distribution of a variable

$\bar{y} = \frac{y_{1}+y_{2}+\cdots +y_{n}}{n}=\frac{\sum_{i=1}^{n} trust2006_{i}}{n} = \frac{40668}{6633} = 6.13$

$y_{i} = \underbrace{\color{blue}{\overline{y}}}_{\color{green}{\widehat{y}}_{i}} \pm \color{red}{\varepsilon}_{i}$

$\begin{aligned} trust2006_{Anna} = 3 = \underbrace{\color{blue}{\overline{y}}}_{\color{green}{\widehat{y}}_{Anna}} \pm \color{red}{\varepsilon}_{Anna} = \color{blue}{6.13} - \color{red}{3.13} \end{aligned}$
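
The arithmetic above can be retraced in a few lines of Python. This is only a minimal sketch: the sum (40668), the number of observations (6633) and Anna's value (3) are taken from the equations, while the variable names are made up for illustration. The error is computed here as a signed residual (observed minus predicted), which is one possible reading of the $\pm$ notation.

```python
# Totals taken from the equation for the mean above.
total_trust = 40668   # sum of all trust2006 values
n = 6633              # number of observations

# The "model" is a single parameter: the mean.
mean_trust = total_trust / n
prediction = round(mean_trust, 2)  # the text works with the rounded value 6.13

# Anna's observed value and the model's error for her.
observed_anna = 3
error_anna = observed_anna - prediction  # signed residual (assumed convention)

print(f"mean = {mean_trust:.2f}")
print(f"Anna: observed = {observed_anna}, predicted = {prediction}, error = {error_anna:.2f}")
```

The same prediction (6.13) is made for every person, so the error depends only on how far a person's own value lies from the mean.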

Mean (= model) predicts Anna’s value with a certain error
Q: How well does the model (mean = 6.13) predict people who have values of 0, of 3 or of 4?
Important: We could use this model – this mean – to predict trust values of another group of people
- First train model (= calculate mean) on this data (training dataset), then use it to predict outcome in other data (validation dataset)
- Sometimes this is called out-of-sample prediction
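
The train-then-predict idea can be sketched as follows. The trust scores below are hypothetical toy values, not the survey data used above, and the names `train` and `validation` are chosen for illustration only. Summarizing the errors with the mean absolute error is an assumption made here; other summaries would work as well.

```python
from statistics import mean

# Hypothetical toy trust scores (0-10 scale); NOT the survey data from the text.
train = [3, 7, 6, 8, 5, 7]        # training dataset
validation = [4, 9, 6, 5]         # validation dataset (other people)

# "Training" the model = calculating the mean on the training data.
model_mean = mean(train)

# The mean model predicts the same value for every person in the new data.
predictions = [model_mean] * len(validation)

# Signed errors (observed minus predicted) in the validation data.
errors = [y - y_hat for y, y_hat in zip(validation, predictions)]

# One possible summary of prediction quality: mean absolute error.
mae = mean([abs(e) for e in errors])

print("model (mean of training data):", model_mean)
print("errors in validation data:", errors)
print("mean absolute error:", mae)
```

The mean is calculated once on the training data and is not recalculated on the validation data; that is what makes the prediction out-of-sample.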