1.1 Example: Linear Models
Consider the simple linear model \[\begin{equation} y = \beta_0 + \beta_1 x + \varepsilon \tag{1.1} \end{equation}\]
This model has unknown parameters \(\beta_0\) and \(\beta_1\). Given observations \((y_1, x_1), (y_2, x_2),\dots, (y_n, x_n)\), we can combine these data with the likelihood principle, which gives us a procedure for producing model parameter estimates. The likelihood can be maximized to produce the maximum likelihood estimates, \[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} \] and \[ \hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n (x_i-\bar{x})^2}. \] These statistics, \(\hat{\beta}_0\) and \(\hat{\beta}_1\), can then be interpreted, depending on the area of application, or used for other purposes, perhaps as inputs to other procedures. In this simple example, we can see how each component of the modeling process works.
| Component | Implementation |
|---|---|
| Model | Linear regression |
| Principle/Technique | Likelihood principle |
| Algorithm | Maximization |
| Statistic | \(\hat{\beta}_0\), \(\hat{\beta}_1\) |
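To make these components concrete, here is a minimal sketch in Python with NumPy. The data are simulated (the intercept 1.0, slope 2.0, and noise level are arbitrary choices for illustration), and the closed-form maximum likelihood estimates above are computed directly.

```python
import numpy as np

# Simulate data from the model y = beta0 + beta1 * x + eps
# (the true values below are arbitrary, chosen only for illustration)
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(0, 10, size=n)
y = 1.0 + 2.0 * x + rng.normal(0, 1, size=n)

# Closed-form maximum likelihood estimates
beta1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0_hat = y.mean() - beta1_hat * x.mean()

print(beta0_hat, beta1_hat)  # should be close to 1.0 and 2.0
```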
In this example, maximizing the likelihood was simple because the solution was available in closed form. In most other cases, however, there is no closed-form solution and a specific numerical algorithm is needed to maximize the likelihood.
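To see what an explicit algorithm looks like in this setting, the Gaussian log-likelihood can also be maximized numerically with a general-purpose optimizer. This is only a sketch (using scipy.optimize.minimize and reusing x, y, beta0_hat, and beta1_hat from the previous snippet); for this model it simply reproduces the closed-form answer.

```python
import numpy as np
from scipy.optimize import minimize

# Negative Gaussian log-likelihood for the simple linear model.
# sigma is parameterized on the log scale so the optimizer stays in a valid region.
def negloglik(theta):
    b0, b1, log_sigma = theta
    sigma = np.exp(log_sigma)
    resid = y - b0 - b1 * x  # x, y from the previous sketch
    return 0.5 * len(y) * np.log(2 * np.pi * sigma ** 2) + np.sum(resid ** 2) / (2 * sigma ** 2)

fit = minimize(negloglik, x0=np.zeros(3), method="BFGS")
print(fit.x[:2])  # agrees with beta0_hat, beta1_hat up to numerical tolerance
```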
Changing the implementation of a given component can lead to different outcomes further down the chain and can even produce completely different outputs. In this case, identical parameter estimates can be produced by replacing the likelihood principle with the principle of least squares. However, changing the principle to produce, for example, maximum a posteriori estimates would result in different statistics at the end.
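As a quick check of the first point, here is a sketch that swaps the likelihood principle for the principle of least squares by minimizing the residual sum of squares directly (again reusing x, y, and minimize from the sketches above); the estimates it returns match the maximum likelihood estimates.

```python
# Least squares: minimize the residual sum of squares directly
def rss(theta):
    b0, b1 = theta
    return np.sum((y - b0 - b1 * x) ** 2)

ls_fit = minimize(rss, x0=np.zeros(2), method="BFGS")
print(ls_fit.x)  # identical (up to numerical tolerance) to the ML estimates
```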