16.4 Estimation of SI Model

Consider the SI model (16.1) - (16.5). The asset-specific parameters to be estimated are \(\alpha_{i}\), \(\beta_{i}\) and \(\sigma_{\epsilon,i}^{2}\), \((i=1,\ldots,N)\), and the market parameters to be estimated are \(\mu_{M}\) and \(\sigma_{M}^{2}\). These parameters can be estimated using the plug-in principle, linear regression, and maximum likelihood. All methods give essentially the same estimators for the SI model parameters.

16.4.1 Plug-in principle estimates

Let \(\{(r_{it},r_{Mt})\}_{t=1}^{T}\) denote a sample of size \(T\) of observed returns on asset \(i\) and the market return, which are assumed to be generated from the SI model (16.1) - (16.5). Recall that the plug-in principle says to estimate population model parameters using appropriate sample statistics. For the market parameters, the plug-in principle estimates are the same as the CER model estimates \[\begin{eqnarray*} \hat{\mu}_{M} & = & \frac{1}{T}\sum_{t=1}^{T}r_{Mt},\\ \hat{\sigma}_{M}^{2} & = & \frac{1}{T-1}\sum_{t=1}^{T}(r_{Mt}-\hat{\mu}_{M})^{2}. \end{eqnarray*}\] From (16.6) and (16.13) we see that \(\alpha_{i}\) and \(\beta_{i}\) are functions of population parameters \[\begin{eqnarray*} \alpha_{i} & = & \mu_{i}-\beta_{i}\mu_{M},\\ \beta_{i} & = & \frac{\mathrm{cov}(R_{it},R_{Mt})}{\mathrm{var}(R_{Mt})}=\frac{\sigma_{iM}}{\sigma_{M}^{2}}. \end{eqnarray*}\] The corresponding plug-in principle estimates are then: \[\begin{eqnarray} \hat{\alpha}_{i} & = & \hat{\mu}_{i}-\hat{\beta}_{i}\hat{\mu}_{M},\tag{16.21}\\ \hat{\beta}_{i} & = & \frac{\hat{\sigma}_{iM}}{\hat{\sigma}_{M}^{2}},\tag{16.22} \end{eqnarray}\] where \[\begin{eqnarray*} \hat{\mu}_{i} & = & \frac{1}{T}\sum_{t=1}^{T}r_{it},\\ \hat{\sigma}_{iM} & = & \frac{1}{T-1}\sum_{t=1}^{T}(r_{it}-\hat{\mu}_{i})(r_{Mt}-\hat{\mu}_{M}). \end{eqnarray*}\]

Given the plug-in principle estimates \(\hat{\alpha}_{i}\) and \(\hat{\beta}_{i}\), the plug-in principle estimate of \(\epsilon_{it}\) is \[\begin{equation} \hat{\varepsilon}_{it}=r_{it}-\hat{\alpha}_{i}-\hat{\beta}_{i}r_{Mt},\,t=1,\ldots,T.\tag{16.23} \end{equation}\] Using (16.23), the plug-in principle estimate of \(\sigma_{\epsilon,i}^{2}\) is the sample variance of \(\{\hat{\epsilon}_{it}\}_{t=1}^{T}\) (adjusted for the number of degrees of freedom): \[\begin{align} \hat{\sigma}_{\varepsilon,i}^{2} & =\frac{1}{T-2}\sum_{t=1}^{T}\hat{\varepsilon}_{it}^{2}=\frac{1}{T-2}\sum_{t=1}^{T}\left(r_{it}-\hat{\alpha}_{i}-\hat{\beta}_{i}r_{Mt}\right)^{2}.\tag{16.24} \end{align}\]

Plug-in principle estimates of \(R^{2}\) based on (16.15) can be computed using \[\begin{align} \hat{R}^{2} & =\frac{\hat{\beta}_{i}^{2}\hat{\sigma}_{M}^{2}}{\hat{\sigma}_{i}^{2}}=1-\frac{\hat{\sigma}_{\varepsilon,i}^{2}}{\hat{\sigma}_{i}^{2}}.\tag{16.25} \end{align}\]

Example 3.2 (Computing plug-in principle estimators for SI model parameters)

Consider computing the plug-in principle estimates for \(\alpha_{i},\) \(\beta_{i}\) and \(\sigma_{\epsilon,i}^{2}\) from the example data using the formulas (16.21), (16.22) and (16.24), respectively. First, extract the sample statistics \(\hat{\mu}_{i}\), \(\hat{\sigma}_{iM}\), \(\hat{\mu}_{M}\), and \(\hat{\sigma}_{M}^{2}\):

# asset and market (S&P 500) returns are in the object siRetS;
# covmatHat is the sample covariance matrix of all returns, assumed to have
# been computed earlier in the chapter, e.g. covmatHat = cov(siRetS)
assetNames = colnames(siRetS)[1:4] 
muhat = colMeans(siRetS) 
sig2hat = diag(covmatHat)
covAssetsSp500 = covmatHat[assetNames, "SP500"]

Next, estimate \(\hat{\beta}_{i}\) using

betaHat = covAssetsSp500/sig2hat["SP500"]
betaHat
##    BA   JWN  MSFT  SBUX 
## 0.978 1.485 1.303 1.057

Here, we see that \(\hat{\beta}_{BA}\) and \(\hat{\beta}_{SBUX}\) are very close to one, while \(\hat{\beta}_{JWN}\) and \(\hat{\beta}_{MSFT}\) are noticeably bigger than one. Using the estimates \(\hat{\beta}_{i}\) and the sample statistics \(\hat{\mu}_{i}\) and \(\hat{\mu}_{M}\), the estimates for \(\hat{\alpha}_{i}\) are

alphaHat = muhat[assetNames] - betaHat*muhat["SP500"] 
alphaHat
##      BA     JWN    MSFT    SBUX 
## 0.00516 0.01231 0.00544 0.01785

All of the estimates of \(\hat{\alpha}_{i}\) are close to zero. The estimates of \(\sigma_{\epsilon,i}^{2}\) can be computed using:

sig2eHat = rep(0, length(assetNames)) 
names(sig2eHat) = assetNames 
for (aName in assetNames) {   
     # residuals from the fitted SI model for asset aName
     eHat = siRetS[, aName] - alphaHat[aName] - betaHat[aName]*siRetS[, "SP500"]   
     # degrees-of-freedom adjusted estimate of the residual variance (16.24)
     sig2eHat[aName] = crossprod(eHat)/(length(eHat) - 2) 
} 
sig2eHat
##      BA     JWN    MSFT    SBUX 
## 0.00581 0.00994 0.00646 0.00941

Lastly, the estimates of \(R^{2}\) can be computed using

R2 = 1 - sig2eHat/sig2hat[assetNames] 
R2
##    BA   JWN  MSFT  SBUX 
## 0.270 0.334 0.373 0.210

\(\blacksquare\)

16.4.2 Least squares estimates

The SI model representation (16.1) shows that returns are a linear function of the market return and an asset-specific error term \[ R_{it}=\alpha_{i}+\beta_{i}R_{Mt}+\epsilon_{it}, \] where \(\alpha_{i}\) is the intercept and \(\beta_{i}\) is the slope. Least squares regression is a method for estimating \(\alpha_{i}\) and \(\beta_{i}\) by finding the “best fitting” line to the scatterplot of returns where \(R_{it}\) is on the vertical axis and \(R_{Mt}\) is on the horizontal axis.

[Insert Figure here] To be completed…

To see how the method of least squares determines the “best fitting” line, consider the scatterplot of the sample returns on Boeing and the S&P 500 index illustrated in Figure xxx. In the figure, the black line is a fitted line with initial guess \(\hat{\alpha}_{BA}=0\) and \(\hat{\beta}_{BA}=0.5.\) The differences between the observed returns (blue dots) and the values on the fitted line are the estimated errors \(\hat{\epsilon}_{BA,t}=r_{BA,t}-\hat{\alpha}_{BA}-\hat{\beta}_{BA}r_{Mt}=r_{BA,t}-0-0.5\times r_{Mt}.\) Some estimated errors are big and some are small. The overall fit of the line can be measured using a statistic based on all \(t=1,\ldots,T\) of the estimated errors. A natural choice is the sum of the errors \(\sum_{t=1}^{T}\hat{\epsilon}_{t}\). However, this choice can be misleading because large positive and negative errors cancel out. To avoid this problem, it is better to measure the overall fit using \(\sum_{t=1}^{T}|\hat{\epsilon}_{t}|\) or \(\sum_{t=1}^{T}\hat{\epsilon}_{t}^{2}.\) Then the best fitting line can be determined by finding the intercept and slope values that minimize \(\sum_{t=1}^{T}|\hat{\epsilon}_{t}|\) or \(\sum_{t=1}^{T}\hat{\epsilon}_{t}^{2}.\)
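
As a quick illustration of how the overall fit changes with the guessed intercept and slope, the following sketch compares the sum of squared errors for the initial guess \(\hat{\alpha}_{BA}=0\), \(\hat{\beta}_{BA}=0.5\) with the sum of squared errors for the plug-in estimates computed earlier (this assumes the example data siRetS and the estimates alphaHat and betaHat from above; the helper sse() is defined here for illustration only):

# sum of squared errors for a candidate intercept (a) and slope (b)
sse = function(a, b, y, x) {
  eHat = y - a - b*x
  sum(eHat^2)
}
rBA = siRetS[, "BA"]
rM = siRetS[, "SP500"]
# initial guess versus the plug-in (least squares) estimates
sse(0, 0.5, rBA, rM)
sse(alphaHat["BA"], betaHat["BA"], rBA, rM)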

The method of least squares regression defines the “best fitting” line by finding the intercept and slope values that minimize the sum of squared errors \[\begin{equation} \mathrm{SSE}(\hat{\alpha}_{i},\hat{\beta}_{i})=\sum_{t=1}^{T}\hat{\epsilon}_{it}^{2}=\sum_{t=1}^{T}\left(r_{it}-\hat{\alpha}_{i}-\hat{\beta}_{i}r_{Mt}\right)^{2}.\tag{16.26} \end{equation}\] Because \(\mathrm{SSE}(\hat{\alpha}_{i},\hat{\beta}_{i})\) is a continuous and differentiable function of \(\hat{\alpha}_{i}\) and \(\hat{\beta}_{i}\), the minimizing values of \(\hat{\alpha}_{i}\) and \(\hat{\beta}_{i}\) can be determined using simple calculus. The first order conditions for a minimum are: \[\begin{align} 0 & =\frac{\partial\mathrm{SSE}(\hat{\alpha}_{i},\hat{\beta}_{i})}{\partial\hat{\alpha}_{i}}=-2\sum_{t=1}^{T}(r_{it}-\hat{\alpha}_{i}-\hat{\beta}_{i}r_{Mt})=-2\sum_{t=1}^{T}\hat{\varepsilon}_{it},\tag{16.27}\\ 0 & =\frac{\partial\mathrm{SSE}(\hat{\alpha}_{i},\hat{\beta}_{i})}{\partial\hat{\beta}_{i}}=-2\sum_{t=1}^{T}(r_{it}-\hat{\alpha}_{i}-\hat{\beta}_{i}r_{Mt})r_{Mt}=-2\sum_{t=1}^{T}\hat{\varepsilon}_{it}r_{Mt}.\tag{16.28} \end{align}\] These are two linear equations in two unknowns which can be re-expressed as \[\begin{eqnarray*} \hat{\alpha}_{i}T+\hat{\beta}_{i}\sum_{t=1}^{T}r_{Mt} & = & \sum_{t=1}^{T}r_{it},\\ \hat{\alpha}_{i}\sum_{t=1}^{T}r_{Mt}+\hat{\beta}_{i}\sum_{t=1}^{T}r_{Mt}^{2} & = & \sum_{t=1}^{T}r_{it}r_{Mt}. \end{eqnarray*}\] Using matrix algebra, we can write these equations as: \[\begin{equation} \left(\begin{array}{cc} T & \sum_{t=1}^{T}r_{Mt}\\ \sum_{t=1}^{T}r_{Mt} & \sum_{t=1}^{T}r_{Mt}^{2} \end{array}\right)\left(\begin{array}{c} \hat{\alpha}_{i}\\ \hat{\beta}_{i} \end{array}\right)=\left(\begin{array}{c} \sum_{t=1}^{T}r_{it}\\ \sum_{t=1}^{T}r_{it}r_{Mt} \end{array}\right),\tag{16.29} \end{equation}\] which is of the form \(\mathbf{Ax}=\mathbf{b}\) with \[ \mathbf{A}=\left(\begin{array}{cc} T & \sum_{t=1}^{T}r_{Mt}\\ \sum_{t=1}^{T}r_{Mt} & \sum_{t=1}^{T}r_{Mt}^{2} \end{array}\right),\,\mathbf{x}=\left(\begin{array}{c} \hat{\alpha}_{i}\\ \hat{\beta}_{i} \end{array}\right),\,\mathbf{b}=\left(\begin{array}{c} \sum_{t=1}^{T}r_{it}\\ \sum_{t=1}^{T}r_{it}r_{Mt} \end{array}\right). \] Hence, we can determine \(\hat{\alpha}_{i}\) and \(\hat{\beta}_{i}\) by solving \(\mathbf{x}=\mathbf{A}^{-1}\mathbf{b}.\) Now, \[\begin{eqnarray*} \mathbf{A}^{-1} & = & \frac{1}{\mathrm{det}(\mathbf{A})}\left(\begin{array}{cc} \sum_{t=1}^{T}r_{Mt}^{2} & -\sum_{t=1}^{T}r_{Mt}\\ -\sum_{t=1}^{T}r_{Mt} & T \end{array}\right),\\ \mathrm{det}(\mathbf{A}) & = & T\sum_{t=1}^{T}r_{Mt}^{2}-\left(\sum_{t=1}^{T}r_{Mt}\right)^{2}=T\sum_{t=1}^{T}\left(r_{Mt}-\hat{\mu}_{M}\right)^{2},\\ \hat{\mu}_{M} & = & \frac{1}{T}\sum_{t=1}^{T}r_{Mt}. 
\end{eqnarray*}\] Consequently, \[\begin{equation} \left(\begin{array}{c} \hat{\alpha}_{i}\\ \hat{\beta}_{i} \end{array}\right)=\frac{1}{T\sum_{t=1}^{T}\left(r_{Mt}-\hat{\mu}_{M}\right)^{2}}\left(\begin{array}{cc} \sum_{t=1}^{T}r_{Mt}^{2} & -\sum_{t=1}^{T}r_{Mt}\\ -\sum_{t=1}^{T}r_{Mt} & T \end{array}\right)\left(\begin{array}{c} \sum_{t=1}^{T}r_{it}\\ \sum_{t=1}^{T}r_{it}r_{Mt} \end{array}\right)\tag{16.30} \end{equation}\] and so \[\begin{eqnarray} \hat{\alpha}_{i} & = & \frac{\sum_{t=1}^{T}r_{Mt}^{2}\sum_{t=1}^{T}r_{it}-\sum_{t=1}^{T}r_{Mt}\sum_{t=1}^{T}r_{it}r_{Mt}}{T\sum_{t=1}^{T}\left(r_{Mt}-\hat{\mu}_{M}\right)^{2}},\tag{16.31}\\ \hat{\beta}_{i} & = & \frac{T\sum_{t=1}^{T}r_{it}r_{Mt}-\sum_{t=1}^{T}r_{Mt}\sum_{t=1}^{T}r_{it}}{T\sum_{t=1}^{T}\left(r_{Mt}-\hat{\mu}_{M}\right)^{2}}.\tag{16.32} \end{eqnarray}\] After a little bit of algebra (see end-of-chapter exercises) it can be shown that \[\begin{eqnarray*} \hat{\alpha}_{i} & = & \hat{\mu}_{i}-\hat{\beta}_{i}\hat{\mu}_{M},\\ \hat{\beta}_{i} & = & \frac{\hat{\sigma}_{iM}}{\hat{\sigma}_{M}^{2}}, \end{eqnarray*}\] which are the plug-in estimates of \(\alpha_{i}\) and \(\beta_{i}\) determined earlier. Hence, the least squares estimates of \(\alpha_{i}\) and \(\beta_{i}\) are identical to the plug-in estimates.
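
As a check on the algebra, the formulas (16.31) and (16.32) can be evaluated directly in R for one of the example assets. The following is a minimal sketch, assuming the example data siRetS and the plug-in estimates alphaHat and betaHat computed above:

# least squares estimates for Boeing from the sum formulas (16.31) - (16.32)
rBA = as.numeric(siRetS[, "BA"])
rM = as.numeric(siRetS[, "SP500"])
nObs = length(rM)
detA = nObs*sum((rM - mean(rM))^2)
alphaBA = (sum(rM^2)*sum(rBA) - sum(rM)*sum(rBA*rM))/detA
betaBA = (nObs*sum(rBA*rM) - sum(rM)*sum(rBA))/detA
# should match the plug-in estimates alphaHat["BA"] and betaHat["BA"]
c(alphaBA, betaBA)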

The solution for the least squares estimates in (16.30) has an elegant representation using matrix algebra. To see this, define the \(T\times1\) vectors \(\mathbf{r}_{i}=(r_{i1},\ldots,r_{iT})^{\prime}\), \(\mathbf{r}_{M}=(r_{M1},\ldots,r_{MT})^{\prime}\) and \(\mathbf{1}=(1,\ldots,1)^{\prime}\). Then we can re-write (16.29) as \[ \left(\begin{array}{cc} \mathbf{1}^{\prime}\mathbf{1} & \mathbf{1}^{\prime}\mathbf{r}_{M}\\ \mathbf{1}^{\prime}\mathbf{r}_{M} & \mathbf{r}_{M}^{\prime}\mathbf{r}_{M} \end{array}\right)\left(\begin{array}{c} \hat{\alpha}_{i}\\ \hat{\beta}_{i} \end{array}\right)=\left(\begin{array}{c} \mathbf{1}^{\prime}\mathbf{r}_{i}\\ \mathbf{r}_{M}^{\prime}\mathbf{r}_{i} \end{array}\right) \] or \[\begin{equation} \mathbf{X}^{\prime}\mathbf{X}\hat{\gamma}_{i}=\mathbf{X}^{\prime}\mathbf{r}_{i}\tag{16.33} \end{equation}\] where \(\mathbf{X}=\left(\begin{array}{cc} \mathbf{1} & \mathbf{r}_{M}\end{array}\right)\) is a \(T\times2\) matrix and \(\hat{\gamma}_{i}=(\hat{\alpha}_{i},\hat{\beta}_{i})^{\prime}\). Provided \(\mathbf{X}^{\prime}\mathbf{X}\) is invertible, solving (16.33) for \(\hat{\gamma}_{i}\) gives the least squares estimates in matrix form: \[\begin{equation} \hat{\gamma}_{i}=\left(\mathbf{X}^{\prime}\mathbf{X}\right)^{-1}\mathbf{X}^{\prime}\mathbf{r}_{i}.\tag{16.34} \end{equation}\] The matrix form solution (16.34) is especially convenient for computation in R.
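
For example, a minimal sketch of computing (16.34) in R for Boeing (assuming the example data siRetS used above) is:

# build the T x 2 matrix X = (1, rM) and compute (X'X)^{-1} X'r_i for Boeing
rBA = as.numeric(siRetS[, "BA"])
rM = as.numeric(siRetS[, "SP500"])
X = cbind(rep(1, length(rM)), rM)
colnames(X) = c("const", "SP500")
gammaHatBA = solve(crossprod(X), crossprod(X, rBA))
gammaHatBA   # first element is alpha-hat, second is beta-hat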

The least squares estimates of \(\epsilon_{it}\), \(\sigma_{\epsilon,i}^{2}\) and \(R^{2}\) are the same as the plug-in estimators (16.23), (16.24) and (16.25), respectively. In the context of least squares estimation, the estimate \(\hat{\sigma}_{\epsilon,i}=\sqrt{\hat{\sigma}_{\epsilon,i}^{2}}\) is called the standard error of the regression and measures the typical magnitude of \(\hat{\epsilon}_{it}\) (the difference between an observed return and the fitted regression line).
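
To get a sense of the units, the standard errors of the regression for the example assets can be computed from the residual variance estimates obtained earlier (a short sketch, assuming sig2eHat from the plug-in example above):

# standard error of the regression = square root of the residual variance
sigeHat = sqrt(sig2eHat)
sigeHat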

16.4.3 Simple linear regression in R

  • don’t do regression examples until statistical theory is discussed
  • computing least squares estimates using matrix algebra formulas
  • computing least squares estimates using lm()
    • See discussion from my regression chapter in MFTSR
    • describe structure of lm() function, extractor and method functions
  • Do analysis of example data. To be completed… (a preliminary sketch using lm() is given below)
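
Although the full discussion is deferred until the statistical theory is covered, a minimal sketch of the lm() workflow on the example data (assuming the siRetS data used above) looks like:

# least squares estimation of the SI model for Boeing using lm();
# coef() extracts the intercept (alpha-hat) and slope (beta-hat)
lmBA = lm(BA ~ SP500, data = as.data.frame(siRetS))
coef(lmBA)
summary(lmBA)$sigma   # standard error of the regression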

16.4.4 Maximum likelihood estimates

The SI model parameters can also be estimated using the method of maximum likelihood, which was introduced in chapter (GARCH estimation chapter). To construct the likelihood function, we use property () of the SI model that, conditional on \(R_{Mt}=r_{Mt}\), the distribution of \(R_{it}\) is normal with mean \(\alpha_{i}+\beta_{i}r_{Mt}\) and variance \(\sigma_{\epsilon,i}^{2}\). The pdf of \(R_{it}|R_{Mt}=r_{Mt}\) is then \[ f(r_{it}|r_{Mt},\theta_{i})=(2\pi\sigma_{\varepsilon,i}^{2})^{-1/2}\exp\left(\frac{-1}{2\sigma_{\varepsilon,i}^{2}}\left(r_{it}-\alpha_{i}-\beta_{i}r_{Mt}\right)^{2}\right),\,t=1,\ldots,T, \] where \(\theta_{i}=(\alpha_{i},\beta_{i},\sigma_{\epsilon,i}^{2})^{\prime}\). Given a sample \(\{(r_{it},r_{Mt})\}_{t=1}^{T}=\{\mathbf{r}_{i},\mathbf{r}_{M}\}\) of observed returns on asset \(i\) and the market return, which are assumed to be generated from the SI model, the joint density of asset returns given the market returns is \[\begin{eqnarray*} f(\mathbf{r}_{i}|\mathbf{r}_{M}) & = & \prod_{t=1}^{T}(2\pi\sigma_{\varepsilon,i}^{2})^{-1/2}\exp\left(\frac{-1}{2\sigma_{\varepsilon,i}^{2}}\left(r_{it}-\alpha_{i}-\beta_{i}r_{Mt}\right)^{2}\right)\\ & = & (2\pi\sigma_{\varepsilon,i}^{2})^{-T/2}\exp\left(\frac{-1}{2\sigma_{\varepsilon,i}^{2}}\sum_{t=1}^{T}\left(r_{it}-\alpha_{i}-\beta_{i}r_{Mt}\right)^{2}\right)\\ & = & (2\pi\sigma_{\varepsilon,i}^{2})^{-T/2}\exp\left(\frac{-1}{2\sigma_{\varepsilon,i}^{2}}\mathrm{SSE}(\alpha_{i},\beta_{i})\right), \end{eqnarray*}\] where \(\mathrm{SSE}(\alpha_{i},\beta_{i})\) is the sum of squared errors (16.26) used to determine the least squares estimates. The log-likelihood function for \(\theta_{i}\) is then \[\begin{eqnarray} \ln L(\theta_{i}|\mathbf{r}_{i},\mathbf{r}_{M}) & = & \frac{-T}{2}\ln(2\pi)-\frac{T}{2}\ln(\sigma_{\varepsilon,i}^{2})-\frac{1}{2\sigma_{\varepsilon,i}^{2}}\mathrm{SSE}(\alpha_{i},\beta_{i}).\tag{16.35} \end{eqnarray}\] From (16.35), it can be seen that the values of \(\alpha_{i}\) and \(\beta_{i}\) that maximize the log-likelihood are the values that minimize \(\mathrm{SSE}(\alpha_{i},\beta_{i})\). Hence, the ML estimates of \(\alpha_{i}\) and \(\beta_{i}\) are the least squares estimates.
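
This equivalence can be verified numerically by maximizing (16.35) directly. The following is a minimal sketch (assuming the example data siRetS from above; the function negLogLik is defined here for illustration only):

# negative log-likelihood (16.35) for Boeing as a function of
# theta = (alpha, beta, log(sig2e)); parameterizing the log-variance keeps sig2e > 0
negLogLik = function(theta, y, x) {
  alpha = theta[1]; beta = theta[2]; sig2e = exp(theta[3])
  sse = sum((y - alpha - beta*x)^2)
  0.5*length(y)*log(2*pi) + 0.5*length(y)*log(sig2e) + sse/(2*sig2e)
}
rBA = as.numeric(siRetS[, "BA"])
rM = as.numeric(siRetS[, "SP500"])
mleBA = optim(c(0, 1, log(0.01)), negLogLik, y = rBA, x = rM)
mleBA$par[1:2]   # should be close to the least squares estimates of alpha and beta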

To find the ML estimate for \(\sigma_{\epsilon,i}^{2}\), plug the ML estimates of \(\alpha_{i}\) and \(\beta_{i}\) into (16.35) giving \[ \ln L(\hat{\alpha}_{i},\hat{\beta}_{i},\sigma_{\epsilon,i}^{2}|\mathbf{r}_{i},\mathbf{r}_{M})=\frac{-T}{2}\ln(2\pi)-\frac{T}{2}\ln(\sigma_{\varepsilon,i}^{2})-\frac{1}{2\sigma_{\varepsilon,i}^{2}}\mathrm{SSE}(\hat{\alpha}_{i},\hat{\beta}_{i}). \] Maximization with respect to \(\sigma_{\epsilon,i}^{2}\) gives the first order condition \[ \frac{\partial \ln L(\hat{\alpha}_{i},\hat{\beta}_{i},\sigma_{\epsilon,i}^{2}|\mathbf{r}_{i},\mathbf{r}_{M})}{\partial\sigma_{\epsilon,i}^{2}}=-\frac{T}{2\hat{\sigma}_{\epsilon,i}^{2}}+\frac{1}{2\left(\hat{\sigma}_{\epsilon,i}^{2}\right)^{2}}\mathrm{SSE}(\hat{\alpha}_{i},\hat{\beta}_{i})=0. \] Solving for \(\hat{\sigma}_{\epsilon,i}^{2}\) gives the ML estimate for \(\sigma_{\epsilon,i}^{2}\): \[ \hat{\sigma}_{\epsilon,i}^{2}=\frac{\mathrm{SSE}(\hat{\alpha}_{i},\hat{\beta}_{i})}{T}=\frac{1}{T}\sum_{t=1}^{T}\hat{\epsilon}_{it}^{2}, \] which is the plug-in principle estimate (16.24) not adjusted for degrees of freedom.
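
In moderately large samples the difference between dividing by \(T\) and dividing by \(T-2\) is negligible. A short sketch of the comparison for Boeing (assuming the example data siRetS and the plug-in estimates alphaHat and betaHat from the earlier example):

# ML estimate (divide by T) versus degrees-of-freedom adjusted estimate (divide by T-2)
eHatBA = as.numeric(siRetS[, "BA"]) - alphaHat["BA"] -
         betaHat["BA"]*as.numeric(siRetS[, "SP500"])
c(ML = sum(eHatBA^2)/length(eHatBA),
  adjusted = sum(eHatBA^2)/(length(eHatBA) - 2))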


  1. The matrix \(\mathbf{A}\) is invertible provided \(\mathrm{det}(\mathbf{A})\neq0\). This requires the sample variance of \(R_{Mt}\) to be non-zero. ↩︎