5.2 Feasible Generalized Least Squares

Motivation for a more efficient estimator

  • Gauss-Markov Theorem holds under A1-A4

  • A4: \(Var(\epsilon| \mathbf{X} )=\sigma^2I_n\)

    • Heteroskedasticity: \(Var(\epsilon_i|\mathbf{X}) = \sigma_i^2 \neq \sigma^2\)
    • Serial Correlation: \(Cov(\epsilon_i,\epsilon_j|\mathbf{X}) \neq 0\) for \(i \neq j\)
  • Without A4, how can we know which unbiased estimator is the most efficient?

Original (unweighted) model:

\[ \mathbf{y=X\beta+ \epsilon} \]

Suppose A1-A3 hold, but A4 does not hold,

\[ \mathbf{Var(\epsilon|X)=\Omega \neq \sigma^2 I_n} \]

We will try to use OLS to estimate the transformed (weighted) model

\[ \mathbf{wy=wX\beta + w\epsilon} \]

We need to choose \(\mathbf{w}\) so that

\[ \mathbf{w'w = \Omega^{-1}} \]

then \(\mathbf{w}\) (a full-rank matrix) is the Cholesky factor of \(\mathbf{\Omega^{-1}}\) (also a full-rank matrix)

In other words, \(\mathbf{w}\) is a matrix square root of \(\Omega^{-1}\)

\[ \begin{aligned} \Omega &= var(\epsilon | X) \\ \Omega^{-1} &= var(\epsilon | X)^{-1} \end{aligned} \]

Then the estimator obtained from the transformed equation, the Infeasible Generalized Least Squares (IGLS) estimator, has the following properties.

\[ \begin{aligned} \mathbf{\hat{\beta}_{IGLS}} &= \mathbf{(X'w'wX)^{-1}X'w'wy} \\ & = \mathbf{(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y} \\ & = \mathbf{\beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\epsilon} \end{aligned} \]
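Before deriving its properties, here is a minimal numerical sketch (simulated data with a known \(\Omega\); all names are illustrative, not part of the original notes) of choosing \(\mathbf{w}\) from the Cholesky factor of \(\Omega^{-1}\) and checking that OLS on the weighted model reproduces the IGLS formula:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 2

# illustrative design matrix and a known, positive-definite error covariance Omega
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
A = rng.normal(size=(n, n))
Omega = A @ A.T + n * np.eye(n)                 # plays the role of Var(eps | X)
beta = np.array([1.0, 2.0])
y = X @ beta + rng.multivariate_normal(np.zeros(n), Omega)

# choose w so that w'w = Omega^{-1}: use a Cholesky-type factor of Omega^{-1}
Omega_inv = np.linalg.inv(Omega)
w = np.linalg.cholesky(Omega_inv).T             # upper-triangular, so w.T @ w = Omega^{-1}
assert np.allclose(w.T @ w, Omega_inv)

# OLS on the transformed (weighted) model  wy = wX beta + w eps ...
wy, wX = w @ y, w @ X
beta_weighted_ols = np.linalg.solve(wX.T @ wX, wX.T @ wy)

# ... coincides with the IGLS formula  (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
beta_igls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
assert np.allclose(beta_weighted_ols, beta_igls)
```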

Since A1-A3 hold for the unweighted model

\[ \begin{aligned} \mathbf{E(\hat{\beta}_{IGLS}|X)} & = E(\mathbf{\beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\epsilon}|X)\\ & = \mathbf{\beta + E((X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\epsilon|X)} \\ & = \mathbf{\beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}E(\epsilon|X)} && \text{since A3}: E(\epsilon|X)=0 \\ & = \mathbf{\beta} \end{aligned} \]

\(\rightarrow\) IGLS estimator is unbiased

\[ \begin{aligned} \mathbf{Var(w\epsilon|X)} &= \mathbf{wVar(\epsilon|X)w'} \\ & = \mathbf{w\Omega w'} \\ & = \mathbf{w(w'w)^{-1}w'} && \text{since w is a full-rank matrix}\\ & = \mathbf{ww^{-1}(w')^{-1}w'} \\ & = \mathbf{I_n} \end{aligned} \]

\(\rightarrow\) A4 holds for the transformed (weighted) equation

Then, the variance for the estimator is

\[ \begin{aligned} Var(\hat{\beta}_{IGLS}|\mathbf{X}) & = \mathbf{Var(\beta + (X'\Omega ^{-1}X)^{-1}X'\Omega^{-1}\epsilon|X)} \\ &= \mathbf{Var((X'\Omega ^{-1}X)^{-1}X'\Omega^{-1}\epsilon|X)} \\ &= \mathbf{(X'\Omega ^{-1}X)^{-1}X'\Omega^{-1} Var(\epsilon|X) \Omega^{-1}X(X'\Omega ^{-1}X)^{-1}} \\ &= \mathbf{(X'\Omega ^{-1}X)^{-1}X'\Omega^{-1} \Omega \Omega^{-1}X(X'\Omega ^{-1}X)^{-1}} && \text{since } Var(\epsilon|X)=\Omega \\ &= \mathbf{(X'\Omega ^{-1}X)^{-1}} \end{aligned} \]

Let \(A = \mathbf{(X'X)^{-1}X'-(X'\Omega ^{-1} X)^{-1}X' \Omega^{-1}}\); then \[ Var(\hat{\beta}_{OLS}|X)- Var(\hat{\beta}_{IGLS}|X) = A\Omega A' \] Since \(\Omega\) is positive semi-definite, \(A\Omega A'\) is also positive semi-definite, so IGLS is at least as efficient as OLS.
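Continuing the numerical sketch above (same illustrative `X` and `Omega`), the efficiency claim can be checked directly:

```python
# conditional variances of OLS and IGLS under Var(eps | X) = Omega (objects from the sketch above)
XtX_inv = np.linalg.inv(X.T @ X)
var_ols  = XtX_inv @ X.T @ Omega @ X @ XtX_inv          # sandwich form for OLS
var_igls = np.linalg.inv(X.T @ Omega_inv @ X)

# A = (X'X)^{-1} X'  -  (X' Omega^{-1} X)^{-1} X' Omega^{-1}
A_mat = XtX_inv @ X.T - var_igls @ X.T @ Omega_inv

diff = var_ols - var_igls
assert np.allclose(diff, A_mat @ Omega @ A_mat.T)        # Var(OLS) - Var(IGLS) = A Omega A'
assert np.all(np.linalg.eigvalsh(diff) > -1e-8)          # positive semi-definite (up to rounding)
```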

The name Infeasible comes from the fact that this estimator cannot actually be computed: \(\Omega\), and hence \(\mathbf{w}\), is unknown.

\[ \mathbf{w} = \left( \begin{array}{ccccc} w_{11} & 0 & 0 & \dots & 0 \\ w_{21} & w_{22} & 0 & \dots & 0 \\ w_{31} & w_{32} & w_{33} & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & w_{n3} & \dots & w_{nn} \\ \end{array} \right) \]

With \(n(n+1)/2\) distinct elements to estimate but only \(n\) observations, \(\mathbf{w}\) is infeasible to estimate (more parameters than data points).

Hence, we need to make assumptions on \(\Omega\) to make it feasible to estimate \(\mathbf{w}\):

  1. Heteroskedasticity : multiplicative exponential model
  2. AR(1)
  3. Cluster

5.2.1 Heteroskedasticity

\[\begin{equation} Var(\epsilon_i |x_i) = E(\epsilon_i^2|x_i) = h(x_i) = \sigma_i^2 \neq \sigma^2 \quad \text{(the variance of the error term is a function of } x_i\text{)} \tag{5.7} \end{equation}\]

For our model,

\[ \begin{aligned} y_i &= x_i\beta + \epsilon_i \\ (1/\sigma_i)y_i &= (1/\sigma_i)x_i\beta + (1/\sigma_i)\epsilon_i \end{aligned} \]

then, from (5.7)

\[ \begin{aligned} Var((1/\sigma_i)\epsilon_i|X) &= (1/\sigma_i^2) Var(\epsilon_i|X) \\ &= (1/\sigma_i^2)\sigma_i^2 \\ &= 1 \end{aligned} \]

Then the weight matrix \(\mathbf{w}\) in the matrix equation

\[ \mathbf{wy=wX\beta + w\epsilon} \]

is

\[ \mathbf{w}= \left( \begin{array}{ccccc} 1/\sigma_1 & 0 & 0 & \dots & 0 \\ 0 & 1/\sigma_2 & 0 & \dots & 0 \\ 0 & 0 & 1/\sigma_3 & \dots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & 1/\sigma_n \end{array} \right) \]

Infeasible Weighted Least Squares

  1. Assume we know \(\sigma_i^2\) (Infeasible)
  2. The IWLS estimator is obtained as the least squares estimate of the following weighted equation

\[ (1/\sigma_i)y_i = (1/\sigma_i)\mathbf{x}_i\beta + (1/\sigma_i)\epsilon_i \]

  • Usual standard errors for the weighted equation are valid if the variance model is correct, i.e., \(Var(\epsilon_i | \mathbf{x}_i) = \sigma_i^2\)
  • If \(Var(\epsilon_i | \mathbf{x}_i) \neq \sigma_i^2\), then heteroskedasticity-robust standard errors are valid.

Problem: We do not know \(\sigma_i^2=Var(\epsilon_i|\mathbf{x_i})=E(\epsilon_i^2|\mathbf{x}_i)\)

  • A single observation \(\epsilon_i\) cannot provide a sample estimate of its variance \(\sigma_i^2\)

    • Model \(\epsilon_i^2\) as a reasonable (strictly positive) function of \(x_i\) and a strictly positive, independent error \(v_i\)

\[ \epsilon_i^2 = v_i \exp(\mathbf{x_i\gamma}) \]

Then we can apply a log transformation to recover a linear in parameters model,

\[ \ln(\epsilon_i^2) = \mathbf{x_i\gamma} + \ln(v_i) \]

where \(\ln(v_i)\) is independent of \(\mathbf{x}_i\)

We do not observe \(\epsilon_i\), so we use the OLS residuals \(e_i\) as an approximation, as in the sketch below.
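A minimal sketch of the resulting feasible procedure (simulated data; the variable names and single-regressor design are illustrative, not from the notes): estimate by OLS, regress \(\ln(e_i^2)\) on \(\mathbf{x}_i\), set \(\hat{\sigma}_i^2 = \exp(\mathbf{x}_i\hat{\gamma})\), and re-estimate by weighted least squares with weights \(1/\hat{\sigma}_i\).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), x])

# simulate heteroskedastic errors: Var(eps_i | x_i) = exp(-1 + 0.8 x_i)
sigma_i = np.sqrt(np.exp(-1.0 + 0.8 * x))
y = X @ np.array([1.0, 2.0]) + sigma_i * rng.normal(size=n)

# Step 1: OLS, keep residuals e_i as stand-ins for the unobserved eps_i
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b_ols

# Step 2: regress ln(e_i^2) on x_i to estimate the variance function
gamma_hat = np.linalg.solve(X.T @ X, X.T @ np.log(e**2))
sigma2_hat = np.exp(X @ gamma_hat)            # fitted variances (up to scale, which cancels in WLS)

# Step 3: weighted least squares with weights 1 / sigma_hat_i
wX = X / np.sqrt(sigma2_hat)[:, None]
wy = y / np.sqrt(sigma2_hat)
b_fwls = np.linalg.solve(wX.T @ wX, wX.T @ wy)
```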

5.2.2 Serial Correlation

\[ Cov(\epsilon_i, \epsilon_j | \mathbf{X}) \neq 0 \]

Under covariance stationarity,

\[ Cov(\epsilon_i,\epsilon_j|\mathbf{X}) = Cov(\epsilon_i, \epsilon_{i+h}|\mathbf{x_i,x_{i+h}})=\gamma_h \]

And the variance covariance matrix is

\[ Var(\epsilon|\mathbf{X}) = \Omega = \left( \begin{array}{ccccc} \sigma^2 & \gamma_1 & \gamma_2 & \dots & \gamma_{n-1} \\ \gamma_1 & \sigma^2 & \gamma_1 & \dots & \gamma_{n-2} \\ \gamma_2 & \gamma_1 & \sigma^2 & \dots & \gamma_{n-3} \\ \vdots & \vdots & \vdots & \ddots & \gamma_1 \\ \gamma_{n-1} & \gamma_{n-2} & \gamma_{n-3} & \gamma_1 & \sigma^2 \end{array} \right) \]

There are \(n\) parameters to estimate (\(\sigma^2, \gamma_1, \dots, \gamma_{n-1}\)); we need some sort of structure to reduce the number of parameters to estimate.

5.2.2.1 AR(1)

\[ \begin{aligned} y_t &= \beta_0 + x_t\beta_1 + \epsilon_t \\ \epsilon_t &= \rho \epsilon_{t-1} + u_t \end{aligned} \]

and the variance covariance matrix is

\[ Var(\epsilon | \mathbf{X})= \frac{\sigma^2_u}{1-\rho^2} \left( \begin{array}{ccccc} 1 & \rho & \rho^2 & \dots & \rho^{n-1} \\ \rho & 1 & \rho & \dots & \rho^{n-2} \\ \rho^2 & \rho & 1 & \dots & \rho^{n-3} \\ \vdots & \vdots & \vdots & \ddots & \rho \\ \rho^{n-1} & \rho^{n-2} & \dots & \rho & 1 \\ \end{array} \right) \]

Hence, there is only one parameter to estimate, \(\rho\), as the sketch below illustrates.
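A small sketch (with illustrative values of \(\rho\) and \(\sigma^2_u\)) of how the entire \(n \times n\) covariance matrix is pinned down by these two numbers:

```python
import numpy as np

rho, sigma2_u, n = 0.6, 1.0, 5

# Omega[t, s] = sigma_u^2 / (1 - rho^2) * rho^{|t - s|}
lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
Omega_ar1 = sigma2_u / (1 - rho**2) * rho**lags
```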

\[ u_t = \epsilon_t - \rho \epsilon_{t-1} \]

satisfies A3, A4, and A5, so we would like to transform the above equation into one that has \(u_t\) as the error term.

\[ \begin{aligned} y_t - \rho y_{t-1} &= (\beta_0 + x_t\beta_1 + \epsilon_t) - \rho (\beta_0 + x_{t-1}\beta_1 + \epsilon_{t-1}) \\ & = (1-\rho)\beta_0 + (x_t - \rho x_{t-1})\beta_1 + u_t \end{aligned} \]

5.2.2.1.1 Infeasible Cochrane-Orcutt
  1. Assume that we know \(\rho\) (Infeasible)
  2. The ICO estimator is obtained as the least squares estimate of the following weighted first-difference equation

\[ y_t -\rho y_{t-1} = (1-\rho)\beta_0 + (x_t - \rho x_{t-1})\beta_1 + u_t \]

  • Usual standard errors for the weighted first-difference equation are valid if the errors truly follow an AR(1) process.
  • If the serial correlation is generated by a more complex dynamic process, then Newey-West HAC standard errors are valid.

Problem: We do not know \(\rho\)

  • \(\rho\) is the correlation between \(\epsilon_t\) and \(\epsilon_{t-1}\): estimate using OLS residuals (\(e_i\)) as proxy

\[ \hat{\rho} = \frac{\sum_{t=2}^{T}e_te_{t-1}}{\sum_{t=2}^{T}e_{t-1}^2} \]

which can be obtained from the OLS regression of

\[ e_t = \rho e_{t-1} + u_t \]

where we suppress the intercept.

  • We lose an observation: by quasi-differencing, we drop the first observation

\[ y_1 = \beta_0 + x_1 \beta_1 + \epsilon_1 \]

  • The first observation can be retained by rescaling it by \(\sqrt{1-\rho^2}\), so that its error has the same variance as \(u_t\):

\[ (\sqrt{1-\rho^2})y_1 = (\sqrt{1-\rho^2})\beta_0 + (\sqrt{1-\rho^2})x_1 \beta_1 + (\sqrt{1-\rho^2}) \epsilon_1 \]
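A sketch of the feasible procedure under these assumptions (simulated AR(1) errors; names are illustrative): estimate \(\hat{\rho}\) from the OLS residuals, quasi-difference observations \(2,\dots,T\), and rescale the first observation by \(\sqrt{1-\hat{\rho}^2}\) so it can be kept.

```python
import numpy as np

rng = np.random.default_rng(2)
T, rho_true = 300, 0.7

# simulate y_t = b0 + x_t b1 + eps_t with AR(1) errors eps_t = rho eps_{t-1} + u_t
x = rng.normal(size=T)
u = rng.normal(size=T)
eps = np.zeros(T)
eps[0] = u[0] / np.sqrt(1 - rho_true**2)       # start from the stationary distribution
for t in range(1, T):
    eps[t] = rho_true * eps[t - 1] + u[t]
X = np.column_stack([np.ones(T), x])
y = X @ np.array([1.0, 2.0]) + eps

# Step 1: OLS residuals, then rho_hat from the no-intercept regression of e_t on e_{t-1}
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b_ols
rho_hat = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])

# Step 2: quasi-difference observations 2..T and rescale the first observation
scale1 = np.sqrt(1 - rho_hat**2)
y_star = np.r_[scale1 * y[0], y[1:] - rho_hat * y[:-1]]
X_star = np.vstack([scale1 * X[0], X[1:] - rho_hat * X[:-1]])

# Step 3: OLS on the transformed equation gives the feasible estimate
b_fco = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)
```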

5.2.2.2 Cluster

\[ y_{gi} = \mathbf{x}_{gi}\beta + \epsilon_{gi} \]

\[ Cov(\epsilon_{gi}, \epsilon_{hj}) \begin{cases} = 0 & \text{for $g \neq h$ and any pair $(i,j)$} \\ \neq 0 & \text{for $g = h$ and any pair $(i,j)$ with $i \neq j$}\\ \end{cases} \]

Intra-group Correlation
Errors for individuals within the same group may be correlated, but errors are independent across groups.

  • A4 is violated, so the usual standard errors for OLS are not valid.
  • Use cluster-robust standard errors for OLS.

Suppose there are 3 groups with different numbers of observations (here 3, 2, and 1):

\[ Var(\epsilon| \mathbf{X})= \Omega = \left( \begin{array}{cccccc} \sigma^2 & \delta_{12}^1 & \delta_{13}^1 & 0 & 0 & 0 \\ \delta_{12}^1 & \sigma^2 & \delta_{23}^1 & 0 & 0 & 0 \\ \delta_{13}^1 & \delta_{23}^1 & \sigma^2 & 0 & 0 & 0 \\ 0 & 0 & 0 & \sigma^2 & \delta_{12}^2 & 0 \\ 0 & 0 & 0 & \delta_{12}^2 & \sigma^2 & 0 \\ 0 & 0 & 0 & 0 & 0 & \sigma^2 \end{array} \right) \]

where \(Cov(\epsilon_{gi}, \epsilon_{gj}) = \delta_{ij}^g\) for \(i \neq j\), and \(Cov(\epsilon_{gi}, \epsilon_{hj}) = 0\) for \(g \neq h\) and any \(i\) and \(j\)

Infeasible Generalized Least Squares (Cluster)

  1. Assume that \(\sigma^2\) and \(\delta_{ij}^g\) are known, plug into \(\Omega\) and solve for the inverse \(\Omega^{-1}\) (infeasible)
  2. The Infeasible Generalized Least Squares Estimator is

\[ \hat{\beta}_{IGLS} = \mathbf{(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y} \]

Problem: We do not know \(\sigma^2\) and \(\delta_{ij}^g\)

  • We can make assumptions about the data generating process that causes the clustering behavior.

    • This gives structure to \(Cov(\epsilon_{gi},\epsilon_{gj})= \delta_{ij}^g\), which makes it feasible to estimate.
    • If those assumptions are wrong, then we should use cluster-robust standard errors.

Solution: Assume a group-level random effects specification for the error

\[ \begin{aligned} y_{gi} &= \mathbf{x}_{gi} \beta + c_g + u_{gi} \\ Var(c_g|\mathbf{x}_{gi}) &= \sigma^2_c \\ Var(u_{gi}|\mathbf{x}_{gi}) &= \sigma^2_u \end{aligned} \]

where \(c_g\) and \(u_{gi}\) are independent of each other and mean-independent of \(\mathbf{x}_{gi}\)

  • \(c_g\) captures the common group shocks (independent across groups)
  • \(u_{gi}\) captures the individual shocks (independent across individuals and groups)

Then the error variance is

\[ Var(\epsilon| \mathbf{X})= \Omega = \left( \begin{array}{cccccc} \sigma^2_c + \sigma^2_u & \sigma^2_c & \sigma^2_c & 0 & 0 & 0 \\ \sigma^2_c & \sigma^2_c + \sigma^2_u & \sigma^2_c & 0 & 0 & 0 \\ \sigma^2_c & \sigma^2_c & \sigma^2_c + \sigma^2_u & 0 & 0 & 0 \\ 0 & 0 & 0 & \sigma^2_c + \sigma^2_u & \sigma^2_c & 0 \\ 0 & 0 & 0 & \sigma^2_c & \sigma^2_c + \sigma^2_u & 0 \\ 0 & 0 & 0 & 0 & 0 & \sigma^2_c + \sigma^2_u \end{array} \right) \]

Use the feasible group-level random effects estimator: estimate \(\sigma^2_c\) and \(\sigma^2_u\), plug them into \(\Omega\), and apply the GLS formula, as in the sketch below.
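A sketch (illustrative group sizes and variance components, not from the notes) of how the random-effects assumption reduces \(\Omega\) to a block-diagonal matrix built from only \(\sigma^2_c\) and \(\sigma^2_u\), which can then be plugged into the GLS formula:

```python
import numpy as np

sigma2_c, sigma2_u = 0.5, 1.0     # in the feasible version these are replaced by estimates
group_sizes = [3, 2, 1]           # three groups, matching the example above

# each group's block: sigma2_c everywhere, plus sigma2_u on the diagonal
blocks = [sigma2_c * np.ones((m, m)) + sigma2_u * np.eye(m) for m in group_sizes]

# assemble the block-diagonal Omega (zero covariance across groups)
n = sum(group_sizes)
Omega = np.zeros((n, n))
start = 0
for B in blocks:
    m = B.shape[0]
    Omega[start:start + m, start:start + m] = B
    start += m

# with estimated variance components, beta_FGLS = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
```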