## 5.2 Feasible Generalized Least Squares

Motivation for a more efficient estimator

• Gauss-Markov Theorem holds under A1-A4

• A4: $$Var(\epsilon| \mathbf{X} )=\sigma^2I_n$$

• Heteroskedasticity: $$Var(\epsilon_i|\mathbf{X}) \neq \sigma^2$$ (the error variance differs across observations)
• Serial Correlation: $$Cov(\epsilon_i,\epsilon_j|\mathbf{X}) \neq 0$$
• Without A4, how can we know which unbiased estimator is the most efficient?

Original (unweighted) model:

$\mathbf{y=X\beta+ \epsilon}$

Suppose A1-A3 hold, but A4 does not hold,

$\mathbf{Var(\epsilon|X)=\Omega \neq \sigma^2 I_n}$

We will try to use OLS to estimate the transformed (weighted) model

$\mathbf{wy=wX\beta + w\epsilon}$

We need to choose $$\mathbf{w}$$ so that

$\mathbf{w'w = \Omega^{-1}}$

so that $$\mathbf{w}$$ (a full-rank matrix) is the Cholesky factor of $$\mathbf{\Omega^{-1}}$$ (also full rank).

In other words, $$\mathbf{w}$$ is a matrix square root of $$\Omega^{-1}$$

\begin{aligned} \Omega &= var(\epsilon | X) \\ \Omega^{-1} &= var(\epsilon | X)^{-1} \end{aligned}
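The factorization $$\mathbf{w'w = \Omega^{-1}}$$ can be sketched numerically. In this illustration the $$3 \times 3$$ covariance matrix is a made-up example, and we take $$\mathbf{w}$$ as the transpose of the lower-triangular Cholesky factor of $$\Omega^{-1}$$:

```python
import numpy as np

# Hypothetical 3x3 error covariance with heteroskedasticity and correlation
Omega = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.5, 0.3],
                  [0.0, 0.3, 1.0]])

Omega_inv = np.linalg.inv(Omega)

# np.linalg.cholesky returns lower-triangular L with L @ L.T = Omega_inv,
# so w = L.T satisfies w.T @ w = Omega_inv
L = np.linalg.cholesky(Omega_inv)
w = L.T

# Sanity check: w'w recovers Omega^{-1}
assert np.allclose(w.T @ w, Omega_inv)
```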

Then, the Infeasible Generalized Least Squares (IGLS) estimator from the transformed equation has the following properties.

\begin{aligned} \mathbf{\hat{\beta}_{IGLS}} &= \mathbf{(X'w'wX)^{-1}X'w'wy} \\ & = \mathbf{(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y} \\ & = \mathbf{\beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\epsilon} \end{aligned}

Since A1-A3 hold for the unweighted model

\begin{aligned} \mathbf{E(\hat{\beta}_{IGLS}|X)} & = E(\mathbf{\beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\epsilon}|X)\\ & = \mathbf{\beta + E((X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\epsilon|X)} \\ & = \mathbf{\beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}E(\epsilon|X)} && \text{since A3}: E(\epsilon|X)=0 \\ & = \mathbf{\beta} \end{aligned}

$$\rightarrow$$ the IGLS estimator is unbiased

\begin{aligned} \mathbf{Var(w\epsilon|X)} &= \mathbf{wVar(\epsilon|X)w'} \\ & = \mathbf{w\Omega w'} \\ & = \mathbf{w(w'w)^{-1}w'} && \text{since } \mathbf{w'w = \Omega^{-1}} \text{ and } \mathbf{w} \text{ is full rank}\\ & = \mathbf{ww^{-1}(w')^{-1}w'} \\ & = \mathbf{I_n} \end{aligned}

$$\rightarrow$$ A4 holds for the transformed (weighted) equation

Then, the variance for the estimator is

\begin{aligned} Var(\hat{\beta}_{IGLS}|\mathbf{X}) & = \mathbf{Var(\beta + (X'\Omega ^{-1}X)^{-1}X'\Omega^{-1}\epsilon|X)} \\ &= \mathbf{Var((X'\Omega ^{-1}X)^{-1}X'\Omega^{-1}\epsilon|X)} \\ &= \mathbf{(X'\Omega ^{-1}X)^{-1}X'\Omega^{-1} Var(\epsilon|X) \Omega^{-1}X(X'\Omega ^{-1}X)^{-1}} \\ &= \mathbf{(X'\Omega ^{-1}X)^{-1}X'\Omega^{-1} \Omega \Omega^{-1}X(X'\Omega ^{-1}X)^{-1}} && \text{since } Var(\epsilon|\mathbf{X})=\Omega \\ &= \mathbf{(X'\Omega ^{-1}X)^{-1}} \end{aligned}

Let $$A = \mathbf{(X'X)^{-1}X'-(X'\Omega ^{-1} X)^{-1}X' \Omega^{-1}}$$. Then $$Var(\hat{\beta}_{OLS}|X)- Var(\hat{\beta}_{IGLS}|X) = A\Omega A'$$. Since $$\Omega$$ is positive semi-definite, $$A\Omega A'$$ is also positive semi-definite, so IGLS is at least as efficient as OLS.
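The efficiency ranking can be checked numerically. The sketch below uses a made-up diagonal $$\Omega$$ (heteroskedasticity only, for simplicity) and verifies that $$Var(\hat{\beta}_{OLS}|X) - Var(\hat{\beta}_{IGLS}|X)$$ is positive semi-definite:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Hypothetical heteroskedastic Omega (diagonal for simplicity)
sigma2 = np.exp(0.5 * X[:, 1])          # variances vary with x
Omega = np.diag(sigma2)
Omega_inv = np.diag(1.0 / sigma2)

# Var(beta_OLS|X) = (X'X)^{-1} X' Omega X (X'X)^{-1}  (sandwich form)
XtX_inv = np.linalg.inv(X.T @ X)
V_ols = XtX_inv @ X.T @ Omega @ X @ XtX_inv

# Var(beta_IGLS|X) = (X' Omega^{-1} X)^{-1}
V_gls = np.linalg.inv(X.T @ Omega_inv @ X)

# The difference V_ols - V_gls = A Omega A' should be positive semi-definite
diff = V_ols - V_gls
assert np.all(np.linalg.eigvalsh(diff) >= -1e-10)
```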

The name Infeasible comes from the fact that this estimator cannot actually be computed.

$\mathbf{w} = \left( \begin{array}{ccccc} w_{11} & 0 & 0 & ... & 0 \\ w_{21} & w_{22} & 0 & ... & 0 \\ w_{31} & w_{32} & w_{33} & ... & 0 \\ ... & ... & ... & ... & ... \\ w_{n1} & w_{n2} & w_{n3} & ... & w_{nn} \\ \end{array} \right)$

With $$n(n+1)/2$$ distinct elements but only $$n$$ observations, $$\mathbf{w}$$ is infeasible to estimate (more unknowns than data).

Hence, we need to make assumptions about $$\Omega$$ to make it feasible to estimate $$\mathbf{w}$$:

1. Heteroskedasticity : multiplicative exponential model
2. AR(1)
3. Cluster

### 5.2.1 Heteroskedasticity

\begin{aligned} Var(\epsilon_i |x_i) & = E(\epsilon_i^2|x_i) \\ & = h(x_i) = \sigma_i^2 \neq \sigma^2 && \text{(the variance of the error term is a function of } x_i) \end{aligned} \tag{5.7}

For our model,

\begin{aligned} y_i &= x_i\beta + \epsilon_i \\ (1/\sigma_i)y_i &= (1/\sigma_i)x_i\beta + (1/\sigma_i)\epsilon_i \end{aligned}

then, from (5.7)

\begin{aligned} Var((1/\sigma_i)\epsilon_i|X) &= (1/\sigma_i^2) Var(\epsilon_i|X) \\ &= (1/\sigma_i^2)\sigma_i^2 \\ &= 1 \end{aligned}

then the weight matrix $$\mathbf{w}$$ in the matrix equation

$\mathbf{wy=wX\beta + w\epsilon}$

$\mathbf{w}= \left( \begin{array}{ccccc} 1/\sigma_1 & 0 & 0 & ... & 0 \\ 0 & 1/\sigma_2 & 0 & ... & 0 \\ 0 & 0 & 1/\sigma_3 & ... & . \\ . & . & . & . & 0 \\ 0 & 0 & . & . & 1/\sigma_n \end{array} \right)$

Infeasible Weighted Least Squares

1. Assume we know $$\sigma_i^2$$ (Infeasible)
2. The IWLS estimator is obtained as the least squares estimate of the following weighted equation

$(1/\sigma_i)y_i = (1/\sigma_i)\mathbf{x}_i\beta + (1/\sigma_i)\epsilon_i$

• Usual standard errors for the weighted equation are valid if the variance model is correct, i.e., $$Var(\epsilon_i | \mathbf{x}_i) = \sigma_i^2$$
• If $$Var(\epsilon_i | \mathbf{x}_i) \neq \sigma_i^2$$, then heteroskedasticity-robust standard errors are valid.
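The IWLS weighting above can be sketched in code. The data-generating parameters here (intercept 1, slope 2, $$\sigma_i = \exp(0.25 x_i)$$) are made up for illustration; the point is that dividing every row by the known $$\sigma_i$$ and running OLS on the transformed data is the IWLS estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Hypothetical known sigma_i (the "infeasible" part)
sigma = np.exp(0.25 * x)
eps = sigma * rng.normal(size=n)
y = 1.0 + 2.0 * x + eps

# Weight each row by 1/sigma_i, then run OLS on the transformed equation
ys = y / sigma
Xs = X / sigma[:, None]
beta_iwls = np.linalg.lstsq(Xs, ys, rcond=None)[0]
```

With the weights applied, the transformed errors all have unit variance, so A4 holds for the weighted equation.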

Problem: We do not know $$\sigma_i^2=Var(\epsilon_i|\mathbf{x_i})=E(\epsilon_i^2|\mathbf{x}_i)$$

• A single observation $$\epsilon_i$$ cannot produce a sample estimate of the variance $$\sigma_i^2$$

• Model $$\epsilon_i^2$$ as a reasonable (strictly positive) function of $$\mathbf{x}_i$$ and a strictly positive independent error $$v_i$$

$\epsilon_i^2=v_i \exp(\mathbf{x_i\gamma})$

Then we can apply a log transformation to recover a linear in parameters model,

$ln(\epsilon_i^2) = \mathbf{x_i\gamma} + ln(v_i)$

where $$ln(v_i)$$ is independent of $$\mathbf{x}_i$$

We do not observe $$\epsilon_i$$, so we use the OLS residual ($$e_i$$) as an approximation.
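The resulting two-step Feasible WLS procedure can be sketched as follows. All data-generating numbers are made up for illustration; the steps are: OLS for residuals, regress $$\ln(e_i^2)$$ on $$\mathbf{x}_i$$ to fit the variance function, then reweight:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Multiplicative-exponential heteroskedasticity (hypothetical parameters)
eps = np.exp(0.3 * x) * rng.normal(size=n)
y = 1.0 + 2.0 * x + eps

# Step 1: OLS, keep the residuals e_i as a proxy for epsilon_i
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ b_ols

# Step 2: regress ln(e^2) on x to estimate the variance function
g = np.linalg.lstsq(X, np.log(e**2), rcond=None)[0]
sigma2_hat = np.exp(X @ g)              # fitted sigma_i^2

# Step 3: weighted OLS with weights 1/sigma_hat_i
w = 1.0 / np.sqrt(sigma2_hat)
beta_fwls = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
```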

### 5.2.2 Serial Correlation

$Cov(\epsilon_i, \epsilon_j | \mathbf{X}) \neq 0$

Under covariance stationarity,

$Cov(\epsilon_i,\epsilon_j|\mathbf{X}) = Cov(\epsilon_i, \epsilon_{i+h}|\mathbf{x_i,x_{i+h}})=\gamma_h$

And the variance covariance matrix is

$Var(\epsilon|\mathbf{X}) = \Omega = \left( \begin{array}{ccccc} \sigma^2 & \gamma_1 & \gamma_2 & ... & \gamma_{n-1} \\ \gamma_1 & \sigma^2 & \gamma_1 & ... & \gamma_{n-2} \\ \gamma_2 & \gamma_1 & \sigma^2 & ... & ... \\ . & . & . & . & \gamma_1 \\ \gamma_{n-1} & \gamma_{n-2} & . & \gamma_1 & \sigma^2 \end{array} \right)$

There are $$n$$ parameters to estimate ($$\sigma^2, \gamma_1, \dots, \gamma_{n-1}$$), so we need some sort of structure to reduce the number of parameters to estimate.

#### 5.2.2.1 AR(1)

\begin{aligned} y_t &= \beta_0 + x_t\beta_1 + \epsilon_t \\ \epsilon_t &= \rho \epsilon_{t-1} + u_t \end{aligned}

and the variance covariance matrix is

$Var(\epsilon | \mathbf{X})= \frac{\sigma^2_u}{1-\rho^2} \left( \begin{array}{ccccc} 1 & \rho & \rho^2 & ... & \rho^{n-1} \\ \rho & 1 & \rho & ... & \rho^{n-2} \\ \rho^2 & \rho & 1 & . & . \\ . & . & . & . & \rho \\ \rho^{n-1} & \rho^{n-2} & . & \rho & 1 \\ \end{array} \right)$

Hence, there is only 1 parameter to estimate for the weighting: $$\rho$$ (the scale $$\sigma^2_u$$ cancels out of the GLS formula)
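The AR(1) covariance matrix has entry $$(\sigma^2_u/(1-\rho^2))\,\rho^{|i-j|}$$ in position $$(i,j)$$, so it can be built directly. The function name below is hypothetical:

```python
import numpy as np

def ar1_cov(n, rho, sigma2_u):
    """Variance-covariance matrix of AR(1) errors:
    entry (i, j) is sigma_u^2 / (1 - rho^2) * rho^{|i-j|}."""
    idx = np.arange(n)
    return sigma2_u / (1 - rho**2) * rho ** np.abs(idx[:, None] - idx[None, :])

Omega = ar1_cov(5, rho=0.6, sigma2_u=1.0)
```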

$u_t = \epsilon_t - \rho \epsilon_{t-1}$

satisfies A3, A4, and A5, so we would like to transform the above equation into one that has $$u_t$$ as the error.

\begin{aligned} y_t - \rho y_{t-1} &= (\beta_0 + x_t\beta_1 + \epsilon_t) - \rho (\beta_0 + x_{t-1}\beta_1 + \epsilon_{t-1}) \\ & = (1-\rho)\beta_0 + (x_t - \rho x_{t-1})\beta_1 + u_t \end{aligned}

##### 5.2.2.1.1 Infeasible Cochrane Orcutt
1. Assume that we know $$\rho$$ (Infeasible)
2. The ICO estimator is obtained as the least squares estimate of the following weighted first difference equation

$y_t -\rho y_{t-1} = (1-\rho)\beta_0 + (x_t - \rho x_{t-1})\beta_1 + u_t$

• Usual standard errors for the weighted first difference equation are valid if the errors truly follow an AR(1) process
• If the serial correlation is generated from a more complex dynamic process then Newey-West HAC standard errors are valid

Problem: We do not know $$\rho$$

• $$\rho$$ is the correlation between $$\epsilon_t$$ and $$\epsilon_{t-1}$$: estimate using OLS residuals ($$e_i$$) as proxy

$\hat{\rho} = \frac{\sum_{t=2}^{T}e_te_{t-1}}{\sum_{t=2}^{T}e_{t-1}^2}$

which can be obtained from the OLS regression of

$e_t = \rho e_{t-1} + u_t$

where we suppress the intercept.
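This first step can be sketched in code. The simulation parameters (true $$\rho = 0.5$$, intercept 1, slope 2) are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 2000
rho_true = 0.5

# Simulate AR(1) errors and a simple regression (hypothetical parameters)
u = rng.normal(size=T)
eps = np.zeros(T)
for t in range(1, T):
    eps[t] = rho_true * eps[t - 1] + u[t]
x = rng.normal(size=T)
y = 1.0 + 2.0 * x + eps

# OLS residuals as a proxy for the unobserved errors
X = np.column_stack([np.ones(T), x])
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

# rho_hat from the no-intercept regression of e_t on e_{t-1}
rho_hat = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])
```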

• We lose an observation: taking the quasi-difference drops the first observation. It can be retained by rescaling it by $$\sqrt{1-\rho^2}$$ (the Prais-Winsten transformation), which gives its error the same variance as the $$u_t$$:

$y_1 = \beta_0 + x_1 \beta_1 + \epsilon_1$

$(\sqrt{1-\rho^2})y_1 = (\sqrt{1-\rho^2})\beta_0 + (\sqrt{1-\rho^2})x_1 \beta_1 + (\sqrt{1-\rho^2}) \epsilon_1$
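The full transformation, quasi-differencing rows $$2,\dots,T$$ and rescaling the first row, can be sketched as below (the function name is hypothetical):

```python
import numpy as np

def prais_winsten_transform(y, X, rho):
    """Quasi-difference rows 2..T and rescale row 1 by sqrt(1 - rho^2)
    so that all T transformed errors have the same variance."""
    ys = np.empty_like(y)
    Xs = np.empty_like(X)
    c = np.sqrt(1 - rho**2)
    ys[0] = c * y[0]
    Xs[0] = c * X[0]
    ys[1:] = y[1:] - rho * y[:-1]
    Xs[1:] = X[1:] - rho * X[:-1]
    return ys, Xs

y = np.array([1.0, 2.0, 3.0])
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
ys, Xs = prais_winsten_transform(y, X, rho=0.5)
```

Running OLS on the transformed data (with the first row kept) uses all $$T$$ observations, unlike Cochrane-Orcutt, which drops it.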

#### 5.2.2.2 Cluster

$y_{gi} = \mathbf{x}_{gi}\beta + \epsilon_{gi}$

$Cov(\epsilon_{gi}, \epsilon_{hj}) \begin{cases} = 0 & \text{for } g \neq h \text{ and any pair } (i,j) \\ \neq 0 & \text{for } g = h \text{ and any pair } (i,j)\\ \end{cases}$

Intra-group Correlation
Individuals within the same group may be correlated with each other, but are independent across groups.

• A4 is violated, so the usual standard errors for OLS are not valid.
• Use cluster-robust standard errors for OLS.

Suppose there are 3 groups of different sizes

$Var(\epsilon| \mathbf{X})= \Omega = \left( \begin{array}{cccccc} \sigma^2 & \delta_{12}^1 & \delta_{13}^1 & 0 & 0 & 0 \\ \delta_{12}^1 & \sigma^2 & \delta_{23}^1 & 0 & 0 & 0 \\ \delta_{13}^1 & \delta_{23}^1 & \sigma^2 & 0 & 0 & 0 \\ 0 & 0 & 0 & \sigma^2 & \delta_{12}^2 & 0 \\ 0 & 0 & 0 & \delta_{12}^2 & \sigma^2 & 0 \\ 0 & 0 & 0 & 0 & 0 & \sigma^2 \end{array} \right)$

where $$Cov(\epsilon_{gi}, \epsilon_{gj}) = \delta_{ij}^g$$ and $$Cov(\epsilon_{gi}, \epsilon_{hj}) = 0$$ for $$g \neq h$$ and any $$i$$ and $$j$$

Infeasible Generalized Least Squares (Cluster)

1. Assume that $$\sigma^2$$ and $$\delta_{ij}^g$$ are known, plug into $$\Omega$$ and solve for the inverse $$\Omega^{-1}$$ (infeasible)
2. The Infeasible Generalized Least Squares Estimator is

$\hat{\beta}_{IGLS} = \mathbf{(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y}$

Problem: We do not know $$\sigma^2$$ and $$\delta_{ij}^g$$

• We can make assumptions about the data generating process that is causing the clustering behavior.
• This will give structure to $$Cov(\epsilon_{gi},\epsilon_{gj})= \delta_{ij}^g$$, which makes it feasible to estimate.
• If the assumptions are wrong, then we should use cluster robust standard errors.

Solution Assume group level random effects specification in the error

\begin{aligned} y_{gi} &= \mathbf{x}_{gi} \beta + c_g + u_{gi} \\ Var(c_g|\mathbf{x}_{gi}) &= \sigma^2_c \\ Var(u_{gi}|\mathbf{x}_{gi}) &= \sigma^2_u \end{aligned}

where $$c_g$$ and $$u_{gi}$$ are independent of each other, and mean independent of $$\mathbf{x}_{gi}$$

• $$c_g$$ captures the common group shocks (independent across groups)
• $$u_{gi}$$ captures the individual shocks (independent across individuals and groups)

Then the error variance is

$Var(\epsilon| \mathbf{X})= \Omega = \left( \begin{array}{cccccc} \sigma^2_c + \sigma^2_u & \sigma^2_c & \sigma^2_c & 0 & 0 & 0 \\ \sigma^2_c & \sigma^2_c + \sigma^2_u & \sigma^2_c & 0 & 0 & 0 \\ \sigma^2_c & \sigma^2_c & \sigma^2_c + \sigma^2_u & 0 & 0 & 0 \\ 0 & 0 & 0 & \sigma^2_c + \sigma^2_u & \sigma^2_c & 0 \\ 0 & 0 & 0 & \sigma^2_c & \sigma^2_c + \sigma^2_u & 0 \\ 0 & 0 & 0 & 0 & 0 & \sigma^2_c + \sigma^2_u \end{array} \right)$
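Under the random-effects structure, $$\Omega$$ is block diagonal with $$\sigma^2_c + \sigma^2_u$$ on the diagonal and $$\sigma^2_c$$ within each group's off-diagonal block. A sketch of its construction (function name and group sizes are hypothetical):

```python
import numpy as np

def re_cluster_cov(group_sizes, sigma2_c, sigma2_u):
    """Block-diagonal Omega under group random effects:
    within a group, diagonal = sigma_c^2 + sigma_u^2,
    off-diagonal = sigma_c^2; zero across groups."""
    n = sum(group_sizes)
    Omega = np.zeros((n, n))
    start = 0
    for ng in group_sizes:
        block = np.full((ng, ng), sigma2_c)
        np.fill_diagonal(block, sigma2_c + sigma2_u)
        Omega[start:start + ng, start:start + ng] = block
        start += ng
    return Omega

# Three groups of sizes 3, 2, 1, matching the displayed 6x6 matrix
Omega = re_cluster_cov([3, 2, 1], sigma2_c=0.5, sigma2_u=1.0)
```

With $$\sigma^2_c$$ and $$\sigma^2_u$$ estimated, this $$\Omega$$ can be plugged into the GLS formula $$\hat{\beta} = \mathbf{(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y}$$.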