5.2 Feasible Generalized Least Squares
Motivation for a more efficient estimator
Gauss-Markov Theorem holds under A1-A4
A4: \(Var(\epsilon| \mathbf{X} )=\sigma^2I_n\)
- Heteroskedasticity: \(Var(\epsilon_i|\mathbf{X}) \neq \sigma^2\) (the error variance differs across observations)
- Serial Correlation: \(Cov(\epsilon_i,\epsilon_j|\mathbf{X}) \neq 0\) for some \(i \neq j\)
Without A4, how can we know which unbiased estimator is the most efficient?
Original (unweighted) model:
\[ \mathbf{y=X\beta+ \epsilon} \]
Suppose A1-A3 hold, but A4 does not hold,
\[ \mathbf{Var(\epsilon|X)=\Omega \neq \sigma^2 I_n} \]
We will try to use OLS to estimate the transformed (weighted) model
\[ \mathbf{wy=wX\beta + w\epsilon} \]
We need to choose \(\mathbf{w}\) so that
\[ \mathbf{w'w = \Omega^{-1}} \]
then \(\mathbf{w}\) (a full-rank matrix) is the Cholesky factor of \(\mathbf{\Omega^{-1}}\) (also full rank)
In other words, \(\mathbf{w}\) is a matrix square root of \(\Omega^{-1}\)
\[ \begin{aligned} \Omega &= var(\epsilon | X) \\ \Omega^{-1} &= var(\epsilon | X)^{-1} \end{aligned} \]
Then, the estimator from the transformed equation, the Infeasible Generalized Least Squares (IGLS) estimator, has the following properties.
\[ \begin{aligned} \mathbf{\hat{\beta}_{IGLS}} &= \mathbf{(X'w'wX)^{-1}X'w'wy} \\ & = \mathbf{(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y} \\ & = \mathbf{\beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\epsilon} \end{aligned} \]
Since A1-A3 hold for the unweighted model
\[ \begin{aligned} \mathbf{E(\hat{\beta}_{IGLS}|X)} & = E(\mathbf{\beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\epsilon}|X)\\ & = \mathbf{\beta + E((X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\epsilon|X)} \\ & = \mathbf{\beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}E(\epsilon|X)} \\ & = \mathbf{\beta} && \text{since A3}: E(\epsilon|X)=0 \end{aligned} \]
\(\rightarrow\) IGLS estimator is unbiased
\[ \begin{aligned} \mathbf{Var(w\epsilon|X)} &= \mathbf{wVar(\epsilon|X)w'} \\ & = \mathbf{w\Omega w'} \\ & = \mathbf{w(w'w)^{-1}w'} && \text{since w is a full-rank matrix}\\ & = \mathbf{ww^{-1}(w')^{-1}w'} \\ & = \mathbf{I_n} \end{aligned} \]
\(\rightarrow\) A4 holds for the transformed (weighted) equation
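The whitening argument above can be checked numerically. A minimal NumPy sketch, using an arbitrary positive definite \(\Omega\) chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# An arbitrary symmetric positive definite error covariance (hypothetical)
A = rng.standard_normal((n, n))
Omega = A @ A.T + n * np.eye(n)
Omega_inv = np.linalg.inv(Omega)

# Cholesky factor of Omega^{-1}: L is lower triangular with L @ L.T = Omega^{-1}
L = np.linalg.cholesky(Omega_inv)
w = L.T                                   # so that w.T @ w = L @ L.T = Omega^{-1}

# Whitening: the transformed errors w @ eps have identity covariance
check_weight = np.allclose(w.T @ w, Omega_inv)
check_identity = np.allclose(w @ Omega @ w.T, np.eye(n))
```

Note that `np.linalg.cholesky` returns the lower-triangular factor \(L\) with \(LL' = \Omega^{-1}\), so taking \(\mathbf{w} = L'\) gives \(\mathbf{w'w} = \Omega^{-1}\) as required.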
Then, the variance for the estimator is
\[ \begin{aligned} Var(\hat{\beta}_{IGLS}|\mathbf{X}) & = \mathbf{Var(\beta + (X'\Omega ^{-1}X)^{-1}X'\Omega^{-1}\epsilon|X)} \\ &= \mathbf{Var((X'\Omega ^{-1}X)^{-1}X'\Omega^{-1}\epsilon|X)} \\ &= \mathbf{(X'\Omega ^{-1}X)^{-1}X'\Omega^{-1} Var(\epsilon|X) \Omega^{-1}X(X'\Omega ^{-1}X)^{-1}} \\ &= \mathbf{(X'\Omega ^{-1}X)^{-1}X'\Omega^{-1} \Omega \Omega^{-1}X(X'\Omega ^{-1}X)^{-1}} && \text{since } Var(\epsilon|X) = \Omega\\ &= \mathbf{(X'\Omega ^{-1}X)^{-1}} \end{aligned} \]
Let \(A = \mathbf{(X'X)^{-1}X'-(X'\Omega ^{-1} X)^{-1}X' \Omega^{-1}}\). Then \[ Var(\hat{\beta}_{OLS}|X)- Var(\hat{\beta}_{IGLS}|X) = A\Omega A' \] Since \(\Omega\) is positive semi-definite, \(A\Omega A'\) is also positive semi-definite, so IGLS is (weakly) more efficient than OLS.
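This efficiency comparison can also be verified numerically: compute the OLS sandwich variance and the IGLS variance for a toy design, then check that their difference has no negative eigenvalues. A sketch with hypothetical data:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3
X = rng.standard_normal((n, k))

# Hypothetical heteroskedastic error covariance (diagonal)
Omega = np.diag(rng.uniform(0.5, 5.0, size=n))
Omega_inv = np.linalg.inv(Omega)

# Var(beta_OLS | X): sandwich form (X'X)^{-1} X' Omega X (X'X)^{-1}
XtX_inv = np.linalg.inv(X.T @ X)
var_ols = XtX_inv @ X.T @ Omega @ X @ XtX_inv

# Var(beta_IGLS | X) = (X' Omega^{-1} X)^{-1}
var_gls = np.linalg.inv(X.T @ Omega_inv @ X)

# The gap A Omega A' is PSD: its eigenvalues are nonnegative (up to rounding)
gap_eigs = np.linalg.eigvalsh(var_ols - var_gls)
```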
The name Infeasible comes from the fact that \(\Omega\), and hence \(\mathbf{w}\), is unknown, so this estimator cannot actually be computed.
\[ \mathbf{w} = \left( \begin{array}{ccccc} w_{11} & 0 & 0 & ... & 0 \\ w_{21} & w_{22} & 0 & ... & 0 \\ w_{31} & w_{32} & w_{33} & ... & ... \\ w_{n1} & w_{n2} & w_{n3} & ... & w_{nn} \\ \end{array} \right) \]
\(\mathbf{w}\) has \(n(n+1)/2\) distinct elements but we only have \(n\) observations \(\rightarrow\) infeasible to estimate (more parameters than data points).
Hence, we need to make assumptions on \(\Omega\) to make it feasible to estimate \(\mathbf{w}\):
- Heteroskedasticity : multiplicative exponential model
- AR(1)
- Cluster
5.2.1 Heteroskedasticity
\[\begin{equation} Var(\epsilon_i |x_i) = E(\epsilon_i^2|x_i) = h(x_i) = \sigma_i^2 \neq \sigma^2 \quad \text{(the variance of the error term is a function of } x_i \text{)} \tag{5.7} \end{equation}\]For our model,
\[ \begin{aligned} y_i &= x_i\beta + \epsilon_i \\ (1/\sigma_i)y_i &= (1/\sigma_i)x_i\beta + (1/\sigma_i)\epsilon_i \end{aligned} \]
then, from (5.7)
\[ \begin{aligned} Var((1/\sigma_i)\epsilon_i|X) &= (1/\sigma_i^2) Var(\epsilon_i|X) \\ &= (1/\sigma_i^2)\sigma_i^2 \\ &= 1 \end{aligned} \]
then the weight matrix \(\mathbf{w}\) in the matrix equation
\[ \mathbf{wy=wX\beta + w\epsilon} \]
\[ \mathbf{w}= \left( \begin{array}{ccccc} 1/\sigma_1 & 0 & 0 & ... & 0 \\ 0 & 1/\sigma_2 & 0 & ... & 0 \\ 0 & 0 & 1/\sigma_3 & ... & . \\ . & . & . & . & 0 \\ 0 & 0 & . & . & 1/\sigma_n \end{array} \right) \]
Infeasible Weighted Least Squares
- Assume we know \(\sigma_i^2\) (Infeasible)
- The IWLS estimator is obtained as the least squares estimate of the following weighted equation
\[ (1/\sigma_i)y_i = (1/\sigma_i)\mathbf{x}_i\beta + (1/\sigma_i)\epsilon_i \]
- Usual standard errors for the weighted equation are valid if \(Var(\epsilon_i | \mathbf{X}) = \sigma_i^2\) is correctly specified
- If \(Var(\epsilon_i | \mathbf{X}) \neq \sigma_i^2\), then heteroskedasticity-robust standard errors are valid.
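As a sanity check, dividing each row of the equation by \(\sigma_i\) and running OLS gives exactly the GLS estimator with \(\Omega = diag(\sigma_1^2, \dots, \sigma_n^2)\). A minimal sketch on simulated data (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.standard_normal(n)
sigma = np.exp(0.5 * x)                      # "known" per-observation s.d. (hypothetical)
y = 1.0 + 2.0 * x + sigma * rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])

# IWLS: weight each row of the equation by 1/sigma_i, then run OLS
wgt = 1.0 / sigma
beta_iwls, *_ = np.linalg.lstsq(X * wgt[:, None], y * wgt, rcond=None)

# The same estimator in GLS form with Omega = diag(sigma_i^2)
Omega_inv = np.diag(wgt**2)
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
```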
Problem: We do not know \(\sigma_i^2=Var(\epsilon_i|\mathbf{x_i})=E(\epsilon_i^2|\mathbf{x}_i)\)
A single observation \(\epsilon_i\) cannot identify \(\sigma_i^2\); there is no within-\(i\) sample variance to compute
- Model \(\epsilon_i^2\) as a reasonable (strictly positive) function of \(x_i\) and an independent, strictly positive error \(v_i\)
\[ \epsilon_i^2=v_i \exp(\mathbf{x_i\gamma}) \]
Then we can apply a log transformation to recover a linear in parameters model,
\[ ln(\epsilon_i^2) = \mathbf{x_i\gamma} + ln(v_i) \]
where \(ln(v_i)\) is independent of \(\mathbf{x}_i\)
We do not observe \(\epsilon_i\), so we use the OLS residual \(e_i\) as a proxy when estimating \(\gamma\)
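The full feasible procedure (OLS residuals, then the log-variance regression, then weighted least squares) can be sketched as follows; the data-generating values are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
x = rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])

# Simulated heteroskedastic data: Var(eps_i | x_i) = exp(-0.5 + 1.0 * x_i)
sigma2 = np.exp(-0.5 + 1.0 * x)
y = 1.0 + 2.0 * x + np.sqrt(sigma2) * rng.standard_normal(n)

# Step 1: OLS; keep residuals e_i as proxies for the unobserved eps_i
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_ols

# Step 2: regress ln(e_i^2) on x_i to estimate gamma in the variance model
gamma, *_ = np.linalg.lstsq(X, np.log(e**2), rcond=None)

# Step 3: FGLS = weighted least squares with weights 1 / sigma_hat_i
wgt = 1.0 / np.sqrt(np.exp(X @ gamma))
beta_fgls, *_ = np.linalg.lstsq(X * wgt[:, None], y * wgt, rcond=None)
```

The intercept of the variance regression absorbs \(E[\ln v_i]\), which only rescales all weights by a constant and so does not affect \(\hat{\beta}_{FGLS}\).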
5.2.2 Serial Correlation
\[ Cov(\epsilon_i, \epsilon_j | \mathbf{X}) \neq 0 \]
Under covariance stationarity,
\[ Cov(\epsilon_i,\epsilon_j|\mathbf{X}) = Cov(\epsilon_i, \epsilon_{i+h}|\mathbf{x_i,x_{i+h}})=\gamma_h \]
And the variance covariance matrix is
\[ Var(\epsilon|\mathbf{X}) = \Omega = \left( \begin{array}{ccccc} \sigma^2 & \gamma_1 & \gamma_2 & ... & \gamma_{n-1} \\ \gamma_1 & \sigma^2 & \gamma_1 & ... & \gamma_{n-2} \\ \gamma_2 & \gamma_1 & \sigma^2 & ... & ... \\ . & . & . & . & \gamma_1 \\ \gamma_{n-1} & \gamma_{n-2} & . & \gamma_1 & \sigma^2 \end{array} \right) \]
There are \(n\) parameters to estimate \(\rightarrow\) we need some sort of structure to reduce the number of parameters to estimate:
- AR(1) (e.g., the effect of inflation and deficit on Treasury Bill interest rates)
- Clustering
5.2.2.1 AR(1)
\[ \begin{aligned} y_t &= \beta_0 + x_t\beta_1 + \epsilon_t \\ \epsilon_t &= \rho \epsilon_{t-1} + u_t \end{aligned} \]
and the variance covariance matrix is
\[ Var(\epsilon | \mathbf{X})= \frac{\sigma^2_u}{1-\rho^2} \left( \begin{array}{ccccc} 1 & \rho & \rho^2 & ... & \rho^{n-1} \\ \rho & 1 & \rho & ... & \rho^{n-2} \\ \rho^2 & \rho & 1 & . & . \\ . & . & . & . & \rho \\ \rho^{n-1} & \rho^{n-2} & . & \rho & 1 \\ \end{array} \right) \]
Hence, there is only 1 parameter to estimate: \(\rho\)
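A short sketch of how the entire matrix is built from the single correlation parameter \(\rho\) (plus the scale \(\sigma_u^2\)), using hypothetical values:

```python
import numpy as np

rho, sigma2_u, n = 0.6, 1.0, 5            # hypothetical values for illustration
t = np.arange(n)

# Omega_{ts} = sigma_u^2 / (1 - rho^2) * rho^{|t-s|}  (stationary AR(1) covariance)
Omega = sigma2_u / (1 - rho**2) * rho ** np.abs(t[:, None] - t[None, :])
```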
- Under A1, A2, A3a, A5a, OLS is consistent and asymptotically normal
- Use Newey-West standard errors for valid inference.
- Apply Infeasible Cochrane-Orcutt (as if we knew \(\rho\))
- Because
\[ u_t = \epsilon_t - \rho \epsilon_{t-1} \]
satisfies A3, A4, A5, we'd like to transform the above equation into one that has \(u_t\) as the error.
\[ \begin{aligned} y_t - \rho y_{t-1} &= (\beta_0 + x_t\beta_1 + \epsilon_t) - \rho (\beta_0 + x_{t-1}\beta_1 + \epsilon_{t-1}) \\ & = (1-\rho)\beta_0 + (x_t - \rho x_{t-1})\beta_1 + u_t \end{aligned} \]
5.2.2.1.1 Infeasible Cochrane-Orcutt
- Assume that we know \(\rho\) (Infeasible)
- The ICO estimator is obtained as the least squares estimate of the following weighted first difference equation
\[ y_t -\rho y_{t-1} = (1-\rho)\beta_0 + (x_t - \rho x_{t-1})\beta_1 + u_t \]
- Usual standard errors for the weighted first difference equation are valid if the errors truly follow an AR(1) process
- If the serial correlation is generated from a more complex dynamic process then Newey-West HAC standard errors are valid
Problem: We do not know \(\rho\)
- \(\rho\) is the correlation between \(\epsilon_t\) and \(\epsilon_{t-1}\): estimate using OLS residuals (\(e_i\)) as proxy
\[ \hat{\rho} = \frac{\sum_{t=2}^{T}e_te_{t-1}}{\sum_{t=2}^{T}e_{t-1}^2} \]
which can be obtained from the OLS regression of
\[ e_t = \rho e_{t-1} + u_t \]
where we suppress the intercept.
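A minimal sketch of this step on simulated data (all parameter values hypothetical): run OLS, form residuals, and compute \(\hat{\rho}\) from the no-intercept regression of \(e_t\) on \(e_{t-1}\):

```python
import numpy as np

rng = np.random.default_rng(4)
T, rho_true = 2000, 0.5

# Simulated AR(1) errors and a simple regression
eps = np.zeros(T)
for s in range(1, T):
    eps[s] = rho_true * eps[s - 1] + rng.standard_normal()
x = rng.standard_normal(T)
y = 1.0 + 2.0 * x + eps
X = np.column_stack([np.ones(T), x])

# OLS residuals stand in for the unobserved errors
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_ols

# rho_hat from the no-intercept regression of e_t on e_{t-1}
rho_hat = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])
```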
We are losing an observation
By taking the weighted first difference we drop the first observation
\[ y_1 = \beta_0 + x_1 \beta_1 + \epsilon_1 \]
- The Feasible Prais-Winsten transformation applies the Infeasible Cochrane-Orcutt transformation but also includes a weighted version of the first observation
\[ (\sqrt{1-\rho^2})y_1 = (\sqrt{1-\rho^2})\beta_0 + (\sqrt{1-\rho^2})x_1 \beta_1 + (\sqrt{1-\rho^2}) \epsilon_1 \]
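The transformation can be written as a small helper; `prais_winsten_transform` is a hypothetical name and the example data are made up:

```python
import numpy as np

def prais_winsten_transform(y, X, rho):
    """Quasi-difference rows 2..T; keep the first row, weighted by sqrt(1 - rho^2)."""
    c = np.sqrt(1.0 - rho**2)
    y_t = np.concatenate([[c * y[0]], y[1:] - rho * y[:-1]])
    X_t = np.vstack([c * X[0], X[1:] - rho * X[:-1]])
    return y_t, X_t

# Tiny made-up example; X includes an intercept column, so transforming its
# rows automatically produces the (1 - rho) * beta_0 intercept term
y = np.array([2.0, 3.0, 4.0])
X = np.column_stack([np.ones(3), np.array([1.0, 2.0, 3.0])])
y_t, X_t = prais_winsten_transform(y, X, rho=0.5)
```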
5.2.2.2 Cluster
\[ y_{gi} = \mathbf{x}_{gi}\beta + \epsilon_{gi} \]
\[ Cov(\epsilon_{gi}, \epsilon_{hj}) \begin{cases} = 0 & \text{for $g \neq h$ and any pair $(i,j)$} \\ \neq 0 & \text{for $g = h$ and any pair $(i,j)$ with $i \neq j$}\\ \end{cases} \]
Intra-group Correlation
Individuals within the same group may be correlated, but are independent across groups.
- A4 is violated, so the usual OLS standard errors are not valid.
- Use cluster robust standard errors for OLS.
Suppose there are 3 groups of different sizes (here 3, 2, and 1 observations)
\[ Var(\epsilon| \mathbf{X})= \Omega = \left( \begin{array}{cccccc} \sigma^2 & \delta_{12}^1 & \delta_{13}^1 & 0 & 0 & 0 \\ \delta_{12}^1 & \sigma^2 & \delta_{23}^1 & 0 & 0 & 0 \\ \delta_{13}^1 & \delta_{23}^1 & \sigma^2 & 0 & 0 & 0 \\ 0 & 0 & 0 & \sigma^2 & \delta_{12}^2 & 0 \\ 0 & 0 & 0 & \delta_{12}^2 & \sigma^2 & 0 \\ 0 & 0 & 0 & 0 & 0 & \sigma^2 \end{array} \right) \]
where \(Cov(\epsilon_{gi}, \epsilon_{gj}) = \delta_{ij}^g\) and \(Cov(\epsilon_{gi}, \epsilon_{hj}) = 0\) for \(g \neq h\) and any \(i\) and \(j\)
Infeasible Generalized Least Squares (Cluster)
- Assume that \(\sigma^2\) and \(\delta_{ij}^g\) are known, plug into \(\Omega\) and solve for the inverse \(\Omega^{-1}\) (infeasible)
- The Infeasible Generalized Least Squares Estimator is
\[ \hat{\beta}_{IGLS} = \mathbf{(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y} \]
Problem: We do not know \(\sigma^2\) and \(\delta_{ij}^g\)
- We can make assumptions about the data generating process that causes the clustering behavior.
  - This gives structure to \(Cov(\epsilon_{gi},\epsilon_{gj})= \delta_{ij}^g\), which makes estimation feasible.
  - If the assumptions are wrong, then we should use cluster robust standard errors.
Solution: Assume a group-level random effects specification for the error
\[ \begin{aligned} y_{gi} &= \mathbf{x}_{gi} \beta + c_g + u_{gi} \\ Var(c_g|\mathbf{x}_{gi}) &= \sigma^2_c \\ Var(u_{gi}|\mathbf{x}_{gi}) &= \sigma^2_u \end{aligned} \]
where \(c_g\) and \(u_{gi}\) are independent of each other, and mean independent of \(\mathbf{x}_{gi}\)
- \(c_g\) captures the common group shocks (independent across groups)
- \(u_{gi}\) captures the individual shocks (independent across individuals and groups)
Then the error variance is
\[ Var(\epsilon| \mathbf{X})= \Omega = \left( \begin{array}{cccccc} \sigma^2_c + \sigma^2_u & \sigma^2_c & \sigma^2_c & 0 & 0 & 0 \\ \sigma^2_c & \sigma^2_c + \sigma^2_u & \sigma^2_c & 0 & 0 & 0 \\ \sigma^2_c & \sigma^2_c & \sigma^2_c + \sigma^2_u & 0 & 0 & 0 \\ 0 & 0 & 0 & \sigma^2_c + \sigma^2_u & \sigma^2_c & 0 \\ 0 & 0 & 0 & \sigma^2_c & \sigma^2_c + \sigma^2_u & 0 \\ 0 & 0 & 0 & 0 & 0 & \sigma^2_c + \sigma^2_u \end{array} \right) \]
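A sketch of how this block-diagonal \(\Omega\) is assembled from \(\sigma_c^2\), \(\sigma_u^2\), and the group sizes; the helper name and variance values are hypothetical:

```python
import numpy as np

def random_effects_omega(group_sizes, sigma2_c, sigma2_u):
    """Block-diagonal Omega: sigma_c^2 + sigma_u^2 on the diagonal,
    sigma_c^2 within a group, and 0 across groups."""
    n = sum(group_sizes)
    Omega = np.zeros((n, n))
    r = 0
    for m in group_sizes:
        Omega[r:r + m, r:r + m] = np.full((m, m), sigma2_c) + sigma2_u * np.eye(m)
        r += m
    return Omega

# Same 3-group layout as above (groups of 3, 2, and 1 observations)
Omega = random_effects_omega([3, 2, 1], sigma2_c=0.5, sigma2_u=1.0)
```

Only two variance parameters remain to estimate, which is what makes this specification feasible.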