5.2 Generalized Least Squares
5.2.1 Infeasible Generalized Least Squares
Motivation for a More Efficient Estimator
- The Gauss-Markov Theorem guarantees that OLS is the Best Linear Unbiased Estimator (BLUE) under assumptions A1-A4:
- A4: $\mathrm{Var}(\epsilon|X) = \sigma^2 I_n$ (homoskedasticity and no autocorrelation).
- When A4 does not hold:
- Heteroskedasticity: $\mathrm{Var}(\epsilon_i|X) \neq \sigma^2$, i.e., the error variance differs across observations.
- Serial Correlation: $\mathrm{Cov}(\epsilon_i, \epsilon_j|X) \neq 0$ for $i \neq j$.
Without A4, OLS is unbiased but no longer efficient. This motivates the need for an alternative approach to identify the most efficient estimator.
The unweighted (standard) regression model is given by:
$$y = X\beta + \epsilon.$$
Assuming A1-A3 hold (linearity, full rank, exogeneity) but A4 does not, the variance of the error term is no longer proportional to an identity matrix:
$$\mathrm{Var}(\epsilon|X) = \Omega \neq \sigma^2 I_n.$$
To address the violation of A4 ($\Omega \neq \sigma^2 I_n$), one can transform the model by premultiplying both sides by a full-rank matrix $w$, yielding the weighted (transformed) regression model:
$$wy = wX\beta + w\epsilon,$$
where $w$ is a full-rank matrix chosen such that:
$$w'w = \Omega^{-1}.$$
- $w$ can be obtained from the Cholesky decomposition of $\Omega^{-1}$.
- The Cholesky decomposition ensures $w$ satisfies $w'w = \Omega^{-1}$, so $w$ acts as a "square root" of $\Omega^{-1}$ in matrix terms.
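As a quick numerical check (a NumPy sketch with a hypothetical diagonal $\Omega$; not part of the text's derivation), the transpose of the Cholesky factor of $\Omega^{-1}$ gives a valid $w$:

```python
import numpy as np

# Hypothetical heteroskedastic covariance matrix (diagonal for simplicity).
Omega = np.diag([1.0, 4.0, 9.0])
Omega_inv = np.linalg.inv(Omega)

# np.linalg.cholesky returns lower-triangular L with L @ L.T = Omega_inv,
# so w = L.T satisfies w'w = Omega^{-1}.
L = np.linalg.cholesky(Omega_inv)
w = L.T

assert np.allclose(w.T @ w, Omega_inv)
```

Any full-rank $w$ with $w'w = \Omega^{-1}$ works; the Cholesky factor is simply a convenient triangular choice.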
By transforming the original model, the variance of the transformed errors becomes spherical:
$$\mathrm{Var}(w\epsilon|X) = w\,\mathrm{Var}(\epsilon|X)\,w' = w\Omega w' = I_n.$$
The transformed equation allows us to compute a more efficient estimator.
Using the transformed model, the Infeasible Generalized Least Squares (IGLS) estimator is:
$$\hat{\beta}_{IGLS} = (X'w'wX)^{-1}X'w'wy = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y = \beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\epsilon.$$
- Unbiasedness
Since assumptions A1-A3 hold for the unweighted model:
$$\begin{aligned}
E(\hat{\beta}_{IGLS}|X) &= E\left(\beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\epsilon \,\middle|\, X\right)\\
&= \beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}E(\epsilon|X) && \text{since A3: } E(\epsilon|X)=0,\\
&= \beta.
\end{aligned}$$
Thus, the IGLS estimator is unbiased.
- Variance
The variance of the transformed errors is given by:
$$\begin{aligned}
\mathrm{Var}(w\epsilon|X) &= w\,\mathrm{Var}(\epsilon|X)\,w' = w\Omega w'\\
&= w(w'w)^{-1}w' && \text{since } w \text{ is full rank},\\
&= ww^{-1}(w')^{-1}w' = I_n.
\end{aligned}$$
Hence, A4 holds for the transformed (weighted) equation, satisfying the Gauss-Markov conditions.
The variance of the IGLS estimator is:
$$\begin{aligned}
\mathrm{Var}(\hat{\beta}_{IGLS}|X) &= \mathrm{Var}\left(\beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\epsilon \,\middle|\, X\right)\\
&= \mathrm{Var}\left((X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\epsilon \,\middle|\, X\right)\\
&= (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\,\mathrm{Var}(\epsilon|X)\,\Omega^{-1}X(X'\Omega^{-1}X)^{-1}\\
&= (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\Omega\,\Omega^{-1}X(X'\Omega^{-1}X)^{-1}\\
&= (X'\Omega^{-1}X)^{-1}.
\end{aligned}$$
- Efficiency
The difference in variances between OLS and IGLS is:
$$\mathrm{Var}(\hat{\beta}_{OLS}|X) - \mathrm{Var}(\hat{\beta}_{IGLS}|X) = A\Omega A',$$
where:
$$A = (X'X)^{-1}X' - (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}.$$
Since $\Omega$ is positive definite, $A\Omega A'$ is positive semi-definite. Thus, the IGLS estimator is at least as efficient as OLS, and strictly more efficient under heteroskedasticity or autocorrelation.
In short, properties of $\hat{\beta}_{IGLS}$:
- Unbiasedness: $\hat{\beta}_{IGLS}$ remains unbiased as long as A1-A3 hold.
- Efficiency: $\hat{\beta}_{IGLS}$ is more efficient than OLS under heteroskedasticity or serial correlation because it accounts for the structure of $\Omega$.
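When $\Omega$ is treated as known, the IGLS formula can be sketched directly in NumPy (simulated data; the diagonal $\Omega$ and all variable names here are illustrative assumptions, not the text's example):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0])

# Assume Omega is known: here a diagonal (purely heteroskedastic) example.
sigma2 = np.exp(X[:, 1])                      # error variances vary with the regressor
Omega_inv = np.diag(1.0 / sigma2)
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2))

# beta_IGLS = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
beta_igls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
```

Using `np.linalg.solve` instead of forming the inverse of $X'\Omega^{-1}X$ explicitly is the numerically preferable way to evaluate the formula.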
Why Is IGLS “Infeasible”?
The name infeasible arises because the estimator generally cannot be computed directly: the structure of $w$ (or equivalently $\Omega^{-1}$) involves too many unknowns. The matrix $w$ is lower triangular:
$$w = \begin{pmatrix}
w_{11} & 0 & 0 & \cdots & 0\\
w_{21} & w_{22} & 0 & \cdots & 0\\
w_{31} & w_{32} & w_{33} & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
w_{n1} & w_{n2} & w_{n3} & \cdots & w_{nn}
\end{pmatrix},$$
with $n(n+1)/2$ unique elements for $n$ observations. This results in more parameters than data points, making direct estimation infeasible.
To make the estimation feasible, assumptions about the structure of Ω are required. Common approaches include:
Heteroskedastic Errors: Assume no correlation between errors, but allow heterogeneous variances $\mathrm{Var}(\epsilon_i|X) = \sigma_i^2$:
$$\Omega = \begin{pmatrix}
\sigma_1^2 & 0 & \cdots & 0\\
0 & \sigma_2^2 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & \sigma_n^2
\end{pmatrix}.$$
Estimate $\sigma_i^2$ by modeling it as a function of predictors, e.g., a multiplicative exponential model $\sigma_i^2 = \exp(x_i\gamma)$.
Serial Correlation: Assume the errors follow an autoregressive process, e.g., an AR(1) model, $\epsilon_t = \rho\epsilon_{t-1} + u_t$ with $\sigma^2 = \mathrm{Var}(u_t)$ and $\mathrm{Cov}(\epsilon_t, \epsilon_{t-h}) = \rho^h\,\mathrm{Var}(\epsilon_t)$, which gives a variance-covariance matrix whose off-diagonal elements decay geometrically:
$$\Omega = \frac{\sigma^2}{1-\rho^2}\begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{n-1}\\
\rho & 1 & \rho & \cdots & \rho^{n-2}\\
\rho^2 & \rho & 1 & \cdots & \rho^{n-3}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
\rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1
\end{pmatrix}.$$
Cluster Errors: Assume block-diagonal structure for Ω to account for grouped or panel data.
Each assumption simplifies the estimation of Ω and thus w, enabling Feasible Generalized Least Squares with fewer unknown parameters to estimate.
5.2.2 Feasible Generalized Least Squares
5.2.2.1 Heteroskedasticity Errors
Heteroskedasticity occurs when the variance of the error term is not constant across observations. Specifically:
$$\mathrm{Var}(\epsilon_i|x_i) = E(\epsilon_i^2|x_i) \neq \sigma^2,$$
but instead depends on a function of $x_i$:
$$\mathrm{Var}(\epsilon_i|x_i) = h(x_i) = \sigma_i^2.$$
This violates the assumption of homoskedasticity (constant variance) and undermines the efficiency of OLS estimates.
For the model:
$$y_i = x_i\beta + \epsilon_i,$$
we apply a transformation to standardize the variance:
$$\frac{y_i}{\sigma_i} = \frac{x_i}{\sigma_i}\beta + \frac{\epsilon_i}{\sigma_i}.$$
By scaling each observation by $1/\sigma_i$, the variance of the transformed error term becomes:
$$\mathrm{Var}\left(\frac{\epsilon_i}{\sigma_i}\,\middle|\,X\right) = \frac{1}{\sigma_i^2}\mathrm{Var}(\epsilon_i|X) = \frac{\sigma_i^2}{\sigma_i^2} = 1.$$
Thus, the heteroskedasticity is corrected in the transformed model.
In matrix notation, the transformed model is:
$$wy = wX\beta + w\epsilon,$$
where $w$ is the weight matrix used to standardize the variance. The weight matrix $w$ is defined as:
$$w = \begin{pmatrix}
1/\sigma_1 & 0 & 0 & \cdots & 0\\
0 & 1/\sigma_2 & 0 & \cdots & 0\\
0 & 0 & 1/\sigma_3 & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & 0 & \cdots & 1/\sigma_n
\end{pmatrix}.$$
In the presence of heteroskedasticity, the variance of the error term, Var(ϵi|xi), is not constant across observations. This leads to inefficient OLS estimates.
Infeasible Weighted Least Squares (IWLS) assumes that the variances σ2i=Var(ϵi|xi) are known. This allows us to adjust the regression equation to correct for heteroskedasticity.
The model is transformed as follows:
$$y_i = x_i\beta + \epsilon_i \quad \text{(original equation)},$$
where $\epsilon_i$ has variance $\sigma_i^2$. To make the errors homoskedastic, we divide through by $\sigma_i$:
$$\frac{y_i}{\sigma_i} = \frac{x_i}{\sigma_i}\beta + \frac{\epsilon_i}{\sigma_i}.$$
Now, the transformed error term $\epsilon_i/\sigma_i$ has a constant variance of 1:
$$\mathrm{Var}\left(\frac{\epsilon_i}{\sigma_i}\,\middle|\,x_i\right) = 1.$$
The IWLS estimator minimizes the weighted sum of squared residuals for the transformed model:
$$\min_\beta \; \sum_{i=1}^n \left(\frac{y_i - x_i\beta}{\sigma_i}\right)^2.$$
In matrix form, the IWLS estimator is:
$$\hat{\beta}_{IWLS} = (X'WX)^{-1}X'Wy,$$
where $W$ is a diagonal matrix of weights:
$$W = \begin{pmatrix}
1/\sigma_1^2 & 0 & \cdots & 0\\
0 & 1/\sigma_2^2 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 1/\sigma_n^2
\end{pmatrix}.$$
Properties of IWLS
- Valid Standard Errors:
- If $\mathrm{Var}(\epsilon_i|X) = \sigma_i^2$ is correctly specified, the usual standard errors from IWLS are valid.
- Robustness:
- If the variance assumption is incorrect ($\mathrm{Var}(\epsilon_i|X) \neq \sigma_i^2$), heteroskedasticity-robust standard errors must be used instead.
The primary issue with IWLS is that $\sigma_i^2 = \mathrm{Var}(\epsilon_i|x_i)$ is generally unknown. Specifically, we do not know:
$$\sigma_i^2 = \mathrm{Var}(\epsilon_i|x_i) = E(\epsilon_i^2|x_i).$$
The challenges are:
- Single Observation:
- For each observation $i$, there is only one $\epsilon_i$, which is insufficient to estimate the variance $\sigma_i^2$ directly.
- Dependence on Assumptions:
- To estimate $\sigma_i^2$, we must impose assumptions about its relationship to $x_i$.
To make IWLS feasible, we model $\sigma_i^2$ as a function of the predictors $x_i$. A common approach is the multiplicative exponential model:
$$\epsilon_i^2 = v_i\exp(x_i\gamma),$$
where:
$v_i$ is an independent error term with strictly positive values, representing random noise.
$\exp(x_i\gamma)$ is a deterministic function of the predictors $x_i$.
Taking the natural logarithm of both sides linearizes the model:
$$\ln(\epsilon_i^2) = x_i\gamma + \ln(v_i),$$
where $\ln(v_i)$ is independent of $x_i$. This transformation enables us to estimate $\gamma$ using standard OLS techniques.
Estimation Procedure for Feasible GLS (FGLS)
Since we do not observe the true errors ϵi, we approximate them using the OLS residuals ei. Here’s the step-by-step process:
Compute OLS Residuals: First, fit the original model using OLS and calculate the residuals:
$$e_i = y_i - x_i\hat{\beta}_{OLS}.$$
Approximate $\epsilon_i^2$ with $e_i^2$: Use the squared residuals as a proxy for the squared errors:
$$e_i^2 \approx \epsilon_i^2.$$
Log-Linear Model: Fit the log-transformed model to estimate $\gamma$:
$$\ln(e_i^2) = x_i\gamma + \ln(v_i).$$
Estimate $\gamma$ using OLS, where $\ln(v_i)$ is treated as the error term.
Estimate Variances: Use the estimates $\hat{\gamma}$ to compute $\hat{\sigma}_i^2$ for each observation:
$$\hat{\sigma}_i^2 = \exp(x_i\hat{\gamma}).$$
Perform Weighted Least Squares: Use the estimated variances $\hat{\sigma}_i^2$ to construct the weight matrix $\hat{W}$:
$$\hat{W} = \begin{pmatrix}
1/\hat{\sigma}_1^2 & 0 & \cdots & 0\\
0 & 1/\hat{\sigma}_2^2 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 1/\hat{\sigma}_n^2
\end{pmatrix}.$$
Then, compute the Feasible GLS (FGLS) estimator:
$$\hat{\beta}_{FGLS} = (X'\hat{W}X)^{-1}X'\hat{W}y.$$
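The steps above can be sketched end-to-end in NumPy (a minimal illustration on simulated data; the variance parameters $\gamma$ and all names are hypothetical choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
gamma_true = np.array([-1.0, 0.8])            # hypothetical variance parameters
sigma2 = np.exp(X @ gamma_true)               # multiplicative exponential variance
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=np.sqrt(sigma2))

# Step 1: OLS residuals.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_ols

# Steps 2-4: regress ln(e^2) on X, then back out sigma_i^2.
gamma_hat, *_ = np.linalg.lstsq(X, np.log(e**2), rcond=None)
sigma2_hat = np.exp(X @ gamma_hat)

# Step 5: FGLS with W_hat = diag(1 / sigma_i^2).
Xw = X / sigma2_hat[:, None]                  # rows of W_hat @ X
beta_fgls = np.linalg.solve(Xw.T @ X, Xw.T @ y)
```

Note that any constant bias in $\exp(x_i\hat{\gamma})$ rescales all weights equally and so leaves $\hat{\beta}_{FGLS}$ unchanged.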
5.2.2.2 Serial Correlation
Serial correlation (also called autocorrelation) occurs when the error terms in a regression model are correlated across observations. Formally:
$$\mathrm{Cov}(\epsilon_i, \epsilon_j|X) \neq 0 \quad \text{for } i \neq j.$$
This violates the Gauss-Markov assumption that $\mathrm{Cov}(\epsilon_i, \epsilon_j|X) = 0$, leading to inefficient OLS estimates.
5.2.2.2.1 Covariance Stationarity
If the errors are covariance stationary, the covariance between errors depends only on their relative time or positional difference (h), not their absolute position:
$$\mathrm{Cov}(\epsilon_i, \epsilon_j|X) = \mathrm{Cov}(\epsilon_i, \epsilon_{i+h}|x_i, x_{i+h}) = \gamma_h,$$
where $\gamma_h$ represents the covariance at lag $h$.
Under covariance stationarity, the variance-covariance matrix of the error term $\epsilon$ takes the following form:
$$\mathrm{Var}(\epsilon|X) = \Omega = \begin{pmatrix}
\sigma^2 & \gamma_1 & \gamma_2 & \cdots & \gamma_{n-1}\\
\gamma_1 & \sigma^2 & \gamma_1 & \cdots & \gamma_{n-2}\\
\gamma_2 & \gamma_1 & \sigma^2 & \cdots & \gamma_{n-3}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
\gamma_{n-1} & \gamma_{n-2} & \gamma_{n-3} & \cdots & \sigma^2
\end{pmatrix}.$$
Key Points:
- The diagonal elements represent the variance of the error term: $\sigma^2$.
- The off-diagonal elements $\gamma_h$ represent covariances at lag $h$.
Why Is Serial Correlation a Problem?
The matrix $\Omega$ introduces $n$ parameters to estimate ($\sigma^2, \gamma_1, \gamma_2, \ldots, \gamma_{n-1}$). Estimating this many parameters is impractical, especially for large datasets, so we impose additional structure to reduce their number.
5.2.2.2.2 AR(1) Model
In the AR(1) process, the errors follow a first-order autoregressive process:
$$y_t = \beta_0 + x_t\beta_1 + \epsilon_t, \qquad \epsilon_t = \rho\epsilon_{t-1} + u_t,$$
where:
$\rho$ is the first-order autocorrelation coefficient, capturing the relationship between consecutive errors.
$u_t$ is white noise, satisfying $\mathrm{Var}(u_t) = \sigma_u^2$ and $\mathrm{Cov}(u_t, u_{t-h}) = 0$ for $h \neq 0$.
Under the AR(1) assumption, the variance-covariance matrix of the error term $\epsilon$ becomes:
$$\mathrm{Var}(\epsilon|X) = \frac{\sigma_u^2}{1-\rho^2}\begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{n-1}\\
\rho & 1 & \rho & \cdots & \rho^{n-2}\\
\rho^2 & \rho & 1 & \cdots & \rho^{n-3}\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
\rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1
\end{pmatrix}.$$
Key Features:
- The diagonal elements represent the variance: $\mathrm{Var}(\epsilon_t|X) = \sigma_u^2/(1-\rho^2)$.
- The off-diagonal elements decay exponentially with lag $h$: $\mathrm{Cov}(\epsilon_t, \epsilon_{t-h}|X) = \rho^h \cdot \mathrm{Var}(\epsilon_t|X)$.
Under AR(1), only one parameter $\rho$ needs to be estimated (in addition to $\sigma_u^2$), greatly simplifying the structure of $\Omega$.
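Because $\Omega$ depends only on $\rho$ and $\sigma_u^2$, it can be built with a few lines of NumPy (the helper name is illustrative):

```python
import numpy as np

def ar1_cov(n, rho, sigma2_u):
    """Omega for AR(1) errors: entry (t, s) = rho^|t-s| * sigma_u^2 / (1 - rho^2)."""
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return (sigma2_u / (1.0 - rho**2)) * rho**lags

Omega = ar1_cov(5, rho=0.5, sigma2_u=1.0)
```

The matrix is Toeplitz: each descending diagonal is constant, so only the first row needs to be computed in principle.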
OLS Properties Under AR(1)
- Consistency: If assumptions A1, A2, A3a, and A5a hold, OLS remains consistent.
- Asymptotic Normality: OLS estimates are asymptotically normal.
- Inference with Serial Correlation:
- The usual OLS standard errors are invalid under serial correlation.
- Use Newey-West standard errors to obtain robust inference.
5.2.2.2.3 Infeasible Cochrane-Orcutt
The Infeasible Cochrane-Orcutt procedure addresses serial correlation in the error terms by assuming an AR(1) process for the errors:
$$\epsilon_t = \rho\epsilon_{t-1} + u_t,$$
where $u_t$ is white noise and $\rho$ is the autocorrelation coefficient.
By transforming the original regression equation:
$$y_t = \beta_0 + x_t\beta_1 + \epsilon_t,$$
we subtract $\rho$ times the lagged equation:
$$\rho y_{t-1} = \rho(\beta_0 + x_{t-1}\beta_1 + \epsilon_{t-1}),$$
to obtain the weighted (quasi-differenced) equation:
$$y_t - \rho y_{t-1} = (1-\rho)\beta_0 + (x_t - \rho x_{t-1})\beta_1 + u_t.$$
Key Points:
- Dependent Variable: $y_t - \rho y_{t-1}$.
- Independent Variable: $x_t - \rho x_{t-1}$.
- Error Term: $u_t$, which satisfies the Gauss-Markov assumptions (A3, A4, A5).
The ICO estimator minimizes the sum of squared residuals for this transformed equation.
- Standard Errors:
- If the errors truly follow an AR(1) process, the standard errors for the transformed equation are valid.
- For more complex error structures, Newey-West HAC standard errors are required.
- Loss of Observations:
- The transformation involves quasi-differencing, so the first observation ($y_1$) has no lagged counterpart and cannot be used. This reduces the effective sample size by one.
The Problem: ρ Is Unknown
The ICO procedure is infeasible because it requires knowledge of ρ, the autocorrelation coefficient. In practice, we estimate ρ from the data.
To estimate $\rho$, we use the OLS residuals ($e_t$) as a proxy for the errors ($\epsilon_t$). The estimate $\hat{\rho}$ is given by:
$$\hat{\rho} = \frac{\sum_{t=2}^T e_t e_{t-1}}{\sum_{t=2}^T e_{t-1}^2}.$$
Estimation via OLS:
- Regress the OLS residuals $e_t$ on their lagged values $e_{t-1}$, without an intercept: $e_t = \rho e_{t-1} + u_t$.
- The slope of this regression is the estimate $\hat{\rho}$.
This estimation is efficient under the AR(1) assumption and provides a practical approximation for ρ.
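A minimal simulation (hypothetical AR(1) residual series, not from the text) illustrating the no-intercept regression estimate of $\rho$:

```python
import numpy as np

rng = np.random.default_rng(2)
T, rho_true = 2000, 0.6

# Simulate AR(1) residuals: e_t = rho * e_{t-1} + u_t, with u_t white noise.
e = np.zeros(T)
for t in range(1, T):
    e[t] = rho_true * e[t - 1] + rng.normal()

# Slope of the no-intercept regression of e_t on e_{t-1}.
rho_hat = np.sum(e[1:] * e[:-1]) / np.sum(e[:-1] ** 2)
```

With a long enough series, `rho_hat` should land close to the true autocorrelation coefficient.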
5.2.2.2.4 Feasible Prais-Winsten
The Feasible Prais-Winsten (FPW) method addresses AR(1) serial correlation in regression models by transforming the data to eliminate serial dependence in the errors. Unlike the Infeasible Cochrane-Orcutt procedure, which discards the first observation, the Prais-Winsten method retains it using a weighted transformation.
The FPW transformation uses the following weighting matrix $w$:
$$w = \begin{pmatrix}
\sqrt{1-\hat{\rho}^2} & 0 & 0 & \cdots & 0\\
-\hat{\rho} & 1 & 0 & \cdots & 0\\
0 & -\hat{\rho} & 1 & \cdots & 0\\
\vdots & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & -\hat{\rho} & 1
\end{pmatrix},$$
where
- The first row transforms the first observation, using $\sqrt{1-\hat{\rho}^2}$.
- Each subsequent row applies the AR(1) quasi-difference to the remaining observations.
Step-by-Step Procedure
Step 1: Initial OLS Estimation
Estimate the regression model using OLS:
$$y_t = x_t\beta + \epsilon_t,$$
and compute the residuals:
$$e_t = y_t - x_t\hat{\beta}.$$
Step 2: Estimate the AR(1) Correlation Coefficient
Estimate the AR(1) correlation coefficient $\rho$ by regressing $e_t$ on $e_{t-1}$ without an intercept:
$$e_t = \rho e_{t-1} + u_t.$$
The slope of this regression is the estimate $\hat{\rho}$.
Step 3: Transform the Data
Apply the transformation using the weighting matrix $w$ to both the dependent variable $y$ and the independent variables $X$:
$$wy = wX\beta + w\epsilon.$$
Specifically:
1. For $t = 1$, the transformed dependent and independent variables are:
$$\tilde{y}_1 = \sqrt{1-\hat{\rho}^2}\cdot y_1, \qquad \tilde{x}_1 = \sqrt{1-\hat{\rho}^2}\cdot x_1.$$
2. For $t = 2, \ldots, T$, the transformed variables are:
$$\tilde{y}_t = y_t - \hat{\rho}\cdot y_{t-1}, \qquad \tilde{x}_t = x_t - \hat{\rho}\cdot x_{t-1}.$$
Step 4: Feasible Prais-Winsten Estimation
Run OLS on the transformed equation:
$$wy = wX\beta + w\epsilon.$$
The resulting estimator is the Feasible Prais-Winsten (FPW) estimator:
$$\hat{\beta}_{FPW} = (X'w'wX)^{-1}X'w'wy.$$
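The transformation in Steps 3-4 can be sketched as follows (the helper name and tiny dataset are hypothetical; in practice $\hat{\rho}$ would come from Step 2):

```python
import numpy as np

def prais_winsten_transform(y, X, rho):
    """Scale the first observation by sqrt(1 - rho^2); quasi-difference the rest."""
    y_t = np.empty_like(y, dtype=float)
    X_t = np.empty_like(X, dtype=float)
    scale = np.sqrt(1.0 - rho**2)
    y_t[0] = scale * y[0]
    X_t[0] = scale * X[0]
    y_t[1:] = y[1:] - rho * y[:-1]
    X_t[1:] = X[1:] - rho * X[:-1]
    return y_t, X_t

# FPW estimate: OLS on the transformed data.
y = np.array([1.0, 2.0, 3.0, 4.0])
X = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
y_t, X_t = prais_winsten_transform(y, X, rho=0.5)
beta_fpw, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
```

Note that the intercept column of ones becomes $(1-\hat{\rho})$ for $t \geq 2$, matching the $(1-\rho)\beta_0$ term in the quasi-differenced equation.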
Properties of Feasible Prais-Winsten Estimator
- Infeasible Prais-Winsten Estimator: With the true $\rho$ known, the transformed equation satisfies the Gauss-Markov assumptions, and the estimator is unbiased under A1-A3.
- Feasible Prais-Winsten (FPW) Estimator: The FPW estimator replaces the unknown $\rho$ with an estimate $\hat{\rho}$ derived from the OLS residuals, introducing bias in small samples.
- Bias:
- The FPW estimator is biased because $\rho$ must itself be estimated via $\hat{\rho}$, which introduces an additional layer of approximation.
- Consistency:
- The FPW estimator is consistent under the following assumptions:
- A1: The model is linear in parameters.
- A2: The independent variables are linearly independent.
- A5: The data is generated through random sampling.
- Additionally: $E\big((x_t - \rho x_{t-1})'(\epsilon_t - \rho\epsilon_{t-1})\big) = 0$. This condition ensures the transformed error term $\epsilon_t - \rho\epsilon_{t-1}$ is uncorrelated with the transformed regressors $x_t - \rho x_{t-1}$.
- Note: A3a (zero conditional mean of the error term, $E(\epsilon_t|x_t) = 0$) is not sufficient for the above condition; full exogeneity of the independent variables (A3) is required.
- Efficiency
- Asymptotic Efficiency: The FPW estimator is asymptotically more efficient than OLS if the errors are truly generated by an AR(1) process: $\epsilon_t = \rho\epsilon_{t-1} + u_t$ with $\mathrm{Var}(u_t) = \sigma^2$.
- Standard Errors:
- Usual Standard Errors: If the errors are correctly specified as an AR(1) process, the usual standard errors from FPW are valid.
- Robust Standard Errors: If there is concern about a more complex dependence structure (e.g., higher-order autocorrelation or heteroskedasticity), use Newey-West Standard Errors for inference. These are robust to both serial correlation and heteroskedasticity.
5.2.2.3 Cluster Errors
Consider the regression model with clustered errors:
$$y_{gi} = x_{gi}\beta + \epsilon_{gi},$$
where:
$g$ indexes the group (e.g., households, firms, schools).
$i$ indexes the individual within the group.
The covariance structure for the errors $\epsilon_{gi}$ is defined as:
$$\mathrm{Cov}(\epsilon_{gi}, \epsilon_{hj}) \begin{cases}
= 0 & \text{if } g \neq h \text{ (independent across groups)},\\
\neq 0 & \text{for pairs } (i, j) \text{ within group } g.
\end{cases}$$
Within each group, individuals’ errors may be correlated (i.e., intra-group correlation), while errors are independent across groups. This violates A4 (constant variance and no correlation of errors).
Suppose there are three groups of sizes 3, 2, and 1. The variance-covariance matrix $\Omega$ for the errors $\epsilon$ is:
$$\mathrm{Var}(\epsilon|X) = \Omega = \begin{pmatrix}
\sigma^2 & \delta^1_{12} & \delta^1_{13} & 0 & 0 & 0\\
\delta^1_{12} & \sigma^2 & \delta^1_{23} & 0 & 0 & 0\\
\delta^1_{13} & \delta^1_{23} & \sigma^2 & 0 & 0 & 0\\
0 & 0 & 0 & \sigma^2 & \delta^2_{12} & 0\\
0 & 0 & 0 & \delta^2_{12} & \sigma^2 & 0\\
0 & 0 & 0 & 0 & 0 & \sigma^2
\end{pmatrix},$$
where
- $\delta^g_{ij} = \mathrm{Cov}(\epsilon_{gi}, \epsilon_{gj})$ is the covariance between errors for individuals $i$ and $j$ in group $g$.
- $\mathrm{Cov}(\epsilon_{gi}, \epsilon_{hj}) = 0$ for $g \neq h$ (independent groups).
Infeasible Generalized Least Squares (Cluster)
Assume Known Variance-Covariance Matrix: If $\sigma^2$ and $\delta^g_{ij}$ are known, construct $\Omega$ and compute its inverse $\Omega^{-1}$.
Infeasible GLS Estimator: The infeasible generalized least squares (IGLS) estimator is:
$$\hat{\beta}_{IGLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y.$$
Problem:
- We do not know $\sigma^2$ and $\delta^g_{ij}$, making this approach infeasible.
- Even if $\Omega$ is estimated, incorrect assumptions about its structure may lead to invalid inference.
To make the estimation feasible, we assume a group-level random effects specification for the error:
$$y_{gi} = x_{gi}\beta + c_g + u_{gi}, \qquad \mathrm{Var}(c_g|x_i) = \sigma_c^2, \qquad \mathrm{Var}(u_{gi}|x_i) = \sigma_u^2,$$
so that $\epsilon_{gi} = c_g + u_{gi}$, where:
$c_g$ represents the group-level random effect (common shocks within each group, independent across groups).
$u_{gi}$ represents the individual-level error (idiosyncratic shocks, independent across individuals and groups).
Independence Assumptions:
- $c_g$ and $u_{gi}$ are independent of each other.
- Both are mean-independent of $x_i$.
Under this specification, the variance-covariance matrix $\Omega$ becomes block diagonal, where each block corresponds to a group:
$$\mathrm{Var}(\epsilon|X) = \Omega = \begin{pmatrix}
\sigma_c^2+\sigma_u^2 & \sigma_c^2 & \sigma_c^2 & 0 & 0 & 0\\
\sigma_c^2 & \sigma_c^2+\sigma_u^2 & \sigma_c^2 & 0 & 0 & 0\\
\sigma_c^2 & \sigma_c^2 & \sigma_c^2+\sigma_u^2 & 0 & 0 & 0\\
0 & 0 & 0 & \sigma_c^2+\sigma_u^2 & \sigma_c^2 & 0\\
0 & 0 & 0 & \sigma_c^2 & \sigma_c^2+\sigma_u^2 & 0\\
0 & 0 & 0 & 0 & 0 & \sigma_c^2+\sigma_u^2
\end{pmatrix}.$$
When the variance components σ2c and σ2u are unknown, we can use the Feasible Group-Level Random Effects (RE) estimator to simultaneously estimate these variances and the regression coefficients β. This practical approach allows us to account for intra-group correlation in the errors and still obtain consistent and efficient estimates of the parameters.
Step-by-Step Procedure
Step 1: Initial OLS Estimation
Estimate the regression model using OLS:
$$y_{gi} = x_{gi}\beta + \epsilon_{gi},$$
and compute the residuals:
$$e_{gi} = y_{gi} - x_{gi}\hat{\beta}.$$
Step 2: Estimate Variance Components
Use the standard OLS variance estimator $s^2$ to estimate the total variance:
$$s^2 = \frac{1}{n-k}\sum_{i=1}^n e_i^2,$$
where $n$ is the total number of observations and $k$ is the number of regressors (including the intercept).
Estimate the between-group variance $\hat{\sigma}_c^2$ by averaging the within-group cross-products of residuals over all distinct pairs:
$$\hat{\sigma}_c^2 = \frac{1}{G}\sum_{g=1}^G \left(\frac{1}{n_g(n_g-1)/2}\sum_{i=1}^{n_g-1}\sum_{j=i+1}^{n_g} e_{gi}e_{gj}\right),$$
where:
$G$ is the total number of groups,
$n_g$ is the size of group $g$,
the inner double sum runs over the $n_g(n_g-1)/2$ distinct pairs $(i, j)$ within group $g$.
Estimate the within-group variance as:
$$\hat{\sigma}_u^2 = s^2 - \hat{\sigma}_c^2.$$
Step 3: Construct the Variance-Covariance Matrix
Use the estimated variances $\hat{\sigma}_c^2$ and $\hat{\sigma}_u^2$ to construct the variance-covariance matrix $\hat{\Omega}$ for the error term:
$$\hat{\Omega}_{gi,hj} = \begin{cases}
\hat{\sigma}_c^2 + \hat{\sigma}_u^2 & \text{if } g = h,\ i = j \text{ (diagonal elements)},\\
\hat{\sigma}_c^2 & \text{if } g = h,\ i \neq j \text{ (off-diagonal elements within a group)},\\
0 & \text{if } g \neq h \text{ (across groups)}.
\end{cases}$$
Step 4: Feasible GLS Estimation
With $\hat{\Omega}$ in hand, perform Feasible Generalized Least Squares (FGLS) to estimate $\beta$:
$$\hat{\beta}_{RE} = (X'\hat{\Omega}^{-1}X)^{-1}X'\hat{\Omega}^{-1}y.$$
If the assumptions about Ω are incorrect or infeasible, use cluster-robust standard errors to account for intra-group correlation without explicitly modeling the variance-covariance structure. These standard errors remain valid under arbitrary within-cluster dependence, provided clusters are independent.
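Steps 1-4 can be sketched in NumPy on simulated grouped data (the group sizes, variance values, and names below are hypothetical choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
G, n_g = 50, 5                                 # 50 groups of 5 observations each
groups = np.repeat(np.arange(G), n_g)
n = G * n_g
X = np.column_stack([np.ones(n), rng.normal(size=n)])
c = rng.normal(scale=1.0, size=G)[groups]      # group effect, sigma_c^2 = 1
u = rng.normal(scale=0.5, size=n)              # idiosyncratic part, sigma_u^2 = 0.25
y = X @ np.array([1.0, 2.0]) + c + u

# Step 1: OLS residuals.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_ols

# Step 2: variance components from within-group cross-products.
s2 = e @ e / (n - X.shape[1])
cross, pairs = 0.0, 0
for g in range(G):
    eg = e[groups == g]
    cross += (eg.sum() ** 2 - (eg ** 2).sum()) / 2.0   # sum over distinct pairs i < j
    pairs += len(eg) * (len(eg) - 1) // 2
sigma2_c = cross / pairs                       # equals the per-group average here (equal sizes)
sigma2_u = s2 - sigma2_c

# Steps 3-4: block-diagonal Omega_hat, then FGLS.
Omega = sigma2_u * np.eye(n)
for g in range(G):
    idx = np.where(groups == g)[0]
    Omega[np.ix_(idx, idx)] += sigma2_c
Omega_inv = np.linalg.inv(Omega)
beta_re = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
```

With equal group sizes, pooling all pairs is equivalent to the group-by-group average in the formula above; for unequal sizes the per-group average should be used.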
Properties of the Feasible Group-Level Random Effects Estimator
- Infeasible Group RE Estimator
- The infeasible RE estimator (assuming known variances) is unbiased under assumptions A1, A2, and A3 for the unweighted equation.
- A3 requires: $E(\epsilon_{gi}|x_i) = E(c_g|x_i) + E(u_{gi}|x_i) = 0$. This assumes:
- $E(c_g|x_i) = 0$: the random effects assumption (group-level effects are uncorrelated with the regressors).
- $E(u_{gi}|x_i) = 0$: no endogeneity at the individual level.
- Feasible Group RE Estimator
- The feasible RE estimator is biased because the variances $\sigma_c^2$ and $\sigma_u^2$ are estimated, introducing approximation errors.
- However, the estimator is consistent under A1, A2, A3a ($E(x_i'\epsilon_{gi}) = E(x_i'c_g) + E(x_i'u_{gi}) = 0$), and A5a.
- Efficiency
- Asymptotic Efficiency:
- The feasible RE estimator is asymptotically more efficient than OLS if the errors follow the random effects specification.
- Standard Errors:
- If the random effects specification is correct, the usual standard errors are consistent.
- If there is concern about more complex dependence structures or heteroskedasticity, use cluster robust standard errors.
5.2.3 Weighted Least Squares
In the presence of heteroskedasticity, the errors $\epsilon_i$ have non-constant variance $\mathrm{Var}(\epsilon_i|x_i) = \sigma_i^2$. This violates the Gauss-Markov assumption of homoskedasticity, leading to inefficient OLS estimates.
Weighted Least Squares (WLS) addresses this by applying weights inversely proportional to the variance of the errors, ensuring that observations with larger variances have less influence on the estimation.
Weighted Least Squares is essentially Generalized Least Squares in the special case that $\Omega$ is a diagonal matrix with variances $\sigma_i^2$ on the diagonal (i.e., errors are uncorrelated but have non-constant variance). That is, assume the errors are uncorrelated but heteroskedastic:
$$\Omega = \mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2), \qquad \Omega^{-1} = \mathrm{diag}(1/\sigma_1^2, \ldots, 1/\sigma_n^2).$$
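A quick numerical check (hypothetical simulated variances) that GLS with a diagonal $\Omega$ coincides with OLS on the $1/\sigma_i$-scaled data:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n)])
sigma2 = np.exp(rng.normal(size=n))            # hypothetical known variances
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=np.sqrt(sigma2))

# GLS with diagonal Omega ...
Omega_inv = np.diag(1.0 / sigma2)
beta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)

# ... equals OLS on variables scaled by 1/sigma_i.
Xs = X / np.sqrt(sigma2)[:, None]
ys = y / np.sqrt(sigma2)
beta_wls, *_ = np.linalg.lstsq(Xs, ys, rcond=None)

assert np.allclose(beta_gls, beta_wls)
```

This equivalence is why WLS is usually implemented by row-scaling rather than by building the full $n \times n$ matrix $\Omega^{-1}$.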
Steps for Feasible Weighted Least Squares (FWLS)
1. Initial OLS Estimation
First, estimate the model using OLS:
$$y_i = x_i\beta + \epsilon_i,$$
and compute the residuals:
$$e_i = y_i - x_i\hat{\beta}.$$
2. Model the Error Variance
Transform the residuals to model the variance as a function of the predictors:
$$\ln(e_i^2) = x_i\gamma + \ln(v_i),$$
where:
$e_i^2$ approximates $\epsilon_i^2$,
$\ln(v_i)$ is the error term in this auxiliary regression, assumed independent of $x_i$.
Estimate this equation using OLS to obtain the fitted values:
$$\hat{g}_i = x_i\hat{\gamma}.$$
3. Estimate Weights
Use the predicted values from the auxiliary regression to compute the weights:
$$\hat{\sigma}_i = \sqrt{\exp(\hat{g}_i)}.$$
These weights approximate the standard deviation of the errors.
4. Weighted Regression
Transform the original equation by dividing through by ˆσi:
$$\frac{y_i}{\hat{\sigma}_i} = \frac{x_i}{\hat{\sigma}_i}\beta + \frac{\epsilon_i}{\hat{\sigma}_i}.$$
Estimate the transformed equation using OLS. The resulting estimator is the Feasible Weighted Least Squares (FWLS) estimator:
$$\hat{\beta}_{FWLS} = (X'\hat{W}X)^{-1}X'\hat{W}y,$$
where $\hat{W}$ is a diagonal weight matrix with elements $1/\hat{\sigma}_i^2$.
Properties of the FWLS Estimator
- Unbiasedness:
- The infeasible WLS estimator (where $\sigma_i$ is known) is unbiased under assumptions A1-A3 for the unweighted model.
- The FWLS estimator is not unbiased because $\sigma_i$ is approximated by $\hat{\sigma}_i$.
- Consistency:
- The FWLS estimator is consistent under the following assumptions:
- A1 (for the unweighted equation): The model is linear in parameters.
- A2 (for the unweighted equation): The independent variables are linearly independent.
- A5: The data is randomly sampled.
- $E(x_i'\epsilon_i/\sigma_i^2) = 0$. The weaker exogeneity assumption A3a is not sufficient for this condition, but A3 is.
- Efficiency:
- The FWLS estimator is asymptotically more efficient than OLS if the errors have multiplicative exponential heteroskedasticity: $\mathrm{Var}(\epsilon_i|x_i) = \sigma_i^2 = \exp(x_i\gamma)$.
- Usual Standard Errors:
- If the errors are truly multiplicative exponential heteroskedastic, the usual standard errors for FWLS are valid.
- Heteroskedastic Robust Standard Errors:
- If the multiplicative exponential model for $\sigma_i^2$ is potentially mis-specified, heteroskedasticity-robust standard errors should be reported to ensure valid inference.