5.2 Generalized Least Squares

5.2.1 Infeasible Generalized Least Squares

Motivation for a More Efficient Estimator

  • The Gauss-Markov Theorem guarantees that OLS is the Best Linear Unbiased Estimator (BLUE) under assumptions A1-A4:
    • A4: $Var(\epsilon|X) = \sigma^2 I_n$ (homoskedasticity and no autocorrelation).
  • When A4 does not hold:
    • Heteroskedasticity: $Var(\epsilon_i|X) \neq \sigma^2$.
    • Serial Correlation: $Cov(\epsilon_i, \epsilon_j|X) \neq 0$ for $i \neq j$.

Without A4, OLS is unbiased but no longer efficient. This motivates the need for an alternative approach to identify the most efficient estimator.

The unweighted (standard) regression model is given by:

$$y = X\beta + \epsilon$$

Assuming A1-A3 hold (linearity, full rank, exogeneity) but A4 does not, the variance of the error term is no longer proportional to an identity matrix:

$$Var(\epsilon|X) = \Omega \neq \sigma^2 I_n.$$

To address the violation of A4 ($\Omega \neq \sigma^2 I_n$), one can transform the model by premultiplying both sides by a full-rank matrix $w$ to obtain the weighted (transformed) regression model:

$$wy = wX\beta + w\epsilon,$$

where $w$ is chosen such that:

$$w'w = \Omega^{-1}.$$

  • $w$ is obtained from the Cholesky decomposition of $\Omega^{-1}$.
  • The Cholesky decomposition ensures $w'w = \Omega^{-1}$, where $w$ is the “square root” of $\Omega^{-1}$ in matrix terms.

Here, $\Omega = Var(\epsilon|X)$ and $\Omega^{-1} = [Var(\epsilon|X)]^{-1}$. By transforming the original model, the variance of the transformed errors becomes the identity matrix $I_n$ (derived below).

The transformed equation allows us to compute a more efficient estimator.

Using the transformed model, the Infeasible Generalized Least Squares (IGLS) estimator is:

$$\hat{\beta}_{IGLS} = (X'w'wX)^{-1}X'w'wy = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y = \beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\epsilon.$$
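As a concrete illustration, here is a minimal numpy sketch of this estimator, assuming $\Omega$ is known and positive definite (the function name `gls` is mine). It whitens the model with a Cholesky factor of $\Omega$ rather than inverting $\Omega$ directly:

```python
import numpy as np

def gls(X, y, Omega):
    """Sketch of (I)GLS: beta = (X' Omega^{-1} X)^{-1} X' Omega^{-1} y."""
    # Cholesky: L is lower triangular with L L' = Omega, so w = L^{-1}
    # satisfies w'w = Omega^{-1} and whitens the errors.
    L = np.linalg.cholesky(Omega)
    wX = np.linalg.solve(L, X)  # w X, without forming L^{-1} explicitly
    wy = np.linalg.solve(L, y)  # w y
    # OLS on the transformed (whitened) model gives the GLS estimate
    beta, *_ = np.linalg.lstsq(wX, wy, rcond=None)
    return beta
```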

  1. Unbiasedness

Since assumptions A1-A3 hold for the unweighted model:

$$
\begin{aligned}
E(\hat{\beta}_{IGLS}|X) &= E(\beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\epsilon \,|\, X) \\
&= \beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}E(\epsilon|X) \\
&= \beta \qquad \text{since A3: } E(\epsilon|X) = 0.
\end{aligned}
$$

Thus, the IGLS estimator is unbiased.

  2. Variance

The variance of the transformed errors is given by:

$$
\begin{aligned}
Var(w\epsilon|X) &= w\,Var(\epsilon|X)\,w' \\
&= w\Omega w' \\
&= w(w'w)^{-1}w' \qquad \text{since } w \text{ is full-rank,} \\
&= ww^{-1}(w')^{-1}w' \\
&= I_n.
\end{aligned}
$$

Hence, A4 holds for the transformed (weighted) equation, satisfying the Gauss-Markov conditions.

The variance of the IGLS estimator is:

$$
\begin{aligned}
Var(\hat{\beta}_{IGLS}|X) &= Var(\beta + (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\epsilon \,|\, X) \\
&= Var((X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\epsilon \,|\, X) \\
&= (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\,Var(\epsilon|X)\,\Omega^{-1}X(X'\Omega^{-1}X)^{-1} \\
&= (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}\Omega\,\Omega^{-1}X(X'\Omega^{-1}X)^{-1} \qquad \text{since } Var(\epsilon|X) = \Omega, \\
&= (X'\Omega^{-1}X)^{-1}.
\end{aligned}
$$

  3. Efficiency

The difference in variances between OLS and IGLS is:

$$Var(\hat{\beta}_{OLS}|X) - Var(\hat{\beta}_{IGLS}|X) = A\Omega A',$$

where:

$$A = (X'X)^{-1}X' - (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}.$$

Since $\Omega$ is positive definite, $A\Omega A'$ is positive semi-definite. Thus, the IGLS estimator is at least as efficient as OLS under heteroskedasticity or autocorrelation.

In short, properties of $\hat{\beta}_{IGLS}$:

  1. Unbiasedness: $\hat{\beta}_{IGLS}$ remains unbiased as long as A1-A3 hold.
  2. Efficiency: $\hat{\beta}_{IGLS}$ is more efficient than OLS under heteroskedasticity or serial correlation since it accounts for the structure of $\Omega$.

Why Is IGLS “Infeasible”?

The name infeasible arises because it is generally impossible to compute the estimator directly due to the structure of $w$ (or equivalently $\Omega^{-1}$). The matrix $w$ is defined as:

$$w = \begin{pmatrix}
w_{11} & 0 & 0 & \cdots & 0 \\
w_{21} & w_{22} & 0 & \cdots & 0 \\
w_{31} & w_{32} & w_{33} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
w_{n1} & w_{n2} & w_{n3} & \cdots & w_{nn}
\end{pmatrix},$$

with $n(n+1)/2$ unique elements for $n$ observations. This results in more parameters than data points, making direct estimation infeasible.

To make the estimation feasible, assumptions about the structure of Ω are required. Common approaches include:

  • Heteroskedasticity Errors: Allow each observation its own variance, $Var(\epsilon_i|X) = \sigma_i^2$.

    • Assume no correlation between errors, but allow heterogeneous variances: $$\Omega = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix}.$$

    • Estimate $\sigma_i^2$ by modeling it as a function of the predictors, e.g., a multiplicative exponential model $\sigma_i^2 = \exp(x_i\gamma)$.

  • Serial Correlation: Assume serial correlation follows an autoregressive process, e.g., an AR(1) model $\epsilon_t = \rho\epsilon_{t-1} + u_t$ with white-noise innovations $u_t$ ($Var(u_t) = \sigma_u^2$) and $Cov(\epsilon_t, \epsilon_{t-h}) = \rho^h\,Var(\epsilon_t)$, so that the variance-covariance matrix has off-diagonal elements decaying geometrically: $$\Omega = \frac{\sigma_u^2}{1-\rho^2}\begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1 \end{pmatrix}.$$

  • Cluster Errors: Assume block-diagonal structure for Ω to account for grouped or panel data.

Each assumption simplifies the estimation of Ω and thus w, enabling Feasible Generalized Least Squares with fewer unknown parameters to estimate.

5.2.2 Feasible Generalized Least Squares

5.2.2.1 Heteroskedasticity Errors

Heteroskedasticity occurs when the variance of the error term is not constant across observations. Specifically:

$$Var(\epsilon_i|x_i) = E(\epsilon_i^2|x_i) \neq \sigma^2,$$

but instead depends on a function of $x_i$:

$$Var(\epsilon_i|x_i) = h(x_i) = \sigma_i^2.$$

This violates the assumption of homoskedasticity (constant variance), impacting the efficiency of OLS estimates.

For the model:

$$y_i = x_i\beta + \epsilon_i,$$

we apply a transformation to standardize the variance:

$$\frac{y_i}{\sigma_i} = \frac{x_i}{\sigma_i}\beta + \frac{\epsilon_i}{\sigma_i}.$$

By scaling each observation by $1/\sigma_i$, the variance of the transformed error term becomes:

$$Var\left(\frac{\epsilon_i}{\sigma_i}\Big|X\right) = \frac{1}{\sigma_i^2}Var(\epsilon_i|X) = \frac{\sigma_i^2}{\sigma_i^2} = 1.$$

Thus, the heteroskedasticity is corrected in the transformed model.

In matrix notation, the transformed model is:

$$wy = wX\beta + w\epsilon,$$

where $w$ is the weight matrix used to standardize the variance:

$$w = \begin{pmatrix}
1/\sigma_1 & 0 & 0 & \cdots & 0 \\
0 & 1/\sigma_2 & 0 & \cdots & 0 \\
0 & 0 & 1/\sigma_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1/\sigma_n
\end{pmatrix}.$$


In the presence of heteroskedasticity, the variance of the error term, $Var(\epsilon_i|x_i)$, is not constant across observations, which makes OLS inefficient.

Infeasible Weighted Least Squares (IWLS) assumes that the variances $\sigma_i^2 = Var(\epsilon_i|x_i)$ are known. This allows us to adjust the regression equation to correct for heteroskedasticity.

The model is transformed as follows:

$$y_i = x_i\beta + \epsilon_i \quad \text{(original equation)},$$

where $\epsilon_i$ has variance $\sigma_i^2$. To make the errors homoskedastic, we divide through by $\sigma_i$:

$$\frac{y_i}{\sigma_i} = \frac{x_i}{\sigma_i}\beta + \frac{\epsilon_i}{\sigma_i}.$$

Now, the transformed error term $\epsilon_i/\sigma_i$ has a constant variance of 1:

$$Var\left(\frac{\epsilon_i}{\sigma_i}\Big|x_i\right) = 1.$$


The IWLS estimator minimizes the weighted sum of squared residuals for the transformed model:

$$\text{Minimize: } \sum_{i=1}^{n}\left(\frac{y_i - x_i\beta}{\sigma_i}\right)^2.$$

In matrix form, the IWLS estimator is:

$$\hat{\beta}_{IWLS} = (X'WX)^{-1}X'Wy,$$

where W is a diagonal matrix of weights:

$$W = \begin{pmatrix}
1/\sigma_1^2 & 0 & \cdots & 0 \\
0 & 1/\sigma_2^2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1/\sigma_n^2
\end{pmatrix}.$$


Properties of IWLS

  1. Valid Standard Errors:
    • If $Var(\epsilon_i|X) = \sigma_i^2$ is correctly specified, the usual standard errors from IWLS are valid.
  2. Robustness:
    • If the variance assumption is incorrect ($Var(\epsilon_i|X) \neq \sigma_i^2$), heteroskedasticity-robust standard errors must be used instead.

The primary issue with IWLS is that $\sigma_i^2 = Var(\epsilon_i|x_i)$ is generally unknown. Specifically, we do not know:

$$\sigma_i^2 = Var(\epsilon_i|x_i) = E(\epsilon_i^2|x_i).$$

The challenges are:

  1. Single Observation:
    • For each observation $i$, there is only one $\epsilon_i$, which is insufficient to estimate the variance $\sigma_i^2$ directly.
  2. Dependence on Assumptions:
    • To estimate $\sigma_i^2$, we must impose assumptions about its relationship to $x_i$.

To make IWLS feasible, we model $\sigma_i^2$ as a function of the predictors $x_i$. A common approach is:

$$\epsilon_i^2 = v_i\exp(x_i\gamma),$$

where:

  • $v_i$ is an independent, strictly positive error term representing random noise.

  • $\exp(x_i\gamma)$ is a deterministic function of the predictors $x_i$.

Taking the natural logarithm of both sides linearizes the model:

$$\ln(\epsilon_i^2) = x_i\gamma + \ln(v_i),$$

where $\ln(v_i)$ is independent of $x_i$. This transformation enables us to estimate $\gamma$ using standard OLS techniques.


Estimation Procedure for Feasible GLS (FGLS)

Since we do not observe the true errors $\epsilon_i$, we approximate them using the OLS residuals $e_i$. Here’s the step-by-step process (a code sketch follows the steps):

  1. Compute OLS Residuals: First, fit the original model using OLS and calculate the residuals:

    $$e_i = y_i - x_i\hat{\beta}_{OLS}.$$

  2. Approximate $\epsilon_i^2$ with $e_i^2$: Use the squared residuals as a proxy for the squared errors:

    $$e_i^2 \approx \epsilon_i^2.$$

  3. Log-Linear Model: Fit the log-transformed model to estimate $\gamma$:

    $$\ln(e_i^2) = x_i\gamma + \ln(v_i).$$

    Estimate $\gamma$ using OLS, where $\ln(v_i)$ is treated as the error term.

  4. Estimate Variances: Use the fitted values $\hat{\gamma}$ to estimate $\sigma_i^2$ for each observation:

    $$\hat{\sigma}_i^2 = \exp(x_i\hat{\gamma}).$$

  5. Perform Weighted Least Squares: Use the estimated variances $\hat{\sigma}_i^2$ to construct the weight matrix $\hat{W}$:

    $$\hat{W} = \begin{pmatrix}
    1/\hat{\sigma}_1^2 & 0 & \cdots & 0 \\
    0 & 1/\hat{\sigma}_2^2 & \cdots & 0 \\
    \vdots & \vdots & \ddots & \vdots \\
    0 & 0 & \cdots & 1/\hat{\sigma}_n^2
    \end{pmatrix}.$$

    Then, compute the Feasible GLS (FGLS) estimator:

    $$\hat{\beta}_{FGLS} = (X'\hat{W}X)^{-1}X'\hat{W}y.$$
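The five steps translate into a few lines of numpy. This is a sketch under the multiplicative exponential assumption $\sigma_i^2 = \exp(x_i\gamma)$; the function name and the assumption that $X$ already contains an intercept column are mine:

```python
import numpy as np

def fgls_exponential(X, y):
    """FGLS sketch assuming Var(eps_i | x_i) = exp(x_i' gamma)."""
    # Step 1: OLS residuals
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_ols
    # Steps 2-3: regress ln(e^2) on x_i to estimate gamma
    # (assumes no residual is exactly zero)
    gamma, *_ = np.linalg.lstsq(X, np.log(e**2), rcond=None)
    # Step 4: fitted variances sigma_i^2 = exp(x_i' gamma_hat)
    sigma2_hat = np.exp(X @ gamma)
    # Step 5: weighted regression, i.e. divide each row by sigma_i
    w = 1.0 / np.sqrt(sigma2_hat)
    beta_fgls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return beta_fgls
```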

5.2.2.2 Serial Correlation

Serial correlation (also called autocorrelation) occurs when the error terms in a regression model are correlated across observations. Formally:

$$Cov(\epsilon_i, \epsilon_j|X) \neq 0 \quad \text{for } i \neq j.$$

This violates the Gauss-Markov assumption that $Cov(\epsilon_i, \epsilon_j|X) = 0$, leading to inefficient OLS estimates.


5.2.2.2.1 Covariance Stationarity

If the errors are covariance stationary, the covariance between errors depends only on their relative time or positional difference ($h$), not their absolute position:

$$Cov(\epsilon_i, \epsilon_j|X) = Cov(\epsilon_i, \epsilon_{i+h}|x_i, x_{i+h}) = \gamma_h,$$

where $\gamma_h$ represents the covariance at lag $h$.

Under covariance stationarity, the variance-covariance matrix of the error term $\epsilon$ takes the following form:

$$Var(\epsilon|X) = \Omega = \begin{pmatrix}
\sigma^2 & \gamma_1 & \gamma_2 & \cdots & \gamma_{n-1} \\
\gamma_1 & \sigma^2 & \gamma_1 & \cdots & \gamma_{n-2} \\
\gamma_2 & \gamma_1 & \sigma^2 & \cdots & \gamma_{n-3} \\
\vdots & \vdots & \vdots & \ddots & \gamma_1 \\
\gamma_{n-1} & \gamma_{n-2} & \cdots & \gamma_1 & \sigma^2
\end{pmatrix}.$$

Key Points:

  1. The diagonal elements represent the variance of the error term: $\sigma^2$.
  2. The off-diagonal elements $\gamma_h$ represent covariances at different lags $h$.

Why Is Serial Correlation a Problem?

The matrix $\Omega$ introduces $n$ parameters to estimate ($\sigma^2, \gamma_1, \gamma_2, \ldots, \gamma_{n-1}$). Estimating such a large number of parameters becomes impractical, especially for large datasets. To address this, we impose additional structure to reduce the number of parameters.


5.2.2.2.2 AR(1) Model

In the AR(1) process, the errors follow a first-order autoregressive process:

$$y_t = \beta_0 + x_t\beta_1 + \epsilon_t, \qquad \epsilon_t = \rho\epsilon_{t-1} + u_t,$$

where:

  • $\rho$ is the first-order autocorrelation coefficient, capturing the relationship between consecutive errors.

  • $u_t$ is white noise, satisfying $Var(u_t) = \sigma_u^2$ and $Cov(u_t, u_{t-h}) = 0$ for $h \neq 0$.

Under the AR(1) assumption, the variance-covariance matrix of the error term ϵ becomes:

$$Var(\epsilon|X) = \frac{\sigma_u^2}{1-\rho^2}\begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\
\rho & 1 & \rho & \cdots & \rho^{n-2} \\
\rho^2 & \rho & 1 & \cdots & \rho^{n-3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1
\end{pmatrix}.$$

Key Features:

  1. The diagonal elements represent the variance: $Var(\epsilon_t|X) = \sigma_u^2/(1-\rho^2)$.
  2. The off-diagonal elements decay exponentially with lag $h$: $Cov(\epsilon_t, \epsilon_{t-h}|X) = \rho^h\,Var(\epsilon_t|X)$.

Under AR(1), only one parameter, $\rho$, needs to be estimated (in addition to $\sigma_u^2$), greatly simplifying the structure of $\Omega$.
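For illustration, the AR(1) form of $\Omega$ can be built directly from $\rho$ and $\sigma_u^2$; a small sketch (the function name is mine):

```python
import numpy as np
from scipy.linalg import toeplitz

def ar1_omega(rho, sigma2_u, n):
    """Var-cov matrix of AR(1) errors: (sigma2_u / (1 - rho^2)) * rho^|t-s|."""
    # toeplitz([1, rho, rho^2, ...]) builds the symmetric matrix of powers of rho
    return sigma2_u / (1 - rho**2) * toeplitz(rho ** np.arange(n))
```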


OLS Properties Under AR(1)

  1. Consistency: If assumptions A1, A2, A3a, and A5a hold, OLS remains consistent.
  2. Asymptotic Normality: OLS estimates are asymptotically normal.
  3. Inference with Serial Correlation:
    • Standard OLS errors are invalid.
    • Use Newey-West standard errors to obtain robust inference (a minimal example follows).
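In statsmodels, for instance, Newey-West (HAC) standard errors can be requested at fit time. A minimal sketch, assuming `y` and `X` (with an intercept column) are already defined, and treating `maxlags=4` as an arbitrary choice:

```python
import statsmodels.api as sm

# cov_type="HAC" requests Newey-West standard errors;
# maxlags controls how many autocovariances enter the correction.
results = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(results.bse)  # Newey-West (HAC) standard errors
```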

5.2.2.2.3 Infeasible Cochrane-Orcutt

The Infeasible Cochrane-Orcutt procedure addresses serial correlation in the error terms by assuming an AR(1) process for the errors:

$$\epsilon_t = \rho\epsilon_{t-1} + u_t,$$

where $u_t$ is white noise and $\rho$ is the autocorrelation coefficient.

By transforming the original regression equation:

$$y_t = \beta_0 + x_t\beta_1 + \epsilon_t,$$

we subtract ρ times the lagged equation:

$$\rho y_{t-1} = \rho(\beta_0 + x_{t-1}\beta_1 + \epsilon_{t-1}),$$

to obtain the weighted first-difference equation:

$$y_t - \rho y_{t-1} = (1-\rho)\beta_0 + (x_t - \rho x_{t-1})\beta_1 + u_t.$$

Key Points:

  1. Dependent Variable: $y_t - \rho y_{t-1}$.
  2. Independent Variable: $x_t - \rho x_{t-1}$.
  3. Error Term: $u_t$, which satisfies the Gauss-Markov assumptions (A3, A4, A5).

The ICO estimator minimizes the sum of squared residuals for this transformed equation.

  1. Standard Errors:
    • If the errors truly follow an AR(1) process, the standard errors for the transformed equation are valid.
    • For more complex error structures, Newey-West HAC standard errors are required.
  2. Loss of Observations:
    • The transformation involves quasi-differencing with lagged values, so the first observation ($y_1$) cannot be used. This reduces the effective sample size by one.

The Problem: ρ Is Unknown

The ICO procedure is infeasible because it requires knowledge of ρ, the autocorrelation coefficient. In practice, we estimate ρ from the data.

To estimate $\rho$, we use the OLS residuals ($e_t$) as a proxy for the errors ($\epsilon_t$). The estimate $\hat{\rho}$ is given by:

$$\hat{\rho} = \frac{\sum_{t=2}^{T} e_te_{t-1}}{\sum_{t=2}^{T} e_{t-1}^2}.$$

Estimation via OLS:

  1. Regress the OLS residuals $e_t$ on their lagged values $e_{t-1}$, without an intercept: $e_t = \rho e_{t-1} + u_t$.
  2. The slope of this regression is the estimate $\hat{\rho}$.

This estimation is efficient under the AR(1) assumption and provides a practical approximation for $\rho$; a minimal implementation follows.
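In code, this is a one-line computation on the residual vector; a sketch (the function name is mine):

```python
import numpy as np

def estimate_rho(e):
    """Slope of the no-intercept residual regression e_t = rho * e_{t-1} + u_t."""
    return (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])
```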


5.2.2.2.4 Feasible Prais-Winsten

The Feasible Prais-Winsten (FPW) method addresses AR(1) serial correlation in regression models by transforming the data to eliminate serial dependence in the errors. Unlike the Infeasible Cochrane-Orcutt procedure, which discards the first observation, the Prais-Winsten method retains it using a weighted transformation.

The FPW transformation uses the following weighting matrix w:

$$w = \begin{pmatrix}
\sqrt{1-\hat{\rho}^2} & 0 & 0 & \cdots & 0 \\
-\hat{\rho} & 1 & 0 & \cdots & 0 \\
0 & -\hat{\rho} & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -\hat{\rho} & 1
\end{pmatrix},$$

where

  • The first row accounts for the transformation of the first observation, using $\sqrt{1-\hat{\rho}^2}$.
  • Subsequent rows represent the AR(1) transformation for the remaining observations.

Step-by-Step Procedure

Step 1: Initial OLS Estimation

Estimate the regression model using OLS:

$$y_t = x_t\beta + \epsilon_t,$$

and compute the residuals:

$$e_t = y_t - x_t\hat{\beta}.$$


Step 2: Estimate the AR(1) Correlation Coefficient

Estimate the AR(1) correlation coefficient $\rho$ by regressing $e_t$ on $e_{t-1}$ without an intercept:

$$e_t = \rho e_{t-1} + u_t.$$

The slope of this regression is the estimated $\hat{\rho}$.


Step 3: Transform the Data

Apply the transformation using the weighting matrix w to transform both the dependent variable y and the independent variables X:

$$wy = wX\beta + w\epsilon.$$

Specifically:

  1. For $t = 1$, the transformed dependent and independent variables are: $$\tilde{y}_1 = \sqrt{1-\hat{\rho}^2}\,y_1, \qquad \tilde{x}_1 = \sqrt{1-\hat{\rho}^2}\,x_1.$$
  2. For $t = 2, \ldots, T$, the transformed variables are: $$\tilde{y}_t = y_t - \hat{\rho}y_{t-1}, \qquad \tilde{x}_t = x_t - \hat{\rho}x_{t-1}.$$


Step 4: Feasible Prais-Winsten Estimation

Run OLS on the transformed equation:

$$wy = wX\beta + w\epsilon.$$

The resulting estimator is the Feasible Prais-Winsten (FPW) estimator:

$$\hat{\beta}_{FPW} = (X'w'wX)^{-1}X'w'wy.$$
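The four steps translate directly into numpy; a minimal sketch (the function name is mine, and $X$ is assumed to include an intercept column):

```python
import numpy as np

def prais_winsten(X, y):
    """Feasible Prais-Winsten sketch: estimate rho from OLS residuals,
    then run OLS on the transformed data, keeping the first observation."""
    # Step 1: OLS residuals
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_ols
    # Step 2: AR(1) coefficient from the no-intercept residual regression
    rho = (e[1:] @ e[:-1]) / (e[:-1] @ e[:-1])
    # Step 3: transform; first row scaled by sqrt(1 - rho^2), rest quasi-differenced
    c = np.sqrt(1 - rho**2)
    Xt = np.vstack([c * X[:1], X[1:] - rho * X[:-1]])
    yt = np.concatenate([c * y[:1], y[1:] - rho * y[:-1]])
    # Step 4: OLS on the transformed equation
    beta_fpw, *_ = np.linalg.lstsq(Xt, yt, rcond=None)
    return beta_fpw, rho
```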


Properties of Feasible Prais-Winsten Estimator

  1. Infeasible Prais-Winsten Estimator:
    • The infeasible Prais-Winsten (PW) estimator assumes the AR(1) parameter $\rho$ is known.
    • Under assumptions A1, A2, and A3 for the unweighted equation, the infeasible PW estimator is unbiased and efficient.
  2. Feasible Prais-Winsten (FPW) Estimator: The FPW estimator replaces the unknown $\rho$ with an estimate $\hat{\rho}$ derived from the OLS residuals, introducing bias in small samples.
    1. Bias:
      • The FPW estimator is biased due to the estimation of $\hat{\rho}$, which introduces an additional layer of approximation.
    2. Consistency:
      1. The FPW estimator is consistent under the following assumptions:
        • A1: The model is linear in parameters.
        • A2: The independent variables are linearly independent.
        • A5: The data is generated through random sampling.
        • Additionally: $E((x_t - \rho x_{t-1})'(\epsilon_t - \rho\epsilon_{t-1})) = 0$. This condition ensures the transformed error term $\epsilon_t - \rho\epsilon_{t-1}$ is uncorrelated with the transformed regressors $x_t - \rho x_{t-1}$.
      2. Note: A3a (zero conditional mean of the error term, $E(\epsilon_t|x_t) = 0$) is not sufficient for the above condition. Full exogeneity of the independent variables (A3) is required.
    3. Efficiency:
      1. Asymptotic Efficiency: The FPW estimator is asymptotically more efficient than OLS if the errors are truly generated by an AR(1) process: $\epsilon_t = \rho\epsilon_{t-1} + u_t$, $Var(u_t) = \sigma^2$.
      2. Standard Errors:
        1. Usual Standard Errors: If the errors are correctly specified as an AR(1) process, the usual standard errors from FPW are valid.
        2. Robust Standard Errors: If there is concern about a more complex dependence structure (e.g., higher-order autocorrelation or heteroskedasticity), use Newey-West HAC standard errors for inference. These are robust to both serial correlation and heteroskedasticity.

5.2.2.3 Cluster Errors

Consider the regression model with clustered errors:

$$y_{gi} = x_{gi}\beta + \epsilon_{gi},$$

where:

  • $g$ indexes the group (e.g., households, firms, schools).

  • $i$ indexes the individual within the group.

The covariance structure for the errors $\epsilon_{gi}$ is defined as:

$$Cov(\epsilon_{gi}, \epsilon_{hj})\begin{cases} = 0 & \text{if } g \neq h \text{ (independent across groups)}, \\ \neq 0 & \text{for any pair } (i,j) \text{ within group } g. \end{cases}$$

Within each group, individuals’ errors may be correlated (i.e., intra-group correlation), while errors are independent across groups. This violates A4 (constant variance and no correlation of errors).


Suppose there are three groups with varying sizes. The variance-covariance matrix $\Omega$ for the errors $\epsilon$ is:

$$Var(\epsilon|X) = \Omega = \begin{pmatrix}
\sigma^2 & \delta_{12}^1 & \delta_{13}^1 & 0 & 0 & 0 \\
\delta_{12}^1 & \sigma^2 & \delta_{23}^1 & 0 & 0 & 0 \\
\delta_{13}^1 & \delta_{23}^1 & \sigma^2 & 0 & 0 & 0 \\
0 & 0 & 0 & \sigma^2 & \delta_{12}^2 & 0 \\
0 & 0 & 0 & \delta_{12}^2 & \sigma^2 & 0 \\
0 & 0 & 0 & 0 & 0 & \sigma^2
\end{pmatrix},$$

where

  • $\delta_{ij}^g = Cov(\epsilon_{gi}, \epsilon_{gj})$ is the covariance between errors for individuals $i$ and $j$ in group $g$.
  • $Cov(\epsilon_{gi}, \epsilon_{hj}) = 0$ for $g \neq h$ (independent groups).

Infeasible Generalized Least Squares (Cluster)

  1. Assume Known Variance-Covariance Matrix: If $\sigma^2$ and $\delta_{ij}^g$ are known, construct $\Omega$ and compute its inverse $\Omega^{-1}$.

  2. Infeasible GLS Estimator: The infeasible generalized least squares (IGLS) estimator is:

    $$\hat{\beta}_{IGLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y.$$

Problem:

  • We do not know $\sigma^2$ and $\delta_{ij}^g$, making this approach infeasible.
  • Even if $\Omega$ is estimated, incorrect assumptions about its structure may lead to invalid inference.

To make the estimation feasible, we assume a group-level random effects specification for the error:

$$y_{gi} = x_{gi}\beta + c_g + u_{gi}, \qquad Var(c_g|x_i) = \sigma_c^2, \qquad Var(u_{gi}|x_i) = \sigma_u^2,$$

where:

  • $c_g$ represents the group-level random effect (common shocks within each group, independent across groups).

  • $u_{gi}$ represents the individual-level error (idiosyncratic shocks, independent across individuals and groups).

  • $\epsilon_{gi} = c_g + u_{gi}$.

Independence Assumptions:

  • $c_g$ and $u_{gi}$ are independent of each other.
  • Both are mean-independent of $x_i$.

Under this specification, the variance-covariance matrix $\Omega$ becomes block diagonal, where each block corresponds to a group:

$$Var(\epsilon|X) = \Omega = \begin{pmatrix}
\sigma_c^2 + \sigma_u^2 & \sigma_c^2 & \sigma_c^2 & 0 & 0 & 0 \\
\sigma_c^2 & \sigma_c^2 + \sigma_u^2 & \sigma_c^2 & 0 & 0 & 0 \\
\sigma_c^2 & \sigma_c^2 & \sigma_c^2 + \sigma_u^2 & 0 & 0 & 0 \\
0 & 0 & 0 & \sigma_c^2 + \sigma_u^2 & \sigma_c^2 & 0 \\
0 & 0 & 0 & \sigma_c^2 & \sigma_c^2 + \sigma_u^2 & 0 \\
0 & 0 & 0 & 0 & 0 & \sigma_c^2 + \sigma_u^2
\end{pmatrix}.$$

When the variance components $\sigma_c^2$ and $\sigma_u^2$ are unknown, we can use the Feasible Group-Level Random Effects (RE) estimator to estimate these variances and the regression coefficients $\beta$ simultaneously. This practical approach accounts for intra-group correlation in the errors while still yielding consistent and efficient estimates of the parameters.


Step-by-Step Procedure

Step 1: Initial OLS Estimation

Estimate the regression model using OLS:

$$y_{gi} = x_{gi}\beta + \epsilon_{gi},$$

and compute the residuals:

$$e_{gi} = y_{gi} - x_{gi}\hat{\beta}.$$


Step 2: Estimate Variance Components

Use the standard OLS variance estimator $s^2$ to estimate the total variance:

$$s^2 = \frac{1}{n-k}\sum_{i=1}^{n} e_i^2,$$

where $n$ is the total number of observations and $k$ is the number of regressors (including the intercept).

Estimate the between-group variance $\hat{\sigma}_c^2$ using:

$$\hat{\sigma}_c^2 = \frac{1}{G}\sum_{g=1}^{G}\left(\frac{1}{n_g - 1}\sum_{i=1}^{n_g}\sum_{\substack{j=1 \\ j \neq i}}^{n_g} e_{gi}e_{gj}\right),$$

where:

  • $G$ is the total number of groups,

  • $n_g$ is the size of group $g$,

  • The term $\sum_{i \neq j} e_{gi}e_{gj}$ accounts for within-group covariance.

Estimate the within-group variance as:

$$\hat{\sigma}_u^2 = s^2 - \hat{\sigma}_c^2.$$


Step 3: Construct the Variance-Covariance Matrix

Use the estimated variances $\hat{\sigma}_c^2$ and $\hat{\sigma}_u^2$ to construct the variance-covariance matrix $\hat{\Omega}$ for the error term:

$$\hat{\Omega}_{gi,hj} = \begin{cases}
\hat{\sigma}_c^2 + \hat{\sigma}_u^2 & \text{if } g = h \text{ and } i = j \text{ (diagonal elements)}, \\
\hat{\sigma}_c^2 & \text{if } g = h \text{ and } i \neq j \text{ (within group)}, \\
0 & \text{if } g \neq h \text{ (across groups)}.
\end{cases}$$


Step 4: Feasible GLS Estimation

With $\hat{\Omega}$ in hand, perform Feasible Generalized Least Squares (FGLS) to estimate $\beta$ (a code sketch follows):

$$\hat{\beta}_{RE} = (X'\hat{\Omega}^{-1}X)^{-1}X'\hat{\Omega}^{-1}y.$$
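A compact numpy sketch of Steps 1-4. The function name is mine, groups are assumed to have at least two members, and the between-group variance uses a pairwise normalization (averaging $e_{gi}e_{gj}$ over all $i \neq j$ pairs), a common choice that differs slightly from the $1/(n_g-1)$ normalization above:

```python
import numpy as np

def re_fgls(X, y, groups):
    """Feasible group-level random effects sketch.
    groups: array of group labels, one per observation."""
    n, k = X.shape
    # Step 1: OLS residuals
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta_ols
    # Step 2: variance components
    s2 = e @ e / (n - k)  # total variance s^2
    labels = np.unique(groups)
    pair_means = []
    for g in labels:
        eg = e[groups == g]
        ng = len(eg)
        if ng > 1:  # sum over i != j of e_gi * e_gj
            cross = eg.sum() ** 2 - eg @ eg
            pair_means.append(cross / (ng * (ng - 1)))
    sigma2_c = max(np.mean(pair_means), 0.0)  # between-group variance
    sigma2_u = max(s2 - sigma2_c, 0.0)        # within-group variance
    # Step 3: block-diagonal Omega_hat
    Omega = np.zeros((n, n))
    for g in labels:
        idx = np.where(groups == g)[0]
        Omega[np.ix_(idx, idx)] = sigma2_c
    Omega[np.diag_indices(n)] = sigma2_c + sigma2_u
    # Step 4: FGLS estimator (X' Omega^{-1} X)^{-1} X' Omega^{-1} y
    OinvX = np.linalg.solve(Omega, X)
    Oinvy = np.linalg.solve(Omega, y)
    return np.linalg.solve(X.T @ OinvX, X.T @ Oinvy)
```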

If the assumptions about $\Omega$ are incorrect or infeasible, use cluster-robust standard errors to account for intra-group correlation without explicitly modeling the variance-covariance structure. These standard errors remain valid under arbitrary within-cluster dependence, provided clusters are independent; a minimal example follows.
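In statsmodels, for instance, cluster-robust standard errors are a fit-time option; a sketch, assuming `y`, `X`, and a group-label array `groups` are already defined:

```python
import statsmodels.api as sm

# Cluster-robust (sandwich) standard errors: valid under arbitrary
# within-group dependence, provided groups are independent of each other.
results = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": groups})
print(results.bse)
```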


Properties of the Feasible Group-Level Random Effects Estimator

  1. Infeasible Group RE Estimator
  • The infeasible RE estimator (assuming known variances) is unbiased under assumptions A1, A2, and A3 for the unweighted equation.
  • A3 requires: $E(\epsilon_{gi}|x_i) = E(c_g|x_i) + E(u_{gi}|x_i) = 0$. This assumes:
    • $E(c_g|x_i) = 0$: The random effects assumption (group-level effects are uncorrelated with the regressors).
    • $E(u_{gi}|x_i) = 0$: No endogeneity at the individual level.
  2. Feasible Group RE Estimator
  • The feasible RE estimator is biased because the variances $\sigma_c^2$ and $\sigma_u^2$ are estimated, introducing approximation errors.
  • However, the estimator is consistent under A1, A2, A3a ($E(x_i'\epsilon_{gi}) = E(x_i'c_g) + E(x_i'u_{gi}) = 0$), and A5a.
  • Efficiency
    • Asymptotic Efficiency:
      • The feasible RE estimator is asymptotically more efficient than OLS if the errors follow the random effects specification.
    • Standard Errors:
      • If the random effects specification is correct, the usual standard errors are consistent.
      • If there is concern about more complex dependence structures or heteroskedasticity, use cluster-robust standard errors.

5.2.3 Weighted Least Squares

In the presence of heteroskedasticity, the errors $\epsilon_i$ have non-constant variance $Var(\epsilon_i|x_i) = \sigma_i^2$. This violates the Gauss-Markov assumption of homoskedasticity, leading to inefficient OLS estimates.

Weighted Least Squares (WLS) addresses this by applying weights inversely proportional to the variance of the errors, ensuring that observations with larger variances have less influence on the estimation.

  • Weighted Least Squares is essentially Generalized Least Squares in the special case that $\Omega$ is a diagonal matrix with variances $\sigma_i^2$ on the diagonal (i.e., errors are uncorrelated but have non-constant variance).

    • That is, assume the errors are uncorrelated but heteroskedastic: $\Omega = \text{diag}(\sigma_1^2, \ldots, \sigma_n^2)$.

    • Then $\Omega^{-1} = \text{diag}(1/\sigma_1^2, \ldots, 1/\sigma_n^2)$.

Steps for Feasible Weighted Least Squares (FWLS)

1. Initial OLS Estimation

First, estimate the model using OLS:

$$y_i = x_i\beta + \epsilon_i,$$

and compute the residuals:

$$e_i = y_i - x_i\hat{\beta}.$$

2. Model the Error Variance

Transform the residuals to model the variance as a function of the predictors:

$$\ln(e_i^2) = x_i\gamma + \ln(v_i),$$

where:

  • $e_i^2$ approximates $\epsilon_i^2$,

  • $\ln(v_i)$ is the error term in this auxiliary regression, assumed independent of $x_i$.

Estimate this equation using OLS to obtain the predicted values:

$$\hat{g}_i = x_i\hat{\gamma}.$$

3. Estimate Weights

Use the predicted values from the auxiliary regression to compute the weights:

$$\hat{\sigma}_i = \sqrt{\exp(\hat{g}_i)}.$$

These weights approximate the standard deviation of the errors.

4. Weighted Regression

Transform the original equation by dividing through by $\hat{\sigma}_i$:

$$\frac{y_i}{\hat{\sigma}_i} = \frac{x_i}{\hat{\sigma}_i}\beta + \frac{\epsilon_i}{\hat{\sigma}_i}.$$

Estimate the transformed equation using OLS. The resulting estimator is the Feasible Weighted Least Squares (FWLS) estimator:

$$\hat{\beta}_{FWLS} = (X'\hat{W}X)^{-1}X'\hat{W}y,$$

where $\hat{W}$ is a diagonal weight matrix with elements $1/\hat{\sigma}_i^2$.
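The same four steps can be run with statsmodels, whose `WLS` takes weights proportional to the inverse error variance. A sketch, assuming `y` and `X` (with an intercept column) are already defined:

```python
import numpy as np
import statsmodels.api as sm

# Step 1: OLS residuals
ols_res = sm.OLS(y, X).fit()
# Steps 2-3: auxiliary regression of ln(e^2) on X, then fitted variances
aux = sm.OLS(np.log(ols_res.resid**2), X).fit()
sigma2_hat = np.exp(aux.fittedvalues)
# Step 4: WLS with weights 1 / sigma_hat_i^2
fwls_res = sm.WLS(y, X, weights=1.0 / sigma2_hat).fit()
print(fwls_res.params)
```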


Properties of the FWLS Estimator

  1. Unbiasedness:
    • The infeasible WLS estimator (where $\sigma_i$ is known) is unbiased under assumptions A1-A3 for the unweighted model.
    • The FWLS estimator is not unbiased due to the approximation of $\sigma_i$ by $\hat{\sigma}_i$.
  2. Consistency:
    • The FWLS estimator is consistent under the following assumptions:
      • A1 (for the unweighted equation): The model is linear in parameters.
      • A2 (for the unweighted equation): The independent variables are linearly independent.
      • A5: The data is randomly sampled.
      • $E(x_i'\epsilon_i/\sigma_i^2) = 0$. The weaker exogeneity assumption A3a is not sufficient for this condition, but A3 is.
  3. Efficiency:
    • The FWLS estimator is asymptotically more efficient than OLS if the errors have multiplicative exponential heteroskedasticity: $Var(\epsilon_i|x_i) = \sigma_i^2 = \exp(x_i\gamma)$.

For inference after FWLS:

  1. Usual Standard Errors:
    • If the errors are truly multiplicative exponential heteroskedastic, the usual standard errors for FWLS are valid.
  2. Heteroskedasticity-Robust Standard Errors:
    • If there is potential mis-specification of the multiplicative exponential model for $\sigma_i^2$, heteroskedasticity-robust standard errors should be reported to ensure valid inference.