Chapter 10 Mixed Models for Spatial Effects

When the standard regression model has more than one error term, the model includes both fixed effects and random effects, which is called mixed models. The general form is

$\begin{equation} \mathbf{y}=\mathbf{X}\boldsymbol{\beta}+\mathbf{Z}\boldsymbol{\delta}+\boldsymbol{\varepsilon} \tag{10.1} \end{equation}$

where $\boldsymbol{\delta}$ represents the random effects by assuming $\boldsymbol{\delta}\sim N(\mathbf{0}, \sigma_\delta^2\mathbf{I})$ and $Cov(\boldsymbol{\delta},\boldsymbol{\varepsilon})=\mathbf{0}$ . $\sigma_\delta^2$ is the extra sources of variability in addition to $\sigma^2$ .

Random effects usually are related to categorical variables. $\mathbf{Z}$ is a $n\times q$ indicator matrix. $q$ refers to the levels of factors.

10.1 Fixed and Random Effects

The dichotomy of fixed effects and random effects is not decided by the factors themselves. In research design, the study is interest in some factors and try to estimate corresponding coefficients. These factors are assigned as having fixed effects on the response. When these factors are categorical variables. The levels of the factor are chosen to test the differences among these specific levels. The chosen levels should exhaust the population and the fixed effects across cases are constant. The typical cases include gender, race, and lifecycle.

The factors assigned with random effects usually are not the study interest but need to control. People have already known theses factor have significant impacts on response. These factors contribute a part of variance in the model. By the principle of ANOVA, adding random-effect terms into the model can improve the accuracy of model. However, the study can not or has no intention to estimate their effect sizes.

Sometimes it is because of they have too many levels which can not be exhausted. Or these levels represent comprehensive unknown mechanisms which have no explanatory value. For example, in experimental design, a few operators or machines are selected at random from numerous operators and machines but people have no interest in their influences. The experiment will assign operators and machines with random effects.

Sometimes it is just because the research want to get a generalized result. In travel-urban form studies, the number of urban areas is finite. If the research purpose is to compare the travel behavior among the five cities in U.S. WestCoast, the effect of cities should be treated as fixed. If the research want to evaluate the impact of urban density across cities, the appropriate setting is to assign the factor of cities with random effects. Due to the cities in data are not random selected, the estimates could be biased. Although the result has limitation but is still looked as a research contribution.

10.2 Crossed and Nested Effects

Crossed effects versus nested effects is another dichotomy in regression models. They are often related to mixed models. Crossed effects means that every levels of factor $A$ co-occurs with every levels of factor $B$ . There could be interaction effect between $A$ and $B$ . There is at least one observation in any specific combination of categories. A level of factor $A$ applied on the cases will refer to the same treatment. For example, a household with/without child and with/without vehicle have crossed effects. all households can be assigned to the four categories. All households in one category have the same characteristics on parenthood and vehicle ownership.

Nested arrangement is also called hierarchical design. The levels of factor $A$ nested under the levels of factor $B$ . That means some levels of $A$ only occurs with one level of factor $B$ . In other words, all levels of $A$ nested under of $B$ are unique. There would not be interaction effect between $A$ and $B$ . Some combinations of categories are not represented. The spatial related factors are often nested arrangement. For example, the built environment of neighborhood $A$ has some impacts on household living inside this area. Neighborhood $A$ belongs to city $B$ , which also has some impacts on all neighborhoods inside this city. The neighborhood $A$ is nested under city $B$ and $A$ has no connection with any households living in other cities.

It should be careful that the nested effects are not obvious in some cases. For example, population density should be a crossed factor because a density value (e.g. 1000 persons per square miles) is exactly the same in any cities. However, it is possible that a city such as New York only has high-density neighborhoods (e.g all are greater than 1000 persons/sq.mi.), while another city only has low-density neighborhoods (e.g all are less than 1000 persons/sq.mi.). In this case, the effects of density with respect to city are not crossed.

10.3 Unbalanced Subgroup

In mixed models, ordinary least squares method ignore the impact of the random effects. When the grouped data is balanced, the generalized least square method is equivalent to ordinary method. The sum of square error is

$SSE=\mathbf{y'[Z(Z'Z)^{-1}Z'-X(X'X)^{-1}X']y}$

It is common that the sample sizes in every groups are unequal. Restricted Maximum Likelihood (REML) is an iterative approach which can deal with the variability among the groups. Each iteration of REML involves two steps. The first step ignores the random effect. Only fixed effects $\boldsymbol{\hat\beta}^{(t)}$ are estimated using ordinary least squares method. Then, the results can construct a set of estimated residuals $\mathbf{\hat e}^{(t)}$ .

In the second step, using maximum likelihood method to obtain the estimated variance components $(\hat\sigma^2_\delta)^{(t)}$ and the updated variance of $y$ .

Plugging the new variance back to step one can get the updated fixed effects and residuals. This procedure continues until the results being convergence.

Recall that maximum likelihood estimates assumes the observations follow a normal distribution. Therefore, the mixed model with REML method has stricter requirement of the data or proper transformations.