Chapter 9 Linear Mixed Models
Sources of variability, i.e. noise, are known in the statistical literature as “random effects”. Specifying these sources determines the correlation structure in our measurements. In the simplest linear models of Chapter 7, we thought of the variability as originating from measurement error, thus independent of anything else. No-correlation, and fixed variability is known as sphericity. Sphericity is of great mathematical convenience, but quite often, unrealistic.
The effects we want to infer on are assumingly non-random, and known “fixed-effects”. Sources of variability in our measurements, known as “random-effects” are usually not the object of interest. A model which has both random-effects, and fixed-effects, is known as a “mixed effects” model. If the model is also linear, it is known as a linear mixed model (LMM). Here are some examples where LMMs arise.
Alternatively, we could consider between-machine variability as another source of uncertainty when inferring on the temporal fixed effect. In which case, would not subtract the machine-effect, bur rather, treat it as a random-effect, in the LMM framework.
The unifying theme of the above examples, is that the variability in our data has several sources. Which are the sources of variability that need to concern us? This is a delicate matter which depends on your goals. As a rule of thumb, we will suggest the following view: If information of an effect will be available at the time of prediction, treat it as a fixed effect. If it is not, treat it as a random-effect.
LMMs are so fundamental, that they have earned many names:
Mixed Effects: Because we may have both fixed effects we want to estimate and remove, and random effects which contribute to the variability to infer against.
Variance Components: Because as the examples show, variance has more than a single source (like in the Linear Models of Chapter 7).
Hierarchical Models: Because as Example 9.4 demonstrates, we can think of the sampling as hierarchical– first sample a subject, and then sample its response.
Multilevel Analysis: For the same reasons it is also known as Hierarchical Models.
Repeated Measures: Because we make several measurements from each unit, like in Example 9.4.
Longitudinal Data: Because we follow units over time, like in Example 9.4.
Panel Data: Is the term typically used in econometric for such longitudinal data.
Whether we are aiming to infer on a generative model’s parameters, or to make predictions, there is no “right” nor “wrong” approach. Instead, there is always some implied measure of error, and an algorithm may be good, or bad, with respect to this measure (think of false and true positives, for instance). This is why we care about dependencies in the data: ignoring the dependence structure will probably yield inefficient algorithms. Put differently, if we ignore the statistical dependence in the data we will probably me making more errors than possible/optimal.
We now emphasize:
Like in previous chapters, by “model” we refer to the assumed generative distribution, i.e., the sampling distribution.
In a LMM we specify the dependence structure via the hierarchy in the sampling scheme E.g. caps within machine, students within class, etc. Not all dependency models can be specified in this way! Dependency structures that are not hierarchical include temporal dependencies (AR, ARIMA, ARCH and GARCH), spatial, Markov Chains, and more. To specify dependency structures that are no hierarchical, see Chapter 8 in (the excellent) Weiss (2005).
If you are using LMMs for predictions, and not for inference on the fixed effects or variance components, then see the Supervised Learning Chapter 10. Also recall that machine learning from non-independent observations (such as LMMs) is a delicate matter.
9.1 Problem Setup
We denote an outcome with \(y\) and assume its sampling distribution is given by (the generative process) \[\begin{align} y|x,u = x'\beta + z'u + \varepsilon \tag{9.1} \end{align}\] where \(x\) are the factors with (fixed) effects we want to study, and\(\beta\) denotes these effects. The factors \(z\), with effects \(u\), merely contribute to variability in \(y|x\).
In matrix notation:
\[ y = X \beta + Zu + \epsilon,\] where:
here \(y\) is a vector of observations (not scalar as above), with mean \(E(y) = X \beta\); \(\beta\) is unknown vector of fixed effects; \(u\) is an unknown vector of random effects, with mean \(E(u)=0\), and covariance matrix \(var(u) = G\); \(\epsilon\) is an unknown vector (again, not scalar) of random errors, with mean \(E(\epsilon)=0\) and variance \(var(\epsilon)=R\) (usually \(R = \sigma^2 I_{N \times N}\)); \(X\) and \(Z\) are known design matrices. See this example for better understanding the matrix notation.
In LMM we typically assume: \(u \sim \mathcal{N}(0,G);\ \epsilon \sim \mathcal(0,R); \ Cov(u,\epsilon)=0\). Note: the random effects are assumed to be sampled from a multivariate Gaussian distribution \(\mathcal{N}(0,G)\). The fixed and random effects \(\beta\) and \(u\) are given by maximizing the joint density \(f(y,u)\).
In our repeated measures example (9.2) the treatment is a fixed effect, and the subject is a random effect. In our bottle-caps example (9.3) the time (before vs. after) is a fixed effect, and the machines may be either a fixed or a random effect (depending on the purpose of inference). In our diet example (9.4) the diet is the fixed effect and the subject is a random effect.
Notice that we state \(y|x,z\) merely as a convenient way to do inference on \(y|x\). We could, instead, specify \(Var[y|x]\) directly. The second approach seems less convenient. This is the power of LMMs! We specify the covariance not via the matrix \(Var[z'u|x]\), or \(Var[y|x]\), but rather via the sampling hierarchy.
Given a sample of \(n\) observations \((y_i,x_i,z_i)\) from model (9.1), we will want to estimate \((\beta,u)\). Under the assumption on the distribution of \(\varepsilon\) and \(z\) mentioned above, we can use maximum likelihood (ML). In the context of LMMs, however, ML is typically replaced with restricted maximum likelihood (ReML), because it returns unbiased estimates of \(Var[y|x]\) and ML does not.
9.1.1 Non-Linear Mixed Models
The idea of random-effects can also be extended to non-linear mean models. Formally, this means that \(y|x,z=f(x,z,\varepsilon)\) for some non-linear \(f\). This is known as non-linear-mixed-models, which will not be discussed in this text.
9.1.2 Generalized Linear Mixed Models (GLMM)
You can marry the ideas of random effects, with non-linear link functions, and non-Gaussian distribution of the response. These are known as Generalized Linear Mixed Models (GLMM), which will not be discussed in this text.
9.2 LMMs in R
We will fit LMMs with the lme4::lmer
function.
The lme4 is an excellent package, written by the mixed-models Guru Douglas Bates.
We start with a small simulation demonstrating the importance of acknowledging your sources of variability.
Our demonstration consists of fitting a linear model that assumes independence, when data is clearly dependent.
n.groups <- 4 # number of groups
n.repeats <- 2 # samples per group
groups <- rep(1:n.groups, each=n.repeats) %>% as.factor
n <- length(groups)
z0 <- rnorm(n.groups, 0, 10)
(z <- z0[as.numeric(groups)]) # generate and inspect random group effects
## [1] 8.242268 8.242268 -13.706530 -13.706530 10.091668 10.091668
## [7] -2.141764 -2.141764
epsilon <- rnorm(n,0,1) # generate measurement error
beta0 <- 2 # this is the actual parameter of interest! The global mean.
y <- beta0 + z + epsilon # sample from an LMM
We can now fit the linear and LMM.
# fit a linear model assuming independence
lm.5 <- lm(y~1)
# fit a mixed-model that deals with the group dependence
library(lme4)
lme.5 <- lmer(y~1|groups)
The summary of the linear model
##
## Call:
## lm(formula = y ~ 1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.949 -7.275 1.629 8.668 10.005
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.317 3.500 0.948 0.375
##
## Residual standard error: 9.898 on 7 degrees of freedom
The summary of the LMM
## Linear mixed model fit by REML ['lmerMod']
## Formula: y ~ 1 | groups
##
## REML criterion at convergence: 41
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.15395 -0.50048 0.04306 0.55891 0.99797
##
## Random effects:
## Groups Name Variance Std.Dev.
## groups (Intercept) 111.962 10.581
## Residual 2.012 1.418
## Number of obs: 8, groups: groups, 4
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 3.317 5.314 0.624
Look at the standard error of the global mean, i.e., the intercept:
for lm
it is 3.5, and for lme
it is 5.314.
Why this difference?
Because lm
treats the group effect as fixed, while the mixed model treats the group effect as a source of noise/uncertainty.
Inference using lm
underestimates our uncertainty in the estimated population mean (\(\beta_0\)).
This is that false-sense of security we may have when ignoring correlations.
9.2.0.1 Relation to Paired t-test
Recall the paired t-test.
Our two-sample–per-group example of the LMM is awfully similar to a paired t-test.
It would be quite troubling if the well-known t-test and the oh-so-powerful LMM would lead to diverging conclusions.
In the previous, we inferred on the global mean; a quantity that cancels out when pairing. For a fair comparison, let’s infer on some temporal effect.
Compare the t-statistic below, to the t value
in the summary of lme.6
.
Luckily, as we demonstrate, the paired t-test and the LMM are equivalent.
So if you follow authors like (???) that recommend LMMs instead of pairing, remember, these things are sometimes equivalent.
time.fixed.effect <- rep(c('Before','After'), times=4) %>% factor
head(cbind(y,groups,time.fixed.effect))
## y groups time.fixed.effect
## [1,] 10.76759 1 2
## [2,] 10.36614 1 1
## [3,] -11.57154 2 2
## [4,] -10.24338 2 1
## [5,] 13.18838 3 2
## [6,] 13.04653 3 1
## Estimate Std. Error t value
## (Intercept) 3.31805208 5.4773030 0.6057821
## time.fixed.effectBefore -0.04630831 0.4374409 -0.1058619
## t
## 0.1058618
9.2.1 A Single Random Effect
We will use the Dyestuff
data from the lme4 package, which encodes the yield, in grams, of a coloring solution (dyestuff
), produced in 6 batches using 5 different preparations.
## Batch Yield
## 1 A 1545
## 2 A 1440
## 3 A 1440
## 4 A 1520
## 5 A 1580
## 6 B 1540
And visually
The plot confirms that Yield
varies between Batch
s.
We do not want to study this batch effect, but we want our inference to apply to new, unseen, batches15.
We thus need to account for the two sources of variability when inferring on the (global) mean: the within-batch variability, and the between-batch variability
We thus fit a mixed model, with an intercept and random batch effect.
## Linear mixed model fit by REML ['lmerMod']
## Formula: Yield ~ 1 + (1 | Batch)
## Data: Dyestuff
##
## REML criterion at convergence: 319.7
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -1.4117 -0.7634 0.1418 0.7792 1.8296
##
## Random effects:
## Groups Name Variance Std.Dev.
## Batch (Intercept) 1764 42.00
## Residual 2451 49.51
## Number of obs: 30, groups: Batch, 6
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) 1527.50 19.38 78.8
Things to note:
- The syntax
Yield ~ (1|Batch)
tellslme4::lmer
to fit a model with a global intercept (1
) and a random Batch effect (1|Batch
). The|
operator is the cornerstone of random effect modeling withlme4::lmer
. 1+
isn’t really needed.lme4::lmer
, likestats::lm
adds it be default. We put it there to remind you it is implied.- As usual,
summary
is content aware and has a different behavior forlme
class objects. - The output distinguishes between random effects (\(u\)), a source of variability, and fixed effect (\(\beta\)), which we want to study. The mean of the random effect is not reported because it is unassumingly 0.
- Were we not interested in standard errors,
lm(Yield ~ Batch)
would have returned the same (fixed) effects estimates.
Some utility functions let us query the lme
object.
The function coef
will work, but will return a cumbersome output. Better use fixef
to extract the fixed effects, and ranef
to extract the random effects.
The model matrix (of the fixed effects alone), can be extracted with model.matrix
, and predictions with predict
.
9.2.2 A Full Mixed-Model
In the sleepstudy
data, we recorded the reaction times to a series of tests (Reaction
), after various subject (Subject
) underwent various amounts of sleep deprivation (Day
).
We now want to estimate the (fixed) effect of the days of sleep deprivation on response time, while allowing each subject to have his/hers own effect.
Put differently, we want to estimate a random slope for the effect of day
.
The fixed Days
effect can be thought of as the average slope over subjects.
Things to note:
~Days
specifies the fixed effect.- We used the
(Days|Subject)
syntax to telllme4::lmer
we want to fit the model~Days
within each subject. Just like when modeling withstats::lm
,(Days|Subject)
is interpreted as(1+Days|Subject)
, so we get a random intercept and slope, per subject. - Were we not interested in standard errors,
stats::lm(Reaction~Days*Subject)
would have returned (almost) the same effects. Why “almost”? See below…
The fixed day effect is:
## (Intercept) Days
## 251.40510 10.46729
The variability in the average response (intercept) and day effect is
## $Subject
## (Intercept) Days
## 308 2.258565 9.1989719
## 309 -40.398577 -8.6197032
## 310 -38.960246 -5.4488799
## 330 23.690498 -4.8143313
## 331 22.260203 -3.0698946
## 332 9.039526 -0.2721707
Did we really need the whole lme
machinery to fit a within-subject linear regression and then average over subjects?
The short answer is that if we have a enough data for fitting each subject with it’s own lm
, we don’t need lme
.
The longer answer is that the assumptions on the distribution of random effect, namely, that they are normally distributed, allow us to pool information from one subject to another.
In the words of John Tukey: “we borrow strength over subjects”.
If the normality assumption is true, this is very good news.
If, on the other hand, you have a lot of samples per subject, and you don’t need to “borrow strength” from one subject to another, you can simply fit within-subject linear models without the mixed-models machinery.
This will avoid any assumptions on the distribution of effects over subjects.
For a full discussion of the pro’s and con’s of hierarchical mixed models, consult our Bibliographic Notes.
To demonstrate the “strength borrowing”, here is a comparison of the lme, versus the effects of fitting a linear model to each subject separately.
Here is a comparison of the random-day effect from lme
versus a subject-wise linear model. They are not the same.
After fitting a LMM, we can do some model diagnostics, to verify if the assumptions hold: plot residuals vs. fitted values, and residuals vs. other features: residuals should have no trend, and equal variance. plot residuals vs. fitted values with each random effect group: residuals should be normal.
9.3 Another LMM example
In this example we analyze air-temperature prediction with LMM. Note that LMM may not be the best approach to capture continuous spatial effects (see this paper for instance), yet, it is very convenient, computationally cheep, and with proper modeling, in many cases it can be quite accurate in relation to heavy machine learning artillery.
We load a dataset that contains the minimal temperature (tmin
), some remotely sensed measures (as aqua_night_lst
), and other spatial/ spatio-temporal data, for several coordinates (locations of monitoring stations) in France, and for 50 days:
## date stn_id lat lon stn_type grd_1km_id climate_type
## 1: 1 2031001 49.83389 4.200556 4 49.837500-4.205703 2
## 2: 1 2153001 49.55333 3.498333 4 49.554167-3.500435 3
## 3: 1 2278001 49.86667 4.066667 4 49.862500-4.065677 2
## 4: 1 2298001 49.98778 3.646944 4 49.995833-3.649153 3
## 5: 1 2301002 49.52722 3.461944 4 49.529167-3.460127 3
## 6: 1 2321001 49.83639 3.879444 4 49.829167-3.882015 2
## tmin t_ordered t_orderedf reg stn_elev aqua_night_lst aqua_emis
## 1: 2.5 1 1 22 -0.4056336 -0.34267324 -0.05906734
## 2: 1.0 1 1 22 -0.6055388 0.09128599 0.24306404
## 3: 6.0 1 1 22 -0.3929814 -0.29138715 0.24306404
## 4: 0.0 1 1 22 -0.5523994 -0.20854039 -0.05906734
## 5: 0.6 1 1 22 -0.5979474 0.02816465 -0.05906734
## 6: 0.0 1 1 22 -0.3955119 -0.34267324 -0.05906734
## aqua_ndvi elev pop clc_artificial clc_water clc_bare
## 1: 0.49445289 -0.3932409 0.02314416 0.61350869 -0.2176221 -0.1343987
## 2: 0.47081709 -0.6069353 -0.13374497 0.07692565 -0.2176221 -0.1343987
## 3: 1.28726504 -0.3957257 -0.22623930 -0.73660348 -0.2176221 -0.1343987
## 4: 0.05820361 -0.5000881 -0.11149949 -0.15674761 -0.2176221 -0.1343987
## 5: 0.38978008 -0.6094201 -0.13725741 0.33656261 -0.2176221 -0.1343987
## 6: 0.69974667 -0.4926336 -0.23502041 0.02499826 -0.2176221 -0.1343987
## clc_vegetation
## 1: -0.54251044
## 2: -0.01846593
## 3: 0.77605315
## 4: 0.20974700
## 5: -0.27203585
## 6: 0.03224805
Let’s check the average tmin
per location over the time period:
library(data.table)
library(ggplot2)
db_mean <- db[, mean(tmin), by = .(lat,lon)]
ggplot(data=db_mean, aes(x=lon,y=lat,color = V1)) + geom_point() +
scale_color_gradient2(high = "red", low = "blue", mid = "white")
In such data, one may expect several sources of variation. mostly: space, time, and space-time.
We can model spatial random effects by reg
ions:
coords <- unique(db[,.(stn_id, lon, lat, reg)])
ggplot(data=coords, aes(x=lon,y=lat,color = as.factor(reg))) + geom_point()
And temporal random effects by date
:
(note the trend and seasonality in the data)
In these problems the effects might not be stationary over space and time.
In this prediction problem aqua_night_lst
is the main feature. and it is known that it’s effect change over time and space.
See for example its correlation with tmin
over time or regions:
library(magrittr)
sapply(split(db, db$date), function(d) cor(d$tmin,d$aqua_night_lst)) %>% density() %>% plot()
We now split for training and test data according to stations (note that we split randomly over stations, not space-based):
stn_intrain <- sample(unique(db$stn_id), size = length(unique(db$stn_id))*0.6)
train <- db[stn_id %in% stn_intrain,]
test <- db[!stn_id %in% stn_intrain,]
plot(unique(train[,.(lon,lat)]))
points(unique(test[,.(lon,lat)]), col = "red")
legend("topleft", pch=1, col = 1:2, legend = c("Train","Test"))
Let’s start by fitting a LMM with random intercept for time and evaluate it on the test set. Formaly:
\[ tmin_{s,t} = \beta_0 + \sum_{k=1}^p \beta_k x^k_{s,t} + (u_t) + \beta_{LST} LST_{s,t} + \epsilon_{s,t}\] where: \(s\) indicate space, \(t\) indicate time, \(\beta_k\) is the k’th fixed-effect and \(x^k_{s,t}\) is the corresponding fiture, \(u_t\) is temporal random effect, \(\beta_{LST}\) is the effect of \(LST_{s,t}\), and \(\epsilon_{s,t}\) is an independent and normally distributed error term.
tmin_lmm1 <- lmer(tmin ~
aqua_emis+aqua_ndvi+elev+pop+clc_artificial+clc_water+clc_bare+clc_vegetation+
aqua_night_lst+ reg+
(1|date), data = train)
tmin_lmm1_prd <- predict(tmin_lmm1, newdata = test)
mean((test$tmin - tmin_lmm1_prd)^2)
## [1] 5.135398
Now let’s add a random intercept for region: \[ tmin_{s,t} = \beta_0 + \sum_{k=1}^p \beta_k x^k_{s,t} + (u_t + g_r) + \beta_{LST} LST_{s,t} + \epsilon_{s,t}\] \(g_r\) is region-wise random effect (that shoud capture spatial effects).
tmin_lmm2 <- lmer(tmin ~
aqua_emis+aqua_ndvi+elev+pop+clc_artificial+clc_water+clc_bare+clc_vegetation+
aqua_night_lst+
(1|date/reg), data = train)
tmin_lmm2_prd <- predict(tmin_lmm2, newdata = test, allow.new.levels=TRUE)
mean((test$tmin - tmin_lmm2_prd)^2)
## [1] 3.444913
This is called nested random effects, as the region is nested within the date level: every reg:date
level only occur at the higher date
level.
Compare it to a fixed-effect model with fixed date:reg
interaction (note: this requires much more computational resources):
tmin_lm <- lm(tmin ~
aqua_emis+aqua_ndvi+elev+pop+clc_artificial+clc_water+clc_bare+clc_vegetation+
aqua_night_lst+
date*reg, data = train)
tmin_lm_prd <- predict(tmin_lm, newdata = test, allow.new.levels=TRUE)
mean((test$tmin - tmin_lm_prd)^2)
## [1] 3.470221
The results are a little less good.
Note: allow.new.levels=TRUE
means that new levels (or NA values) in the test set are allowed. if allow.new.levels=TRUE
(FALSE is default), then the prediction will use the unconditional (population-level) values for data with previously unobserved levels (or NAs). Why there are new levels at all? because there are date-reg interaction levels in the train that are not in the test.
Note: the (1|date/reg)
specify the hirarchy (the order is important) of the random effects and is equivalent to (1|date) + (1|date:reg)
(where :
denotes an interaction).
Adding also random LST slope for region-time interaction: \[ tmin_{s,t} = \beta_0 + \sum_{k=1}^p \beta_k x^k_{s,t} + (u_t + g_r) + (\beta_{LST} + v_t + h_{r,t}) LST_{s,t} + \epsilon_{s,t}\] where \(v_t\) is random LST slope, and \(h_{r,t}\) is random daily-region slope.
tmin_lmm3 <- lmer(tmin ~
aqua_emis+aqua_ndvi+elev+pop+clc_artificial+clc_water+clc_bare+clc_vegetation+
aqua_night_lst+
(1+aqua_night_lst|date/reg), data = train)
tmin_lmm3_prd <- predict(tmin_lmm3, newdata = test, allow.new.levels=TRUE)
mean((test$tmin - tmin_lmm3_prd)^2)
## [1] 2.951929
Can you think on more hierarchical levels of random effects?
Let’s see the estimated random slope over time and space:
reg_date_randef <- lme4::ranef(tmin_lmm3)$`reg:date`
library(stringr)
reg_date_randef <- cbind(str_split_fixed(rownames(reg_date_randef), ":", 2),
reg_date_randef)
colnames(reg_date_randef) <- c("reg", "date", "intc", "slope")
head(reg_date_randef)
## reg date intc slope
## 11:1 11 1 -0.6925359 -0.19053620
## 11:2 11 2 2.2432866 -0.94072561
## 11:3 11 3 0.2730236 -0.28907070
## 11:5 11 5 -2.0661077 0.11976649
## 11:6 11 6 -1.6479033 0.63986723
## 11:7 11 7 -1.0950623 0.07895034
test <- merge(x=test, y=reg_date_randef, by=c("reg","date"), all.x = T)
ggplot(data = test[date %in% 47:50],
aes(x=lon,y=lat, color = slope)) + geom_point() + facet_wrap(~date)
Let’s see the estimated region-wise random intercept:
It seems that the regions’ random effects are differently distributed in different regions.
9.3.1 lmer
formula
What would V1 ~ (1+V3*V4|V2) + V3*V4
estimate?
- P1: A global intercept
- P2: A single global estimate for the effect of V3
- P3: A single global estimate for the effect of V4
- P4: A single global estimate for the interaction between V3 and V4
- P5: Deviations of the intercept from P1 in each level of V2
- P6: Deviations of the V3 effect from P2 in each level of V2
- P7: Deviations of the V4 effect from P3 in each level of V2
- P8: Deviations of the V3-by-V4 interaction from P4 in each level of V2
- P9 Correlation between P5 and P6 across levels of V2
- P10 Correlation between P5 and P7 across levels of V2
- P11 Correlation between P5 and P8 across levels of V2
- P12 Correlation between P6 and P7 across levels of V2
- P13 Correlation between P6 and P8 across levels of V2
- P14 Correlation between P7 and P8 across levels of V2
9.3.2 Sparsity and Memory Efficiency
In Sparse Representations Chapter at R(BGU) course we discuss how to efficiently represent matrices in memory. At this point we can already hint that the covariance matrices implied by LMMs are sparse. This fact is exploited in the lme4 package, making it very efficient computationally.
9.4 Serial Correlations in Space/Time
As previously stated, a hierarchical model of the type \(y=x'\beta+z'u+\epsilon\) is a very convenient way to state the correlations of \(y|x\) instead of specifying the matrix \(Var[z'u+\epsilon|x]\) for various \(x\) and \(z\). The hierarchical sampling scheme implies correlations in blocks. What if correlations do not have a block structure? Temporal data or spatial data, for instance, tend to present correlations that decay smoothly in time/space. These correlations cannot be represented via a hierarchical sampling scheme.
One way to go about, is to find a dedicated package for space/time data. For instance, in the Spatio-Temporal Data task view, or the Ecological and Environmental task view.
Instead, we will show how to solve this matter using the nlme package. This is because nlme allows to compound the blocks of covariance of LMMs, with the smoothly decaying covariances of space/time models.
We now use an example from the help of nlme::corAR1
.
The nlme::Ovary
data is panel data of number of ovarian follicles in different mares (female horse), at various times.
We fit a model with a random Mare
effect, and correlations that decay geometrically in time.
In the time-series literature, this is known as an auto-regression of order 1 model, or AR(1), in short.
## Grouped Data: follicles ~ Time | Mare
## Mare Time follicles
## 1 1 -0.13636360 20
## 2 1 -0.09090910 15
## 3 1 -0.04545455 19
## 4 1 0.00000000 16
## 5 1 0.04545455 13
## 6 1 0.09090910 10
fm1Ovar.lme <- nlme::lme(fixed=follicles ~ sin(2*pi*Time) + cos(2*pi*Time),
data = Ovary,
random = pdDiag(~sin(2*pi*Time)),
correlation=corAR1() )
summary(fm1Ovar.lme)
## Linear mixed-effects model fit by REML
## Data: Ovary
## AIC BIC logLik
## 1563.448 1589.49 -774.724
##
## Random effects:
## Formula: ~sin(2 * pi * Time) | Mare
## Structure: Diagonal
## (Intercept) sin(2 * pi * Time) Residual
## StdDev: 2.858385 1.257977 3.507053
##
## Correlation Structure: AR(1)
## Formula: ~1 | Mare
## Parameter estimate(s):
## Phi
## 0.5721866
## Fixed effects: follicles ~ sin(2 * pi * Time) + cos(2 * pi * Time)
## Value Std.Error DF t-value p-value
## (Intercept) 12.188089 0.9436602 295 12.915760 0.0000
## sin(2 * pi * Time) -2.985297 0.6055968 295 -4.929513 0.0000
## cos(2 * pi * Time) -0.877762 0.4777821 295 -1.837159 0.0672
## Correlation:
## (Intr) s(*p*T
## sin(2 * pi * Time) 0.000
## cos(2 * pi * Time) -0.123 0.000
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.34910093 -0.58969626 -0.04577893 0.52931186 3.37167486
##
## Number of Observations: 308
## Number of Groups: 11
Things to note:
- The fitting is done with the
nlme::lme
function, and notlme4::lmer
. sin(2*pi*Time) + cos(2*pi*Time)
is a fixed effect that captures seasonality.- The temporal covariance, is specified using the
correlations=
argument. - AR(1) was assumed by calling
correlation=corAR1()
. Seenlme::corClasses
for a list of supported correlation structures. - From the summary, we see that a
Mare
random effect has also been added. Where is it specified? It is implied by therandom=
argument. Read?lme
for further details.
We can now inspect the contrivance implied by our model’s specification.
As expected, we see the blocks of non-null covariance within Mare
, but unlike “vanilla” LMMs, the covariance within mare is not fixed. Rather, it decays geometrically with time.
9.5 Extensions
9.5.1 Cluster Robust Standard Errors
As previously stated, random effects are nothing more than a convenient way to specify covariances within a level of a random effect, i.e., within a group/cluster. This is also the motivation underlying cluster robust inference, which is immensely popular with econometricians, but less so elsewhere. With cluster robust inference, we assume a model of type \(y=f(x)+\varepsilon\); unlike LMMs we assume independence (conditional on \(x\)), but we allow \(\varepsilon\) within clusters defined by \(x\). For a longer comparison between the two approaches, see Michael Clarck’s guide.
9.5.2 Linear Models for Panel Data
nlme and lme4 will probably provide you with all the functionality you need for panel data. If, however, you are trained as an econometrician, and prefer the econometric parlance, then the plm and panelr packages for panel linear models, are just for you. In particular, they allow for cluster-robust covariance estimates, and Durbin–Wu–Hausman test for random effects. The plm package vignette also has an interesting comparison to the nlme package.
9.5.3 Testing Hypotheses on Correlations
After working so hard to model the correlations in observation, we may want to test if it was all required. Douglas Bates, the author of nlme and lme4 wrote a famous cautionary note, found here, on hypothesis testing in mixed models, in particular hypotheses on variance components. Many practitioners, however, did not adopt Doug’s view. Many of the popular tests, particularly the ones in the econometric literature, can be found in the plm package (see Section 6 in the package vignette). These include tests for poolability, Hausman test, tests for serial correlations, tests for cross-sectional dependence, and unit root tests.
9.6 Bibliographic Notes
Most of the examples in this chapter are from the documentation of the lme4 package (Bates et al. 2015). For a general and very applied treatment, see Pinero and Bates (2000). As usual, a hands on view can be found in Venables and Ripley (2013), and also in an excellent blog post by Kristoffer Magnusson For a more theoretical view see Weiss (2005) or Searle, Casella, and McCulloch (2009). Sometimes it is unclear if an effect is random or fixed; on the difference between the two types of inference see the classics: (???), (???), and the more recent Rosset and Tibshirani (2018). For an interactive, beautiful visualization of the shrinkage introduced by mixed models, see Michael Clark’s blog. For more on predictions in linear mixed models see Robinson (1991), Rabinowicz and Rosset (2018), and references therein. See Michael Clarck’s guide for various ways of dealing with correlations within groups. For the geo-spatial view and terminology of correlated data, see Christakos (2000), Diggle, Tawn, and Moyeed (1998), Allard (2013), and Cressie and Wikle (2015).
9.7 Practice Yourself
Computing the variance of the sample mean given dependent correlations. How does it depend on the covariance between observations? When is the sample most informative on the population mean?
Think: when is a paired t-test not equivalent to an LMM with two measurements per group?
Return to the
Penicillin
data set. Instead of fitting an LME model, fit an LM model withlm
. I.e., treat all random effects as fixed.- Compare the effect estimates.
- Compare the standard errors.
- Compare the predictions of the two models.
[Very Advanced!] Return to the
Penicillin
data and use thegls
function to fit a generalized linear model, equivalent to the LME model in our text.Read about the “oats” dataset using
? MASS::oats
.Inspect the dependency of the yield (Y) in the Varieties (V) and the Nitrogen treatment (N).- Fit a linear model, does the effect of the treatment significant? The interaction between the Varieties and Nitrogen is significant?
- An expert told you that could be a variance between the different blocks (B) which can bias the analysis. fit a LMM for the data.
- Do you think the blocks should be taken into account as “random effect” or “fixed effect”?
Return to the temporal correlation in Section 9.4, and replace the AR(1) covariance, with an ARMA covariance. Visualize the data’s covariance matrix, and compare the fitted values.
See DataCamps’ Hierarchical and Mixed Effects Models for more self practice.
References
Allard, Denis. 2013. “J.-P. Chilès, P. Delfiner: Geostatistics: Modeling Spatial Uncertainty.” Springer.
Bates, Douglas, Martin Mächler, Ben Bolker, and Steve Walker. 2015. “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software 67 (1): 1–48. https://doi.org/10.18637/jss.v067.i01.
Christakos, George. 2000. Modern Spatiotemporal Geostatistics. Vol. 6. Oxford University Press.
Cressie, Noel, and Christopher K Wikle. 2015. Statistics for Spatio-Temporal Data. John Wiley & Sons.
Diggle, Peter J, JA Tawn, and RA Moyeed. 1998. “Model-Based Geostatistics.” Journal of the Royal Statistical Society: Series C (Applied Statistics) 47 (3): 299–350.
Pinero, Jose, and Douglas Bates. 2000. “Mixed-Effects Models in S and S-Plus (Statistics and Computing).” Springer, New York.
Rabinowicz, Assaf, and Saharon Rosset. 2018. “Assessing Prediction Error at Interpolation and Extrapolation Points.” arXiv Preprint arXiv:1802.00996.
Robinson, George K. 1991. “That Blup Is a Good Thing: The Estimation of Random Effects.” Statistical Science, 15–32.
Rosset, Saharon, and Ryan J Tibshirani. 2018. “From Fixed-X to Random-X Regression: Bias-Variance Decompositions, Covariance Penalties, and Prediction Error Estimation.” Journal of the American Statistical Association, nos. just-accepted.
Searle, Shayle R, George Casella, and Charles E McCulloch. 2009. Variance Components. Vol. 391. John Wiley & Sons.
Venables, William N, and Brian D Ripley. 2013. Modern Applied Statistics with S-Plus. Springer Science & Business Media.
Weiss, Robert E. 2005. Modeling Longitudinal Data. Springer Science & Business Media.