4 Modified Box Correction
4.1 nlme
The code below gives an R function that returns the modified Box correction and the corresponding p-value. The function uses mixed model objects from the nlme package to define the mean model, and the structure of the covariance matrix that will be used. The preferred choice is to use an unstructured estimate, but when this is not possible we can compute an estimate of \(\Sigma\) with the most complex structure the data will support.
The function requires two nlme models as input, the model_r
variable should be the formula with the variables removed that you want to test significance for in the full model defined by the model
variable.
The default estimate of \(\Sigma\) is the REML estimate extracted from the model object defined by the model
input variable. However, if you do want to use an alternative covariance estimate you can select it using the Sigma
variable. For example in Section 4.2.3 we use the sample covariance matrix.
library(nlme)
Box_correction_modified_nlme = function(model, model_r, data, Sigma = NULL){
rownames(data) = c()
# Design matrix for full model
X <- model.matrix(model,data = data)
# Gives the rownumbers of the dataset that have missing values
ind = which(is.na(match(rownames(data),rownames(X))))
Y = getResponse(model)
# Design matrix for model with removed variables
X_r <- model.matrix(model_r,data = data)
# Number of observations
n = nrow(X)
# Number of variables
parm = ncol(X)
# Number of variables being removed
c = ncol(X)-ncol(X_r)
if(is.null(Sigma)){
# Put in if statement to deal with models with no correlated errors
if(is.null(model$modelStruct$corStruct)){
Sigma = diag((model$sigma)^2, nrow = n, ncol = n)
} else {
# repeats = as.numeric(levels(model$groups))
# Creating block matrix for model with correlated errors
# type = "marginal" extracts the covariance matrix when
# not conditioning on the random effects. Therefore extracts
# the full covariance matrix G side and R side.
Sigma = getVarCov(model, type = "marginal")
Sigma = diag(nlevels(model$groups)) %x% Sigma
if(!(length(ind) == 0)){
Sigma = Sigma[-ind,-ind]
}
}
}
A = diag(n) - ( X %*% solve( t(X) %*% X ) %*% t(X) )
B = ( X %*% solve( t(X) %*% X ) %*% t(X) ) - ( X_r %*% solve( t(X_r) %*% X_r ) %*% t(X_r) )
trace = function(x){
sum(diag(x))
}
V = ( (trace((B %*% Sigma) %*% (B %*% Sigma)) / trace(B %*% Sigma)^2)
+ (trace((A %*% Sigma) %*% (A %*% Sigma)) / trace(A %*% Sigma)^2) )
v_2 = (c*(4*V + 1) - 2)/((c*V) - 1)
lambda = ((n - parm)/c) * ((v_2 - 2)/v_2) * ((trace(B %*% Sigma) / trace(A %*% Sigma)))
F_ = ((n - parm)/c) * ((t(Y) %*% B %*% Y)/(t(Y) %*% A %*% Y))
F_mod = F_/lambda
p_value = pf(df1 = c, df2 = v_2, q = F_mod, lower.tail = F)
values = c(lambda, F_, c, v_2, F_mod, p_value)
names(values) = c("Lambda", "F_", "v1_MOD", "v2_mod","F_MOD", "prob_F_mod")
values = round(values, digits = 3)
fixed_effects_estimate = solve(t(X)%*%X)%*%t(X)%*%Y
return(list("Lambda" = lambda, "F_" = F_[1], "c" = c, "v2_mod" = v_2, "F_MOD" = F_mod[1],
"prob_F_mod" = p_value[1], "A" = A, "B" = B, "n" = n, "r" = parm, "Y" = Y,
"fixed_effects_estimate" = fixed_effects_estimate))
}
In your R session copy and run this code so that it is available in your R environment. Next we give some examples of applying the function to recreate results presented in the paper II.
4.2 Examples
4.2.1 Cardiac enzyme
In this example we look at the same dataset and mean model that was used in the example covered in Section 6.1 of paper II. We apply the Box_correction_modified_nlme()
function to test for the inclusion of the interaction term between treatment and time in the model.
We apply the function with the original dataset and then to the dataset with artificial dropouts, to recreate the results of Table VII.
First of all we need to import the dataset. The code below imports the dataset from a GitHub repository, and converts a selection of the variables in the dataset to factors.
Cardiac_enzyme_data = read.csv("https://raw.githubusercontent.com/DylanDijk/Simon-Skene-Mixed-Models-Tutorial/main/Data_Images_Figures/The%20Cardiac%20Enzyme%20Data%20-%20Reduced%20Data%20set.csv")
Cardiac_enzyme_data$dog = as.factor(Cardiac_enzyme_data$dog)
Cardiac_enzyme_data$trt = as.factor(Cardiac_enzyme_data$trt)
Cardiac_enzyme_data$time = as.factor(Cardiac_enzyme_data$time)
The code below creates the nlme models, a good resource documenting how to fit different mixed models with nlme is Fitting linear mixed models in R by Brice Ozenne.
I have also set glsControl(tolerance = 10^-8)
, this is too match the convergence criteria used by nlme to the default convergence criteria used by PROC MIXED in SAS.
library(nlme)
glsControl(tolerance = 10^-8)
model = gls(model = atp ~ trt + time + trt:time,
data = Cardiac_enzyme_data,
correlation = corSymm(form =~ as.numeric(time)|dog),
weights = varIdent(form =~ 1|time))
model_r = gls(model = atp ~ trt + time,
data = Cardiac_enzyme_data,
correlation = corSymm(form =~ as.numeric(time)|dog),
weights = varIdent(form =~ 1|time))
We can compare the results from the Box_correction_modified_nlme()
function with the results given in the upper panel of Table VII in paper II.
cardiac_mod_box =
Box_correction_modified_nlme(data = Cardiac_enzyme_data, model = model, model_r = model_r)
$F_MOD
cardiac_mod_box1] 2.520353
[$prob_F_mod
cardiac_mod_box1] 0.07743055 [
4.2.2 Cardiac enzyme - Artifical missing data
The results in the lower panel of Table VII are computed by applying the same method to the same dataset but now with artificial missing data introduced. The Box_correction_modified_nlme()
function works with missing data, and we run it with the modified dataset in the code below.
We first introduce the missing data points to the Cardiac_enzyme_data
dataset:
Cardiac_enzyme_data_miss = Cardiac_enzyme_data
Cardiac_enzyme_data_miss$atp[which(Cardiac_enzyme_data_miss$dog == 4)][7:9] = NA
The code below creates the models with the modified dataset, it is identical to the model construction in the previous example but with the new dataset.
library(nlme)
glsControl(tolerance = 10^-8)
model = gls(model = atp ~ trt + time + trt:time,
data = Cardiac_enzyme_data_miss,
correlation = corSymm(form =~ as.numeric(time)|dog),
weights = varIdent(form =~ 1|time), na.action = na.omit)
model_r = gls(model = atp ~ trt + time,
data = Cardiac_enzyme_data_miss,
correlation = corSymm(form =~ as.numeric(time)|dog),
weights = varIdent(form =~ 1|time), na.action = na.omit)
We get the same resuls as the final row of Table VII in paper II:
cardiac_miss_mod_box =
Box_correction_modified_nlme(model = model, model_r = model_r, data = Cardiac_enzyme_data_miss)
$F_MOD
cardiac_miss_mod_box1] 2.84006
[$prob_F_mod
cardiac_miss_mod_box1] 0.05912901 [
4.2.3 Guinea pigs
In this example we use a dataset that shows the amplitude of action potential that was measured from pieces of heart dissected from guinea pigs. The measurements were taken after being exposed to two different compounds, therefore we have two response variables AP1 and AP2.
Each piece of tissue had a control measure where no compound was given and then six increasing concentrations of the compound. This was carried out on three guinea pigs’ hearts, so we have seven repeated measurements from just three participants.
The code below imports the dataset and converts the concentration variable (conc.f
) and subject variable (gpig.f
) to factors:
Guinea_pigs_data = read.csv("https://raw.githubusercontent.com/DylanDijk/Simon-Skene-Mixed-Models-Tutorial/main/Data_Images_Figures/brammer.csv")
Guinea_pigs_data$conc.f = factor(Guinea_pigs_data$conc.f)
Guinea_pigs_data$gpig.f = factor(Guinea_pigs_data$gpig.f)
In this example the experiments are treated as simple repeated measures designs with concentration as the time variable and tissue as the subject.
Due to the small number of samples, the estimation method for the unstructured covariance models did not converge. Therefore, for this example Skene and Kenward used the sample covariance matrix as an estimator of the covariance matrix (\(\Sigma\)).
The code below computes the sample covariance estimates for the seven repeated measurements, which are then used as inputs for Sigma
in the Box_correction_modified_nlme()
function.
sigma_compound_1 = cov(rbind(Guinea_pigs_data[1:7,"AP1"],Guinea_pigs_data[8:14,"AP1"],Guinea_pigs_data[15:21,"AP1"]))
sigma_compound_2 = cov(rbind(Guinea_pigs_data[1:7,"AP2"],Guinea_pigs_data[8:14,"AP2"],Guinea_pigs_data[15:21,"AP2"]))
Next we create the blocked version of these covariance matrices, a block for each of the subjects. To create this covariance matrix we need to know the number of unique subjects in the dataset.
number_of_subjects = nlevels(Guinea_pigs_data$gpig.f) #number of subjects
sigma_compound_1_block = diag(number_of_subjects) %x% sigma_compound_1
sigma_compound_2_block = diag(number_of_subjects) %x% sigma_compound_2
Next we construct the models, we make a model for each of the response variables with concentration (conc.f
) as the only covariate.
library(nlme)
model_AP1 = gls(AP1~conc.f,data=Guinea_pigs_data)
model_AP1_r = gls(AP1~1,data=Guinea_pigs_data)
model_AP2 = gls(AP2~conc.f,data=Guinea_pigs_data)
model_AP2_r = gls(AP2~1,data=Guinea_pigs_data)
Below we run the Box_correction_modified_nlme()
function, using the model objects we have created above as inputs.
- For compound 1.
Box_mod_1 =
Box_correction_modified_nlme(model = model_AP1, model_r = model_AP1_r, data = Guinea_pigs_data, Sigma = sigma_compound_1_block)
This gives the following output:
$F_MOD
Box_mod_11] 4.497613
[$prob_F_mod
Box_mod_11] 0.05698665 [
- For compound 2.
Box_mod_2 = Box_correction_modified_nlme(model = model_AP2, model_r = model_AP2_r, data = Guinea_pigs_data, Sigma = sigma_compound_2_block)
For compound 2, we get these values:
$F_MOD
Box_mod_21] 20.84697
[$prob_F_mod
Box_mod_21] 0.001995718 [