G The Generalized Linear Model

G.1 The General Formulations

Here we provide a brief review of the linear regression model: \[ Y_i = \boldsymbol{x}_i^{\top} \boldsymbol{\beta} + \epsilon_i \, \mbox{ with } \, \epsilon_i \sim \mathop{\mathrm{N}}(0, \sigma^2) \] with explanatory variables \(\boldsymbol{x}_i\) and regression coefficients \(\boldsymbol{\beta}\). This model can equivalently be written as \[Y_i \sim \mathop{\mathrm{N}}(\mu_i=\boldsymbol{x}_i^{\top} \boldsymbol{\beta}, \sigma^2)\] with mean \(\mu_i\) and variance \(\sigma^2\).

This can be generalized in two ways: first, we can consider distributions other than the normal for the response variable \(Y_i\) (for example, Bernoulli, binomial, Poisson, \(\ldots\)). Second, the mean \(\mathop{\mathrm{\mathsf{E}}}(Y_i)=\mu_i\) is linked to the linear predictor \(\eta_i = \boldsymbol{x}_i^{\top} \boldsymbol{\beta}\) through a (monotone) link function \(g = h^{-1}\): \[ g(\mu_i) = \eta_i \, \mbox{ so } \, \mu_i = h(\eta_i). \] Notice how this differs from general linear models (Appendix F.2): general linear models assume normally distributed residuals, while generalized linear models relax this assumption and allow for other distributions from the exponential family. A general linear model can be viewed as a generalized linear model with normal errors and identity link.

G.1.1 Logistic regression

The normality assumption on \(Y_i\) in the linear model is thus relaxed: the generalization allows other distributions for \(Y_i\).

For logistic regression, consider \[\begin{align*} Y_i &\sim \mathop{\mathrm{Bin}}(1, \mu_i=\pi_i)\\ \mathop{\mathrm{logit}}(\pi_i) &= \boldsymbol{x}_i^{\top} \boldsymbol{\beta}\\ \pi_i &= \text{expit}(\boldsymbol{ x}_i^{\top} \boldsymbol{ \beta}). \end{align*}\] So here the link function is \[ g(\pi_i) = \mathop{\mathrm{logit}}(\pi_i) = \log \frac{\pi_i}{1-\pi_i}. \]

Except for the intercept, we can interpret \(\exp(\beta_k)\) (for \(k \geq 1\)) as an odds ratio in this model. Also notice that the variance is a function of the mean: \(\mathop{\mathrm{Var}}(Y_i)=\pi_i(1-\pi_i)\).

If the covariates \(\boldsymbol{x}_i\) are identical for a group \(j\) of observations \(Y_i\), then aggregation to the binomial form \(Y_j = \sum_{i \in \text{group } j} Y_i \sim \mathop{\mathrm{Bin}}(n_j, \pi_j)\) is possible, where \(n_j\) is the group size; the sketch below illustrates this.
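As a minimal sketch (with simulated data; the covariate dose and the grouping are purely illustrative), both forms can be fitted in R with glm(); the binary and the aggregated binomial fits give identical coefficient estimates, and exponentiating them yields odds ratios:

set.seed(1)
dose <- rep(c(0, 1, 2, 3), each = 25)
pi   <- plogis(-1 + 0.8 * dose)              # true success probability expit(x' beta)
y    <- rbinom(length(dose), size = 1, prob = pi)

# binary (Bernoulli) form
fit.bin <- glm(y ~ dose, family = binomial)

# aggregated binomial form: successes s_j out of n_j per dose group
s <- tapply(y, dose, sum)
n <- tapply(y, dose, length)
d <- sort(unique(dose))
fit.agg <- glm(cbind(s, n - s) ~ d, family = binomial)

cbind(binary = coef(fit.bin), aggregated = coef(fit.agg))   # identical estimates
exp(coef(fit.bin))                                          # odds ratios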

An example of a logistic regression can be found in Section G.3.1.

G.1.2 Log-linear Poisson regression

In addition to binary data, we may consider count data, for which the random variable \(Y_i\) follows a Poisson distribution.

Poisson regression assumes \[\begin{align*} Y_i &\sim \mathop{\mathrm{Po}}(\mu_i)\\ \log(\mu_i) &= \boldsymbol{x}_i^{\top} \boldsymbol{\beta}\\ \mu_i &= \exp(\boldsymbol{x}_i^{\top} \boldsymbol{\beta}), \end{align*}\] so the link function in this case is \[g(\mu_i) = \log(\mu_i).\] As a consequence, \(\exp(\beta_k)\) can be interpreted (for all \(k \ge 1\)) as a multiplicative rate ratio for the expected count. It is also important to note that the variance increases with the mean in this model: \(\mathop{\mathrm{Var}}(Y_i)=\mathop{\mathrm{\mathsf{E}}}(Y_i)=\mu_i\).
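A minimal sketch in R (simulated data, illustrative variable names): the model is fitted with glm() and family = poisson, and the exponentiated coefficients are read as rate ratios.

set.seed(1)
x  <- rnorm(200)
mu <- exp(0.5 + 0.3 * x)           # true mean exp(x' beta)
y  <- rpois(200, lambda = mu)

fit.pois <- glm(y ~ x, family = poisson)
exp(coef(fit.pois))                # multiplicative rate ratios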

If several observations share identical covariate values \(\boldsymbol{x}_i\), then they can be combined into a single grouped Poisson response with outcome \(Y_j = \sum_i Y_i\) and corresponding mean \(\mu_j = \sum_i \mu_i\).

G.2 GEEs

The generalized estimating equation (GEE) approach is a marginal approach based on multivariate quasi-likelihood. Quasi-likelihood is used to deal with overdispersion, which means that the variance of the outcome is greater than what the model assumes.

Consider the weighted least-squares estimate \(\hat{\boldsymbol{\beta}}_{\boldsymbol{{W}}}\) in the general linear model with block-diagonal working correlation matrix \(\boldsymbol{W}^{-1}\) with blocks \(\boldsymbol{W}_0^{-1}\): \[\begin{align*} \hat{\boldsymbol{\beta}}_{\boldsymbol{{W}}} & = (\boldsymbol{X}^{\top}\boldsymbol{W}\boldsymbol{X})^{-1}\boldsymbol{X}^{\top}\boldsymbol{W}\boldsymbol{Y} \\ & = \left( \sum_{i=1}^m \boldsymbol{X}_i^{\top}\boldsymbol{W}_0 \boldsymbol{X}_i \right)^{-1} \sum_{i=1}^m \boldsymbol{X}_i^{\top}\boldsymbol{W}_0 \boldsymbol{Y}_i. \end{align*}\] \(\hat{\boldsymbol{\beta}}_{\boldsymbol{W}}\) can be seen as the solution of the multivariate score equation \[\begin{equation} \boldsymbol{S}(\boldsymbol{\beta}) = \sum_{i=1}^m \left(\frac{\partial \boldsymbol{\mu}_i}{\partial \boldsymbol{\beta}}\right) \boldsymbol{W}_0 (\boldsymbol{Y}_i - \boldsymbol{\mu}_i)=\boldsymbol{0}, \tag{G.1} \end{equation}\] where \(\boldsymbol{\mu}_i=\mathop{\mathrm{\mathsf{E}}}(\boldsymbol{Y}_i)=\boldsymbol{X}_i \boldsymbol{\beta}\) and therefore \(\left(\frac{\partial \boldsymbol{\mu}_i}{\partial\boldsymbol{\beta}}\right)=\boldsymbol{X}_i^{\top}\).

The score function (G.1) differs from the score equation in the ordinary generalized linear model \[ {S}(\boldsymbol{\beta}) = \sum_{i=1}^m \left(\frac{\partial {\mu}_i}{\partial \boldsymbol{\beta}}\right) \mathop{\mathrm{Var}}(Y_i)^{-1} (Y_i - \mu_i) = {0} \] by replacing \(\mathop{\mathrm{Var}}(Y_i)^{-1}\) with a weight \(\boldsymbol{W}_0\).

The basic idea is to use a multivariate quasi-likelihood approach for non-normal data that takes the correlation between the components of \(\boldsymbol{Y}_i\) into account.

The generalized estimating equation \[ \boldsymbol{S}(\boldsymbol{\beta}) = \sum_{i=1}^m \left(\frac{\partial \boldsymbol{\mu}_i}{\partial \boldsymbol{\beta}}\right) \mathop{\mathrm{Cov}}(\boldsymbol{Y}_i)^{-1} (\boldsymbol{Y}_i - \boldsymbol{\mu}_i)=\boldsymbol{0} \] contains the multivariate \(\mathop{\mathrm{Cov}}(\boldsymbol{Y}_i)^{-1}\) to allow for dependence. In practice, \(\mathop{\mathrm{Cov}}(\boldsymbol{Y}_i)^{-1}\) can be replaced with some working correlation matrix \(\boldsymbol{W}^{-1}\). This approach is called quasi-likelihood because typically no true likelihood function exists that has \(\boldsymbol{S}(\boldsymbol{\beta})\) as its score function.

Robust standard errors can be obtained using the sandwich formula. The estimates \(\hat{\boldsymbol{\beta}}\) (and the estimates \(\hat{\boldsymbol{\alpha}}\) of the parameters of the working correlation structure) are asymptotically normal.

\(\hat{\boldsymbol{\beta}}\) is consistent (for \(m \rightarrow \infty\)) even if the correlation structure is misspecified, i.e. \(\mathop{\mathrm{Corr}}(\boldsymbol{Y}_i) \neq \boldsymbol{W}^{-1}\), whereas the estimates of \(\boldsymbol{\beta}\) are efficient if \(\mathop{\mathrm{Corr}}(\boldsymbol{Y}_i) \approx \boldsymbol{W}^{-1}\).
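As a hedged sketch of how such a fit looks in R (assuming the geepack package is available; the data here are simulated and the variable names are illustrative), geeglm() takes a cluster identifier and a working correlation structure and reports robust (sandwich) standard errors:

library(geepack)                             # assumed to be installed
set.seed(1)
m   <- 100                                   # number of clusters
id  <- rep(1:m, each = 4)                    # 4 observations per cluster
x   <- rnorm(length(id))
u   <- rep(rnorm(m), each = 4)               # shared cluster effect inducing correlation
y   <- rbinom(length(id), size = 1, prob = plogis(-0.5 + x + u))
dat <- data.frame(y, x, id)

fit.gee <- geeglm(y ~ x, id = id, data = dat, family = binomial,
                  corstr = "exchangeable")   # exchangeable working correlation
summary(fit.gee)                             # robust (sandwich) standard errors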

Bounds of correlation between binary variables

Here we discuss the bounds of the correlation between binary variables. Consider two binary variables \(Y_1\) and \(Y_2\) with success probabilities \(\pi_1\) and \(\pi_2\), say. One can show \[\max(0,\pi_1+\pi_2-1) \leq \operatorname{\mathsf{Pr}}(Y_1=1,Y_2=1) \leq \min(\pi_1,\pi_2).\] This implies certain constraints on the correlation \[\begin{align*} \rho(Y_1,Y_2) &= \frac{\mathop{\mathrm{\mathsf{E}}}(Y_1 Y_2) - \mathop{\mathrm{\mathsf{E}}}(Y_1) \mathop{\mathrm{\mathsf{E}}}(Y_2)}{\{\mathop{\mathrm{Var}}(Y_1)\mathop{\mathrm{Var}}(Y_2)\}^{1/2}} \\ &=\frac{\operatorname{\mathsf{Pr}}(Y_1=1,Y_2=1) - \pi_1 \pi_2}{\{\pi_1(1-\pi_1)\pi_2(1-\pi_2)\}^{1/2}}. \end{align*}\]

For example, suppose \(\pi_1=0.5\) and \(\pi_2=0.9\), then the correlation between \(Y_1\) and \(Y_2\) cannot be larger than 1/3 in absolute value:

p1 <- 0.5
p2 <- 0.9
# admissible range for the joint probability Pr(Y1=1, Y2=1)
(lim.joint <- c(max(c(0, p1+p2-1)), min(c(p1, p2))))
## [1] 0.4 0.5
# implied bounds on the correlation rho(Y1, Y2)
(lim.corr <- (lim.joint - p1*p2) / sqrt(p1*(1-p1)*p2*(1-p2)))
## [1] -0.3333333  0.3333333

Therefore, correlation as a measure of association may not be optimal for binary variables. Alternatively, we can use the marginal odds ratio \[\mbox{OR}(Y_1,Y_2) = \frac{\operatorname{\mathsf{Pr}}(Y_1=1,Y_2=1)\operatorname{\mathsf{Pr}}(Y_1=0,Y_2=0)}{\operatorname{\mathsf{Pr}}(Y_1=1,Y_2=0)\operatorname{\mathsf{Pr}}(Y_1=0,Y_2=1)}.\]
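For instance, continuing the numerical example above with a hypothetical joint probability \(\operatorname{\mathsf{Pr}}(Y_1=1,Y_2=1)=0.48\) (within the admissible range computed above):

p11 <- 0.48                    # hypothetical joint probability in [0.4, 0.5]
p10 <- p1 - p11                # Pr(Y1=1, Y2=0)
p01 <- p2 - p11                # Pr(Y1=0, Y2=1)
p00 <- 1 - p11 - p10 - p01     # Pr(Y1=0, Y2=0)
(OR <- (p11 * p00) / (p10 * p01))
## [1] 4.571429

Unlike the correlation, the odds ratio can take any value in \((0, \infty)\) regardless of the marginal probabilities, which makes it a more convenient measure of association for binary data.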

G.3 GLMs with Random Effects

Conditional on random effects \(\boldsymbol{U}_i \sim \mathop{\mathrm{N}}_q(\boldsymbol{0}, \boldsymbol{G})\), \(Y_{ij}\) follows a generalized linear model with conditional mean \[\mu_{ij} = \mathop{\mathrm{\mathsf{E}}}(Y_{ij} \,\vert\, \boldsymbol{U}_i) = h(\eta_{ij})\] and linear predictor \[ \eta_{ij}=\boldsymbol{x}_{ij}^T\boldsymbol{\beta}^* + \boldsymbol{d}_{ij}^T\boldsymbol{U}_i,\] where \(\boldsymbol{\beta}^*\) are the conditional regression coefficients. In this conditional model, parameter estimation is based on the marginal distribution of \(\boldsymbol{Y}_i\), which can be obtained through numerical integration over the random effects.
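A hedged sketch of fitting such a model in R, assuming the lme4 package (the data are simulated and the variable names illustrative): glmer() maximizes the marginal likelihood, and the nAGQ argument sets the number of adaptive Gauss-Hermite quadrature points used for the numerical integration over the random intercepts.

library(lme4)                                # assumed to be installed
set.seed(1)
m   <- 100                                   # number of clusters
id  <- rep(1:m, each = 5)
x   <- rnorm(length(id))
u   <- rep(rnorm(m, sd = 1), each = 5)       # random intercepts U_i ~ N(0, nu^2)
y   <- rbinom(length(id), size = 1, prob = plogis(-0.5 + x + u))
dat <- data.frame(y, x, id)

# random-intercept logistic regression; coefficients are conditional (beta*)
fit.glmm <- glmer(y ~ x + (1 | id), data = dat, family = binomial, nAGQ = 10)
summary(fit.glmm)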

G.3.1 Logistic regression model

Consider a logistic regression model with random intercept \[\mathop{\mathrm{logit}}\operatorname{\mathsf{Pr}}(Y_{ij}=1\,\vert\,U_i) = \boldsymbol{x}_{ij}^T\boldsymbol{\beta}^* + U_i, \quad U_i \sim \mathop{\mathrm{N}}(0, \nu^2).\] In a marginal model, we directly specify \[\mathop{\mathrm{logit}}\operatorname{\mathsf{Pr}}(Y_{ij}=1) = \boldsymbol{x}_{ij}^T\boldsymbol{\beta}.\] It can be shown that \[\begin{equation} \boldsymbol{\beta} \approx \underbrace{(c^2 \nu^2 + 1)^{-1/2}}_{q(\nu^2)} \boldsymbol{\beta}^*, \mbox{ where } c \approx 10/17. \tag{G.2} \end{equation}\] Therefore, marginal coefficients \(\boldsymbol{\beta}\) and conditional coefficients \(\boldsymbol{\beta}^*\) have the same sign with \(|\boldsymbol{\beta}| \leq |\boldsymbol{\beta}^*|\). Moreover, their ratio \(q(\nu^2)\) decreases with increasing variance \(\nu^2\), as shown in Table G.1.

Table G.1: Values of \(q(\nu^2)\) for different \(\nu^2\).
\(\nu^2\)      0.00 0.10 0.50 1.00 2.00 5.00
\(q(\nu^2)\)   1.00 0.98 0.92 0.86 0.77 0.61
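The entries of Table G.1 can be reproduced directly from the approximation (G.2); a quick check in R:

c2  <- (10/17)^2                  # c^2 with c approximately 10/17
nu2 <- c(0, 0.1, 0.5, 1, 2, 5)
round(1 / sqrt(c2 * nu2 + 1), 2)  # q(nu^2)
## [1] 1.00 0.98 0.92 0.86 0.77 0.61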