Chapter 28 A Generalized Linear Model for Poisson Response Data
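For all $i = 1, \dots, n$, $y_i \sim \text{Poisson}(\lambda_i)$ independently, with $\log(\lambda_i) = x_i'\beta$. The Poisson log likelihood is $\ell(\beta \mid y) = \sum_{i=1}^{n} \left[ y_i x_i'\beta - \exp(x_i'\beta) - \log(y_i!) \right]$. This log likelihood can be maximized using Fisher's scoring method to obtain the MLE $\hat\beta$.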
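To make the Fisher scoring step concrete, here is a minimal sketch in R. It uses the score $X'(y - \lambda)$ and the Fisher information $X'\,\mathrm{diag}(\lambda)\,X$ that follow from the log likelihood above. The simulated data, seed, starting value, and stopping rule are illustrative assumptions, not part of the original notes.

```r
# Fisher scoring for a Poisson GLM with log link (illustrative sketch).
# The data are simulated; seed, sample size, and true coefficients are arbitrary.
set.seed(1)
n <- 200
x <- runif(n)
y <- rpois(n, lambda = exp(0.5 + 1.2 * x))
X <- cbind(1, x)                        # design matrix with an intercept column

# Start from an intercept-only guess so that exp() stays well behaved.
beta <- c(log(mean(y)), 0)
for (iter in 1:25) {
  lambda <- as.vector(exp(X %*% beta))  # current fitted means
  score  <- t(X) %*% (y - lambda)       # gradient of the log likelihood
  info   <- t(X) %*% (lambda * X)       # Fisher information X' diag(lambda) X
  step   <- solve(info, score)
  beta   <- beta + as.vector(step)
  if (max(abs(step)) < 1e-10) break     # stop once the update is negligible
}

# Compare with the estimate from glm(), which uses an equivalent IRLS scheme.
fit <- glm(y ~ x, family = poisson(link = "log"))
cbind(fisher_scoring = beta, glm = coef(fit))
```

Because the log link is the canonical link for the Poisson family, the observed and expected information coincide, so this iteration is also Newton's method.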
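Let $\lambda = \exp(x'\beta)$ and $\tilde\lambda = \exp(\tilde x'\beta)$, where $\tilde x = [x_1, \dots, x_{j-1}, x_j + 1, x_{j+1}, \dots, x_p]'$. Since $\tilde x - x$ has a 1 in position $j$ and zeros elsewhere, $\tilde\lambda / \lambda = \exp((\tilde x - x)'\beta) = \exp(\beta_j)$. That is, with all other explanatory variables held constant, the mean response at $x_j + 1$ is $\exp(\beta_j)$ times the mean response at $x_j$.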
# fit the Poisson GLM with log link
o = glm(y ~ x, family = poisson(link = "log"))
summary(o)
# likelihood ratio test
anova(o, test = "Chisq")
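The multiplicative interpretation of $\exp(\beta_j)$ can be checked numerically. The sketch below fits a Poisson GLM to simulated data and compares the ratio of fitted means at $x_1 + 1$ and $x_1$ with $\exp(\hat\beta_1)$. The variable names `x1` and `x2`, the coefficients, and the evaluation points are hypothetical choices made for illustration.

```r
# Interpreting exp(beta_j) as a multiplicative effect on the mean (sketch).
set.seed(2)
n  <- 300
x1 <- rnorm(n)
x2 <- rbinom(n, 1, 0.5)
y  <- rpois(n, lambda = exp(0.3 + 0.4 * x1 - 0.6 * x2))
fit <- glm(y ~ x1 + x2, family = poisson(link = "log"))

# exp(beta_j): estimated multiplicative change in the mean per unit increase in x_j
exp(coef(fit))

# Direct check: raise x1 by one unit with x2 held fixed and compare fitted means.
new0 <- data.frame(x1 = 0.5, x2 = 1)
new1 <- data.frame(x1 = 1.5, x2 = 1)
ratio <- predict(fit, new1, type = "response") /
  predict(fit, new0, type = "response")
c(ratio = unname(ratio), exp_beta = unname(exp(coef(fit)["x1"])))
```

The two printed numbers agree whatever value `x2` is fixed at, because the other terms cancel in the ratio.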
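Lack of Fit: Under the saturated model, the MLE of $\lambda_i$ is $y_i$. The likelihood ratio statistic for testing the Poisson GLM as the reduced model vs. the saturated model as the full model is then $2 \sum_{i=1}^{n} \left[ y_i \log\left( \frac{y_i}{\hat\lambda_i} \right) - (y_i - \hat\lambda_i) \right]$, where $\hat\lambda_i = \exp(x_i'\hat\beta)$ is the fitted mean under the Poisson GLM. This is the deviance statistic for the Poisson case.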
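The deviance residuals are given by $d_i \equiv \text{sign}(y_i - \hat\lambda_i) \sqrt{2 \left[ y_i \log\left( \frac{y_i}{\hat\lambda_i} \right) - (y_i - \hat\lambda_i) \right]}$. Pearson's chi-square statistic is $X^2 = \sum_{i=1}^{n} \left( \frac{y_i - \hat{E}(y_i)}{\sqrt{\widehat{\mathrm{Var}}(y_i)}} \right)^2 = \sum_{i=1}^{n} \left( \frac{y_i - \hat\lambda_i}{\sqrt{\hat\lambda_i}} \right)^2$. The Pearson residual is $r_i = (y_i - \hat\lambda_i) / \sqrt{\hat\lambda_i}$.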
# deviance and Pearson residuals
d = resid(o, type = "deviance")
r = resid(o, type = "pearson")
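As a check on the formulas above, the following sketch computes both kinds of residuals by hand and compares them with `resid()`. The data are simulated for illustration. The convention that $y_i \log(y_i / \hat\lambda_i) = 0$ when $y_i = 0$ is the usual one, and the chi-square reference distribution on $n - p$ degrees of freedom is only a large-sample approximation that can be poor when the fitted means are small.

```r
# Deviance and Pearson residuals computed from their definitions (sketch).
set.seed(3)
n <- 150
x <- runif(n, 0, 2)
y <- rpois(n, lambda = exp(0.2 + 0.8 * x))
fit <- glm(y ~ x, family = poisson(link = "log"))
lambda_hat <- fitted(fit)

# y * log(y / lambda_hat) is taken to be 0 when y = 0.
ylogy <- ifelse(y == 0, 0, y * log(y / lambda_hat))
# pmax() guards against tiny negative values caused by rounding.
d_hand <- sign(y - lambda_hat) *
  sqrt(pmax(2 * (ylogy - (y - lambda_hat)), 0))
r_hand <- (y - lambda_hat) / sqrt(lambda_hat)

all.equal(unname(d_hand), unname(resid(fit, type = "deviance")))
all.equal(unname(r_hand), unname(resid(fit, type = "pearson")))

# Deviance and Pearson statistics with approximate lack-of-fit p-values.
D  <- sum(d_hand^2)
X2 <- sum(r_hand^2)
c(deviance = D, pearson = X2, df = df.residual(fit))
pchisq(c(D, X2), df = df.residual(fit), lower.tail = FALSE)
```

The sum of squared deviance residuals reproduces `deviance(fit)`, and the sum of squared Pearson residuals is the statistic $X^2$.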
For the Poisson case, $\mathrm{Var}(y) = E(y) = \lambda$, so the variance is completely determined by the mean. If either the deviance statistic or the Pearson chi-square statistic suggests a lack of fit that cannot be explained by other causes (e.g., a poor model for the mean or a few extreme outliers), overdispersion may be the problem.
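Quasi-likelihood: Suppose $\mathrm{Var}(y_i) = \phi \lambda_i$ for some unknown dispersion parameter $\phi > 1$. The parameter $\phi$ can be estimated by $\hat\phi = \sum_{i=1}^{n} d_i^2 / (n - p)$ or $\hat\phi = \sum_{i=1}^{n} r_i^2 / (n - p)$.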
- The estimated variance of $\hat\beta$ is multiplied by $\hat\phi$.
- For Wald type inferences, the standard normal null distribution is replaced by $t$ with $n - p$ degrees of freedom.
- Any test statistic $T$ that was assumed $\chi^2_q$ under $H_0$ is replaced with $T / (q \hat\phi)$ and compared to an $F$ distribution with $q$ and $n - p$ degrees of freedom.
# estimates of the dispersion parameter
deviance(o)/df.residual(o)
sum(r^2)/df.residual(o)
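The three quasi-likelihood adjustments listed above can be sketched in R as follows. The overdispersed counts are simulated from a negative binomial distribution purely as a convenient source of overdispersion; its variance is $\mu + \mu^2/\text{size}$ rather than exactly $\phi\mu$, so this is an illustration under stated assumptions, not a claim about the data used in these notes. R's built-in `quasipoisson` family applies the same adjustments using the Pearson-based estimate of $\phi$.

```r
# Quasi-Poisson adjustments for overdispersion (illustrative sketch).
set.seed(4)
n  <- 200
x  <- runif(n)
mu <- exp(1 + x)
y  <- rnbinom(n, mu = mu, size = 2)     # overdispersed counts (assumed setup)

fit_p <- glm(y ~ x, family = poisson(link = "log"))
fit_q <- glm(y ~ x, family = quasipoisson(link = "log"))

# Pearson-based estimate of the dispersion parameter
phi_hat <- sum(resid(fit_p, type = "pearson")^2) / df.residual(fit_p)

# Standard errors: the Poisson variance estimate is multiplied by phi_hat.
se_p <- sqrt(diag(vcov(fit_p)))
se_q <- sqrt(diag(vcov(fit_q)))
cbind(poisson_se = se_p, scaled_se = se_p * sqrt(phi_hat),
      quasipoisson_se = se_q)

# Wald inference for the quasi-Poisson fit uses t with n - p degrees of freedom.
summary(fit_q)$coefficients

# F test for x: T / (q * phi_hat) compared to F with q and n - p df (q = 1 here).
T_stat <- fit_p$null.deviance - deviance(fit_p)
F_stat <- T_stat / (1 * phi_hat)
pf(F_stat, 1, df.residual(fit_p), lower.tail = FALSE)
anova(fit_q, test = "F")
```

The coefficient estimates are identical in the two fits; only the standard errors and reference distributions change.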