## 7.4 Poisson Regression

From the Poisson distribution

\begin{aligned} f(Y_i) &= \frac{\mu_i^{Y_i}exp(-\mu_i)}{Y_i!}, Y_i = 0,1,.. \\ E(Y_i) &= \mu_i \\ var(Y_i) &= \mu_i \end{aligned}

which is a natural distribution for counts. We can see that the variance is a function of the mean. If we let $$\mu_i = f(\mathbf{x_i; \theta})$$, it would be similar to Logistic Regression since we can choose $$f()$$ as $$\mu_i = \mathbf{x_i'\theta}, \mu_i = \exp(\mathbf{x_i'\theta}), \mu_i = \log(\mathbf{x_i'\theta})$$

### 7.4.1 Application

Count Data and Poisson regression

data(bioChemists, package = "pscl")
bioChemists <- bioChemists %>%
rename(
Num_Article = art, #articles in last 3 years of PhD
Sex = fem, #coded 1 if female
Married = mar, #coded 1 if married
Num_Kid5 = kid5, #number of childeren under age 6
PhD_Quality = phd, #prestige of PhD program
Num_MentArticle = ment #articles by mentor in last 3 years
)
hist(bioChemists$Num_Article, breaks = 25, main = 'Number of Articles') Poisson_Mod <- glm(Num_Article ~ ., family=poisson, bioChemists) summary(Poisson_Mod) #> #> Call: #> glm(formula = Num_Article ~ ., family = poisson, data = bioChemists) #> #> Deviance Residuals: #> Min 1Q Median 3Q Max #> -3.5672 -1.5398 -0.3660 0.5722 5.4467 #> #> Coefficients: #> Estimate Std. Error z value Pr(>|z|) #> (Intercept) 0.304617 0.102981 2.958 0.0031 ** #> SexWomen -0.224594 0.054613 -4.112 3.92e-05 *** #> MarriedMarried 0.155243 0.061374 2.529 0.0114 * #> Num_Kid5 -0.184883 0.040127 -4.607 4.08e-06 *** #> PhD_Quality 0.012823 0.026397 0.486 0.6271 #> Num_MentArticle 0.025543 0.002006 12.733 < 2e-16 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> #> (Dispersion parameter for poisson family taken to be 1) #> #> Null deviance: 1817.4 on 914 degrees of freedom #> Residual deviance: 1634.4 on 909 degrees of freedom #> AIC: 3314.1 #> #> Number of Fisher Scoring iterations: 5 Residual of 1634 with 909 df isn’t great. We see Pearson $$\chi^2$$ Predicted_Means <- predict(Poisson_Mod,type = "response") X2 <- sum((bioChemists$Num_Article - Predicted_Means)^2/Predicted_Means)
X2
#> [1] 1662.547
pchisq(X2,Poisson_Mod$df.residual, lower.tail = FALSE) #> [1] 7.849882e-47 With interaction terms, there are some improvements Poisson_Mod_All2way <- glm(Num_Article ~ .^2, family=poisson, bioChemists) Poisson_Mod_All3way <- glm(Num_Article ~ .^3, family=poisson, bioChemists) Consider the $$\hat{\phi} = \frac{\text{deviance}}{df}$$ Poisson_Mod$deviance / Poisson_Mod$df.residual #> [1] 1.797988 This is evidence for over-dispersion. Likely cause is missing variables. And remedies could either be to include more variables or consider random effects. A quick fix is to force the Poisson Regression to include this value of $$\phi$$, and this model is called “Quasi-Poisson”. phi_hat = Poisson_Mod$deviance/Poisson_Mod\$df.residual
summary(Poisson_Mod,dispersion = phi_hat)
#>
#> Call:
#> glm(formula = Num_Article ~ ., family = poisson, data = bioChemists)
#>
#> Deviance Residuals:
#>     Min       1Q   Median       3Q      Max
#> -3.5672  -1.5398  -0.3660   0.5722   5.4467
#>
#> Coefficients:
#>                 Estimate Std. Error z value Pr(>|z|)
#> (Intercept)      0.30462    0.13809   2.206  0.02739 *
#> SexWomen        -0.22459    0.07323  -3.067  0.00216 **
#> MarriedMarried   0.15524    0.08230   1.886  0.05924 .
#> Num_Kid5        -0.18488    0.05381  -3.436  0.00059 ***
#> PhD_Quality      0.01282    0.03540   0.362  0.71715
#> Num_MentArticle  0.02554    0.00269   9.496  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for poisson family taken to be 1.797988)
#>
#>     Null deviance: 1817.4  on 914  degrees of freedom
#> Residual deviance: 1634.4  on 909  degrees of freedom
#> AIC: 3314.1
#>
#> Number of Fisher Scoring iterations: 5

Or directly rerun the model as

quasiPoisson_Mod <- glm(Num_Article ~ ., family=quasipoisson, bioChemists)

Quasi-Poisson is not recommended, but Negative Binomial Regression that has an extra parameter to account for over-dispersion is.