5 Modelación: Regresión Poisson
## 'data.frame': 744 obs. of 11 variables:
## $ country : Factor w/ 2 levels "Mexico","United States": 1 1 1 1 1 1 1 1 1 1 ...
## $ year : Factor w/ 31 levels "1985","1986",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ sex : Factor w/ 2 levels "female","male": 2 2 2 2 2 1 1 1 1 1 ...
## $ age : Factor w/ 6 levels "5-14 years","15-24 years",..: 6 5 3 4 2 2 6 3 4 5 ...
## $ suicides : int 44 145 340 327 375 107 7 61 55 15 ...
## $ population : int 432000 2330000 5679000 5836000 8420000 8211000 563000 5661000 6100000 2651000 ...
## $ no_suicides: int 431956 2329855 5678660 5835673 8419625 8210893 562993 5660939 6099945 2650985 ...
## $ rate : num 0.000102 0.000062 0.00006 0.000056 0.000045 0.000013 0.000012 0.000011 0.000009 0.000006 ...
## $ HDI : num 0.634 0.634 0.634 0.634 0.634 0.634 0.634 0.634 0.634 0.634 ...
## $ GDP_PP : num 2730 2730 2730 2730 2730 2730 2730 2730 2730 2730 ...
## $ generation : Factor w/ 6 levels "G.I. Generation",..: 1 1 3 2 4 4 1 3 2 1 ...
En este modelo los coeficientes de cada categoría representan el logaritmo del cociente de las esperanzas de los conteos dada dicha varible entre la categoría basal. Es decir, en este modelo comparamos directamente las respuestas(número esperado de conteos) de cada categoría contra la categoría basal.
Comencemos modelando con una sola variable explicativa
5.1 País
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -10.124644 | 0.0029996 | -3375.318 | 0 |
countryUnited States | 1.164164 | 0.0031567 | 368.791 | 0 |
El país
resulta muy significativo.
Además, tenemos el coeficiente positivo por lo que parece que en Estados Unidos
hay un mayor número de suicidios.
5.2 Edad
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -11.903246 | 0.0088206 | -1349.4837 | 0 |
age15-24 years | 2.628727 | 0.0091382 | 287.6649 | 0 |
age25-34 years | 2.852436 | 0.0090874 | 313.8883 | 0 |
age35-54 years | 3.009767 | 0.0089575 | 336.0060 | 0 |
age55-74 years | 3.014433 | 0.0090565 | 332.8464 | 0 |
age75+ years | 3.265944 | 0.0093720 | 348.4784 | 0 |
La edad
también es muy significativa para todos los grupos de edad.
Nuestra categoría basal es el grupo de edad más joven, de 5 a 14 años de edad.
Todos los grupos de edad tienen un número de suicidios esperado mayor a la categoría basal. El grupo que presenta mayor número de suicidios es el de mayores de 75 años.
5.3 Sexo
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -10.070705 | 0.0020711 | -4862.5032 | 0 |
sexmale | 1.400189 | 0.0023208 | 603.3338 | 0 |
El sexo
es muy significativo. Como habíamos dicho antes, los hombres tienen un mayor número esperado de suicidios que las mujeres.
5.4 Año
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -9.1380542 | 0.0056805 | -1608.6710865 | 0.0000000 |
year1986 | 0.0377794 | 0.0079333 | 4.7621491 | 0.0000019 |
year1987 | 0.0305460 | 0.0079405 | 3.8468453 | 0.0001196 |
year1988 | 0.0051457 | 0.0079642 | 0.6461040 | 0.5182120 |
year1989 | -0.0079733 | 0.0079647 | -1.0010814 | 0.3167874 |
year1990 | 0.0075386 | 0.0079206 | 0.9517706 | 0.3412133 |
year1991 | -0.0057332 | 0.0079164 | -0.7242246 | 0.4689278 |
year1992 | -0.0213964 | 0.0079272 | -2.6990975 | 0.0069528 |
year1993 | -0.0126829 | 0.0078857 | -1.6083338 | 0.1077621 |
year1994 | -0.0170310 | 0.0078697 | -2.1641383 | 0.0304537 |
year1995 | -0.0169028 | 0.0078455 | -2.1544657 | 0.0312037 |
year1996 | -0.0451564 | 0.0078599 | -5.7451771 | 0.0000000 |
year1997 | -0.0578838 | 0.0078610 | -7.3634232 | 0.0000000 |
year1998 | -0.0706852 | 0.0078608 | -8.9920944 | 0.0000000 |
year1999 | -0.1230245 | 0.0079390 | -15.4962411 | 0.0000000 |
year2000 | -0.1427664 | 0.0079219 | -18.0217228 | 0.0000000 |
year2001 | -0.1088832 | 0.0078336 | -13.8994790 | 0.0000000 |
year2002 | -0.0902898 | 0.0077756 | -11.6119398 | 0.0000000 |
year2003 | -0.0981956 | 0.0077717 | -12.6349883 | 0.0000000 |
year2004 | -0.0830200 | 0.0077251 | -10.7467803 | 0.0000000 |
year2005 | -0.0825488 | 0.0077049 | -10.7137525 | 0.0000000 |
year2006 | -0.0782110 | 0.0076759 | -10.1892001 | 0.0000000 |
year2007 | -0.0518712 | 0.0076149 | -6.8117566 | 0.0000000 |
year2008 | -0.0161730 | 0.0075402 | -2.1449020 | 0.0319607 |
year2009 | 0.0020221 | 0.0074914 | 0.2699270 | 0.7872165 |
year2010 | 0.0209749 | 0.0074435 | 2.8178894 | 0.0048340 |
year2011 | 0.0520530 | 0.0073795 | 7.0536919 | 0.0000000 |
year2012 | 0.0615009 | 0.0073497 | 8.3678151 | 0.0000000 |
year2013 | 0.0700969 | 0.0073219 | 9.5735264 | 0.0000000 |
year2014 | 0.1038015 | 0.0072594 | 14.2989956 | 0.0000000 |
year2015 | 0.1238268 | 0.0072181 | 17.1551561 | 0.0000000 |
Para esta variable tenemos algunos años que no son significativos pero la mayoría lo son. Por lo que la podemos tomar como una variable significativa.
Además, notamos que para los siguientes dos años posteriores al año basal (1985) el número de suicidios crece significativamente, después hay algunos años con diferencias no significativas y apartir de 1992 y hasta el 2000 disminuyen los suicidios cada año un poco más, después del 2000, siguen siendo menor al años basal pero en menor medida y a partir del 2009 el número de suicidios empieza a aumentar.
5.5 Índice de Desarrollo Humano
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -14.043595 | 0.0135578 | -1035.8328 | 0 |
HDI | 5.713386 | 0.0155631 | 367.1103 | 0 |
Esta variable es muy siginificativa.
Parece que el número de suicidios aumenta conforme el IDH aumenta, recordamos que en nuestra base sólo tenemos a México y Estados Unidos, de los cuáles encontramos más suicidios en E.U. que también es donde es mayor el HDI.
5.6 PIB per cápita
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -9.7640178 | 0.0022597 | -4320.9063 | 0 |
GDP_PP | 0.0000175 | 0.0000001 | 318.9235 | 0 |
También el PIB es muy significativo, aunque el coeficiente es muy pequeño, de igual forma el número esperado de suicidios aumenta conforme el PIB per cápita aumenta.
5.7 Generación
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -8.6235037 | 0.0032855 | -2624.68254 | 0 |
generationSilent | -0.2985983 | 0.0038637 | -77.28243 | 0 |
generationBoomers | -0.2872374 | 0.0036641 | -78.39278 | 0 |
generationGeneration X | -0.5571358 | 0.0037705 | -147.76267 | 0 |
generationMillenials | -1.1129118 | 0.0042750 | -260.32825 | 0 |
generationGeneration Z | -3.1345317 | 0.0151720 | -206.59992 | 0 |
La variable generación es muy significativa.
Nuestra categoría basal es la ganeración G.I., nacidos entre 1901 y 1926, quienes vivieron la segunda guerra mundial. Notamos que la esta generación es la que presenta un número esperado de suicidios mayor al de las demás generaciones, cuyos coeficientes son negativos.
Comparamos los modelos con una categoría.
df | AIC | |
---|---|---|
fit1 | 2 | 845407.6 |
fit2 | 6 | 656753.5 |
fit3 | 2 | 575387.5 |
fit4 | 31 | 1025387.5 |
fit5 | 2 | 856320.9 |
fit6 | 2 | 924157.7 |
fit7 | 6 | 829440.2 |
El mejor es fit3 el modelo que tiene como variable explicativa al sexo. Seguido de fit 2, el modelo que tiene como variiable explicativa la edad.
Ya que todas nuestras variables son significativas, parecería adecuado explorar los modelos con las distintas combinaciones de covariables, sin embargo estás son demasiadas, por lo que empezaremos ajustando un modelo con las dos variables que ajustaron los mejores modelos (sexo y edad), e iremos aumentando o disminuyendo variables hasta encontrar nuetro mejor modelo.
5.8 Sexo y edad
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -12.875069 | 0.0090217 | -1427.1192 | 0 |
sexmale | 1.438935 | 0.0023250 | 618.8967 | 0 |
age15-24 years | 2.630663 | 0.0091381 | 287.8783 | 0 |
age25-34 years | 2.864418 | 0.0090874 | 315.2082 | 0 |
age35-54 years | 3.030601 | 0.0089575 | 338.3325 | 0 |
age55-74 years | 3.065654 | 0.0090567 | 338.4965 | 0 |
age75+ years | 3.437438 | 0.0093744 | 366.6817 | 0 |
Las dos variables siguen siendo muy significativas. Veamos si este modelo es mejor a nuestro mejor modelo anterior.
## Analysis of Deviance Table
##
## Model 1: suicides ~ offset(log(population)) + sex
## Model 2: suicides ~ offset(log(population)) + sex + age
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 742 569487
## 2 737 170646 5 398841 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Nustro modelo con dos variables tiene una devianza menor y esta diferencia es significativa, por lo que nuestro modelo con sexo y edad es mejor.
Agreguemos ahora la variable país.
5.9 País, sexo y edad
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -13.613519 | 0.0093960 | -1448.8668 | 0 |
countryUnited States | 1.000998 | 0.0031803 | 314.7537 | 0 |
sexmale | 1.438202 | 0.0023253 | 618.4903 | 0 |
age15-24 years | 2.607588 | 0.0091382 | 285.3515 | 0 |
age25-34 years | 2.799227 | 0.0090885 | 307.9953 | 0 |
age35-54 years | 2.908300 | 0.0089616 | 324.5286 | 0 |
age55-74 years | 2.904416 | 0.0090636 | 320.4471 | 0 |
age75+ years | 3.251569 | 0.0093834 | 346.5254 | 0 |
Las tres variables siguen siendo muy siginificativas, comparamos con nuestro moelo anterior.
## Analysis of Deviance Table
##
## Model 1: suicides ~ offset(log(population)) + sex + age
## Model 2: suicides ~ offset(log(population)) + country + sex + age
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 737 170646
## 2 736 42567 1 128080 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
La devianza es menos significativamente, por lo que nos quedamos con este modelo con país, sexo y edad.
Agregamos la variable generación
5.9.1 País, sexo, edad y generación
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -13.6490292 | 0.0129936 | -1050.440214 | 0 |
countryUnited States | 1.0031640 | 0.0031832 | 315.146227 | 0 |
sexmale | 1.4388477 | 0.0023256 | 618.703577 | 0 |
age15-24 years | 2.6776289 | 0.0112683 | 237.625664 | 0 |
age25-34 years | 2.8913300 | 0.0114088 | 253.428896 | 0 |
age35-54 years | 3.0486067 | 0.0115820 | 263.218880 | 0 |
age55-74 years | 3.0960843 | 0.0120153 | 257.677769 | 0 |
age75+ years | 3.4047366 | 0.0125190 | 271.966592 | 0 |
generationSilent | -0.2231725 | 0.0041014 | -54.413990 | 0 |
generationBoomers | -0.1078096 | 0.0049562 | -21.752590 | 0 |
generationGeneration X | -0.0363467 | 0.0055839 | -6.509169 | 0 |
generationMillenials | -0.0373800 | 0.0065093 | -5.742518 | 0 |
generationGeneration Z | 0.1754166 | 0.0194161 | 9.034587 | 0 |
Todas las variables son muy significativas. Comparemos con el modelo anterior.
## Analysis of Deviance Table
##
## Model 1: suicides ~ offset(log(population)) + country + sex + age
## Model 2: suicides ~ offset(log(population)) + country + sex + age + generation
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 736 42567
## 2 731 37978 5 4588.4 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
La devianza del modelo agregando la variable generación es menor significativamente por lo que nos quedamos con este último modelo.
Agregamos la variable Índice de Desarrollo Humano.
5.10 País, sexo, edad, generación e IDH
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -14.6183504 | 0.0423922 | -344.836051 | 0e+00 |
countryUnited States | 0.6893334 | 0.0134003 | 51.441537 | 0e+00 |
sexmale | 1.4386518 | 0.0023256 | 618.618311 | 0e+00 |
age15-24 years | 2.6400939 | 0.0113752 | 232.092524 | 0e+00 |
age25-34 years | 2.8153754 | 0.0118383 | 237.818806 | 0e+00 |
age35-54 years | 2.9232642 | 0.0126985 | 230.204816 | 0e+00 |
age55-74 years | 2.9115193 | 0.0142547 | 204.249205 | 0e+00 |
age75+ years | 3.1892618 | 0.0153917 | 207.206051 | 0e+00 |
generationSilent | -0.2887224 | 0.0049248 | -58.625737 | 0e+00 |
generationBoomers | -0.2295078 | 0.0070868 | -32.385284 | 0e+00 |
generationGeneration X | -0.2126608 | 0.0092145 | -23.078838 | 0e+00 |
generationMillenials | -0.2773612 | 0.0119105 | -23.287203 | 0e+00 |
generationGeneration Z | -0.1208010 | 0.0229865 | -5.255299 | 1e-07 |
HDI | 1.7295448 | 0.0719055 | 24.053041 | 0e+00 |
Todas las variables siguen siendo muy significativas. Veamos si este modelo es mejor.
## Analysis of Deviance Table
##
## Model 1: suicides ~ offset(log(population)) + country + sex + age + generation
## Model 2: suicides ~ offset(log(population)) + country + sex + age + generation +
## HDI
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 731 37978
## 2 730 37399 1 579.36 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
El modelo agregando esta última variable es mejor.
Agregamos la variable año
5.11 País, sexo, edad, generación, IDH y año
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -19.2172906 | 0.1114625 | -172.4104381 | 0.0000000 |
countryUnited States | -0.5868616 | 0.0302579 | -19.3953392 | 0.0000000 |
sexmale | 1.4387116 | 0.0023256 | 618.6412868 | 0.0000000 |
age15-24 years | 2.6203252 | 0.0114461 | 228.9275001 | 0.0000000 |
age25-34 years | 2.7846248 | 0.0121409 | 229.3584446 | 0.0000000 |
age35-54 years | 2.8945350 | 0.0135419 | 213.7465817 | 0.0000000 |
age55-74 years | 2.8934700 | 0.0160639 | 180.1229613 | 0.0000000 |
age75+ years | 3.1997835 | 0.0176661 | 181.1261420 | 0.0000000 |
generationSilent | -0.2254443 | 0.0055648 | -40.5122568 | 0.0000000 |
generationBoomers | -0.1501932 | 0.0088299 | -17.0096855 | 0.0000000 |
generationGeneration X | -0.1421728 | 0.0118906 | -11.9567855 | 0.0000000 |
generationMillenials | -0.2203530 | 0.0154420 | -14.2697500 | 0.0000000 |
generationGeneration Z | -0.1065486 | 0.0256691 | -4.1508544 | 0.0000331 |
HDI | 8.7742983 | 0.1672272 | 52.4693236 | 0.0000000 |
year1986 | 0.0054581 | 0.0079554 | 0.6860863 | 0.4926587 |
year1987 | -0.0407629 | 0.0080289 | -5.0770161 | 0.0000004 |
year1988 | -0.0959365 | 0.0081598 | -11.7571987 | 0.0000000 |
year1989 | -0.1390575 | 0.0083080 | -16.7377746 | 0.0000000 |
year1990 | -0.1512045 | 0.0084522 | -17.8893776 | 0.0000000 |
year1991 | -0.1721961 | 0.0089947 | -19.1442750 | 0.0000000 |
year1992 | -0.2205320 | 0.0092540 | -23.8309896 | 0.0000000 |
year1993 | -0.2418127 | 0.0094937 | -25.4708344 | 0.0000000 |
year1994 | -0.2765086 | 0.0097819 | -28.2674409 | 0.0000000 |
year1995 | -0.3086590 | 0.0102302 | -30.1713858 | 0.0000000 |
year1996 | -0.3496106 | 0.0103973 | -33.6252300 | 0.0000000 |
year1997 | -0.3785865 | 0.0105659 | -35.8309483 | 0.0000000 |
year1998 | -0.4064891 | 0.0107412 | -37.8439143 | 0.0000000 |
year1999 | -0.4752176 | 0.0109819 | -43.2727498 | 0.0000000 |
year2000 | -0.5142016 | 0.0111656 | -46.0523820 | 0.0000000 |
year2001 | -0.4771951 | 0.0116910 | -40.8172668 | 0.0000000 |
year2002 | -0.4875331 | 0.0119901 | -40.6613831 | 0.0000000 |
year2003 | -0.5233097 | 0.0123372 | -42.4172016 | 0.0000000 |
year2004 | -0.5366143 | 0.0126688 | -42.3570377 | 0.0000000 |
year2005 | -0.5651911 | 0.0130280 | -43.3827474 | 0.0000000 |
year2006 | -0.5855639 | 0.0133497 | -43.8634505 | 0.0000000 |
year2007 | -0.5832005 | 0.0136699 | -42.6632051 | 0.0000000 |
year2008 | -0.5685501 | 0.0139866 | -40.6497153 | 0.0000000 |
year2009 | -0.5743478 | 0.0143265 | -40.0899383 | 0.0000000 |
year2010 | -0.5824735 | 0.0150638 | -38.6671450 | 0.0000000 |
year2011 | -0.5744496 | 0.0158345 | -36.2784661 | 0.0000000 |
year2012 | -0.5807136 | 0.0160515 | -36.1781683 | 0.0000000 |
year2013 | -0.5823441 | 0.0161768 | -35.9987691 | 0.0000000 |
year2014 | -0.5665334 | 0.0164007 | -34.5432536 | 0.0000000 |
year2015 | -0.5644844 | 0.0166370 | -33.9293956 | 0.0000000 |
Notamos que el año en prescencia de las demás variables, sólo no es significativo para el año 1986 pero para el resto sí lo es, ganó significancia, a cuando sólo considerábamos esta variable por sí sola.
Comparamos este modelo con el anterior.
## Analysis of Deviance Table
##
## Model 1: suicides ~ offset(log(population)) + country + sex + age + generation +
## HDI
## Model 2: suicides ~ offset(log(population)) + country + sex + age + generation +
## HDI + year
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 730 37399
## 2 700 31722 30 5677 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
El ajuste es mejor con la variable año.
Sólo nos falta por agregar la variable PIB per cápita
5.12 País, sexo, edad, generación, HDI, año y PIB per cápita
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -16.6556997 | 0.4610498 | -36.1255974 | 0.0000000 |
countryUnited States | 0.4481359 | 0.1833289 | 2.4444370 | 0.0145078 |
sexmale | 1.4387170 | 0.0023256 | 618.6435232 | 0.0000000 |
age15-24 years | 2.6202726 | 0.0114463 | 228.9180588 | 0.0000000 |
age25-34 years | 2.7847869 | 0.0121412 | 229.3661704 | 0.0000000 |
age35-54 years | 2.8949123 | 0.0135425 | 213.7657481 | 0.0000000 |
age55-74 years | 2.8943485 | 0.0160650 | 180.1644114 | 0.0000000 |
age75+ years | 3.2009715 | 0.0176671 | 181.1823770 | 0.0000000 |
generationSilent | -0.2249144 | 0.0055656 | -40.4115624 | 0.0000000 |
generationBoomers | -0.1491161 | 0.0088319 | -16.8838814 | 0.0000000 |
generationGeneration X | -0.1408823 | 0.0118926 | -11.8461919 | 0.0000000 |
generationMillenials | -0.2187157 | 0.0154444 | -14.1615111 | 0.0000000 |
generationGeneration Z | -0.1049802 | 0.0256709 | -4.0894646 | 0.0000432 |
HDI | 4.6990815 | 0.7311640 | 6.4268498 | 0.0000000 |
year1986 | 0.0266060 | 0.0087739 | 3.0323924 | 0.0024262 |
year1987 | 0.0032438 | 0.0111215 | 0.2916661 | 0.7705420 |
year1988 | -0.0254336 | 0.0147817 | -1.7206119 | 0.0853213 |
year1989 | -0.0415001 | 0.0189677 | -2.1879370 | 0.0286742 |
year1990 | -0.0282399 | 0.0230906 | -1.2230048 | 0.2213279 |
year1991 | -0.0311078 | 0.0262442 | -1.1853205 | 0.2358908 |
year1992 | -0.0549938 | 0.0303698 | -1.8108074 | 0.0701707 |
year1993 | -0.0521903 | 0.0344671 | -1.5142067 | 0.1299734 |
year1994 | -0.0609146 | 0.0389194 | -1.5651471 | 0.1175484 |
year1995 | -0.0714292 | 0.0426859 | -1.6733689 | 0.0942547 |
year1996 | -0.0940997 | 0.0458342 | -2.0530431 | 0.0400684 |
year1997 | -0.1018983 | 0.0494837 | -2.0592298 | 0.0394722 |
year1998 | -0.1106329 | 0.0527965 | -2.0954608 | 0.0361300 |
year1999 | -0.1567306 | 0.0567198 | -2.7632437 | 0.0057230 |
year2000 | -0.1786367 | 0.0596812 | -2.9931825 | 0.0027608 |
year2001 | -0.1230923 | 0.0629593 | -1.9551093 | 0.0505702 |
year2002 | -0.1148820 | 0.0661989 | -1.7354058 | 0.0826689 |
year2003 | -0.1265220 | 0.0704128 | -1.7968599 | 0.0723579 |
year2004 | -0.1089463 | 0.0757858 | -1.4375550 | 0.1505604 |
year2005 | -0.1051253 | 0.0814282 | -1.2910179 | 0.1966975 |
year2006 | -0.0969002 | 0.0864126 | -1.1213666 | 0.2621319 |
year2007 | -0.0686381 | 0.0909329 | -0.7548216 | 0.4503560 |
year2008 | -0.0386615 | 0.0936277 | -0.4129278 | 0.6796595 |
year2009 | -0.0474718 | 0.0931580 | -0.5095837 | 0.6103431 |
year2010 | -0.0326547 | 0.0972298 | -0.3358504 | 0.7369837 |
year2011 | -0.0047312 | 0.1007848 | -0.0469438 | 0.9625580 |
year2012 | 0.0084959 | 0.1041805 | 0.0815498 | 0.9350047 |
year2013 | 0.0214981 | 0.1067280 | 0.2014289 | 0.8403632 |
year2014 | 0.0599923 | 0.1106800 | 0.5420336 | 0.5877953 |
year2015 | 0.0816702 | 0.1140904 | 0.7158372 | 0.4740919 |
GDP_PP | -0.0000085 | 0.0000015 | -5.7236489 | 0.0000000 |
El año perdió significancia en muchas entradas, veamos si nuestro modelo mejoró.
## Analysis of Deviance Table
##
## Model 1: suicides ~ offset(log(population)) + country + sex + age + generation +
## HDI + year
## Model 2: suicides ~ offset(log(population)) + country + sex + age + generation +
## HDI + year + GDP_PP
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 700 31722
## 2 699 31689 1 32.709 1.071e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Aunque el año perdió significancia, la devianza del modelo con todas las variables es significativamente mejor, por lo que es mejor a nuestro modelo anterior. Sin embargo, puede que el PIB y el año expliquen lo mismo por lo que quitaremos la variable año para ver si nuestro modelo mejora.
5.13 País, sexo, edad, generación, IDH y PIB per cápita
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -16.4600008 | 0.0615673 | -267.349745 | 0.0000000 |
countryUnited States | 0.5186118 | 0.0139444 | 37.191528 | 0.0000000 |
sexmale | 1.4390149 | 0.0023256 | 618.763576 | 0.0000000 |
age15-24 years | 2.6608324 | 0.0113880 | 233.651731 | 0.0000000 |
age25-34 years | 2.8581583 | 0.0118861 | 240.462203 | 0.0000000 |
age35-54 years | 3.0008500 | 0.0128374 | 233.759095 | 0.0000000 |
age55-74 years | 3.0363540 | 0.0145685 | 208.419821 | 0.0000000 |
age75+ years | 3.3365358 | 0.0157959 | 211.227393 | 0.0000000 |
generationSilent | -0.2345718 | 0.0050878 | -46.104462 | 0.0000000 |
generationBoomers | -0.1334297 | 0.0074459 | -17.919888 | 0.0000000 |
generationGeneration X | -0.0764717 | 0.0097732 | -7.824671 | 0.0000000 |
generationMillenials | -0.1022473 | 0.0126248 | -8.098932 | 0.0000000 |
generationGeneration Z | 0.0832918 | 0.0235083 | 3.543077 | 0.0003955 |
HDI | 4.1570853 | 0.0929615 | 44.718378 | 0.0000000 |
GDP_PP | -0.0000078 | 0.0000002 | -42.150066 | 0.0000000 |
Todas las variables son muy significativas. Veamos si es mejor a nuestro modelo con todas las variables.
## Analysis of Deviance Table
##
## Model 1: suicides ~ offset(log(population)) + country + sex + age + generation +
## HDI + GDP_PP
## Model 2: suicides ~ offset(log(population)) + country + sex + age + generation +
## HDI + year + GDP_PP
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 729 35609
## 2 699 31689 30 3920.2 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Sigue siendo mejor el modelo con todas nuestras variables por lo que este será nuestro modelo final.
country | year | sex | age | HDI | GDP_PP | |
---|---|---|---|---|---|---|
80 | Mexico | 1991 | female | 25-34 years | 0.6524 | 4204 |
89 | Mexico | 1992 | male | 15-24 years | 0.6568 | 4830 |
110 | Mexico | 1994 | male | 55-74 years | 0.6656 | 6735 |
134 | Mexico | 1996 | male | 55-74 years | 0.6758 | 4904 |
156 | Mexico | 1997 | female | 5-14 years | 0.6816 | 5864 |
177 | Mexico | 1999 | female | 75+ years | 0.6932 | 6800 |
240 | Mexico | 2004 | female | 5-14 years | 0.7174 | 8217 |
596 | United States | 2003 | female | 25-34 years | 0.8914 | 42468 |
700 | United States | 2012 | male | 25-34 years | 0.9120 | 55170 |
713 | United States | 2013 | male | 15-24 years | 0.9130 | 56520 |
generation | suicides | rate | ajustados | tasas_ajust | residuos | |
---|---|---|---|---|---|---|
80 | Boomers | 72 | 0.000011 | 109.09713 | 0.0000164 | 37.097135 |
89 | Generation X | 554 | 0.000058 | 559.53608 | 0.0000585 | 5.536078 |
110 | Silent | 299 | 0.000097 | 220.99507 | 0.0000721 | 78.004933 |
134 | Silent | 309 | 0.000092 | 250.12176 | 0.0000743 | 58.878244 |
156 | Millenials | 30 | 0.000003 | 10.95100 | 0.0000010 | 19.049000 |
177 | G.I. Generation | 9 | 0.000008 | 32.57623 | 0.0000301 | 23.576234 |
240 | Millenials | 50 | 0.000004 | 13.02161 | 0.0000011 | 36.978392 |
596 | Generation X | 909 | 0.000046 | 1023.69955 | 0.0000521 | 114.699554 |
700 | Millenials | 4985 | 0.000237 | 4835.24953 | 0.0002299 | 149.750466 |
713 | Millenials | 3903 | 0.000172 | 4462.52674 | 0.0001963 | 559.526735 |
Notamos que el ajuste no es tan bueno, pero tampoco tan malo, en algunas entradas sí ajusta bien.
Verifiquemos la dispersión de nuestro modelo
##
## Overdispersion test
##
## data: fit
## z = 16.492, p-value < 2.2e-16
## alternative hypothesis: true dispersion is greater than 1
## sample estimates:
## dispersion
## 43.46441
La prueba nos arroja que nuestro modelo tiene sobredispercion, entonces en estos casos es recomendable, tratar de ajustar un modelo binomial negativo, lo cual se realizará a continuación.
Ajustaremos un modelo binomial negativo dada la variabilidad de los datos para ver si el ajuste mejora.
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -17.1740708 | 3.8910546 | -4.4137316 | 0.0000102 |
countryUnited States | 0.3834412 | 1.5392651 | 0.2491067 | 0.8032783 |
sexmale | 1.5543579 | 0.0269350 | 57.7077119 | 0.0000000 |
age15-24 years | 2.4024892 | 0.0598616 | 40.1340591 | 0.0000000 |
age25-34 years | 2.4858681 | 0.0814096 | 30.5353317 | 0.0000000 |
age35-54 years | 2.5918253 | 0.1175649 | 22.0459144 | 0.0000000 |
age55-74 years | 2.4857642 | 0.1667491 | 14.9072102 | 0.0000000 |
age75+ years | 2.6560126 | 0.1909649 | 13.9083836 | 0.0000000 |
generationSilent | -0.3263373 | 0.0732143 | -4.4572869 | 0.0000083 |
generationBoomers | -0.2911118 | 0.1237396 | -2.3526170 | 0.0186418 |
generationGeneration X | -0.2882807 | 0.1650687 | -1.7464289 | 0.0807365 |
generationMillenials | -0.1881089 | 0.2095196 | -0.8978107 | 0.3692865 |
generationGeneration Z | 0.1585504 | 0.2534192 | 0.6256447 | 0.5315480 |
HDI | 5.8426733 | 6.1277764 | 0.9534736 | 0.3403501 |
year1986 | 0.0082659 | 0.1093544 | 0.0755881 | 0.9397468 |
year1987 | -0.0213339 | 0.1178133 | -0.1810821 | 0.8563031 |
year1988 | -0.0336123 | 0.1345607 | -0.2497928 | 0.8027476 |
year1989 | -0.0363630 | 0.1564873 | -0.2323702 | 0.8162505 |
year1990 | -0.0365614 | 0.1805072 | -0.2025479 | 0.8394884 |
year1991 | -0.0003516 | 0.2077094 | -0.0016925 | 0.9986496 |
year1992 | 0.0222974 | 0.2384412 | 0.0935132 | 0.9254959 |
year1993 | 0.0164845 | 0.2753418 | 0.0598693 | 0.9522598 |
year1994 | 0.0441734 | 0.3072972 | 0.1437481 | 0.8856994 |
year1995 | 0.0339985 | 0.3247537 | 0.1046901 | 0.9166217 |
year1996 | 0.0013722 | 0.3556533 | 0.0038583 | 0.9969215 |
year1997 | 0.0184909 | 0.3920212 | 0.0471681 | 0.9623792 |
year1998 | 0.0086811 | 0.4231744 | 0.0205143 | 0.9836331 |
year1999 | -0.0578193 | 0.4600542 | -0.1256794 | 0.8999857 |
year2000 | -0.0505941 | 0.4944097 | -0.1023323 | 0.9184929 |
year2001 | 0.0697063 | 0.5252797 | 0.1327033 | 0.8944281 |
year2002 | 0.0430355 | 0.5525888 | 0.0778798 | 0.9379237 |
year2003 | 0.0218304 | 0.5813056 | 0.0375541 | 0.9700432 |
year2004 | 0.0405570 | 0.6208987 | 0.0653199 | 0.9479193 |
year2005 | 0.0438740 | 0.6640154 | 0.0660737 | 0.9473191 |
year2006 | 0.0304039 | 0.7044856 | 0.0431576 | 0.9655759 |
year2007 | -0.0320931 | 0.7419895 | -0.0432528 | 0.9655000 |
year2008 | 0.0265091 | 0.7690903 | 0.0344681 | 0.9725038 |
year2009 | 0.0229464 | 0.7676326 | 0.0298924 | 0.9761529 |
year2010 | 0.0029875 | 0.8067810 | 0.0037030 | 0.9970455 |
year2011 | 0.0780691 | 0.8354387 | 0.0934469 | 0.9255485 |
year2012 | 0.0426644 | 0.8670690 | 0.0492053 | 0.9607557 |
year2013 | 0.0683691 | 0.8845063 | 0.0772964 | 0.9383878 |
year2014 | 0.1631208 | 0.9072604 | 0.1797949 | 0.8573136 |
year2015 | 0.1669786 | 0.9192707 | 0.1816425 | 0.8558633 |
GDP_PP | -0.0000136 | 0.0000125 | -1.0857610 | 0.2775848 |
La variable año pierde significancia, por lo que quitaremos esta variable para ver si mejora el ajuste.
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -17.4416104 | 0.4221359 | -41.3175276 | 0.0000000 |
countryUnited States | 0.3229737 | 0.1522094 | 2.1219039 | 0.0338458 |
sexmale | 1.5536646 | 0.0271092 | 57.3112721 | 0.0000000 |
age15-24 years | 2.4227215 | 0.0588212 | 41.1879166 | 0.0000000 |
age25-34 years | 2.5270391 | 0.0776518 | 32.5432181 | 0.0000000 |
age35-54 years | 2.6612277 | 0.1094918 | 24.3052587 | 0.0000000 |
age55-74 years | 2.5952035 | 0.1517267 | 17.1044602 | 0.0000000 |
age75+ years | 2.7825955 | 0.1730322 | 16.0813780 | 0.0000000 |
generationSilent | -0.2960857 | 0.0694307 | -4.2644778 | 0.0000200 |
generationBoomers | -0.2147482 | 0.1126394 | -1.9065107 | 0.0565840 |
generationGeneration X | -0.1866428 | 0.1501220 | -1.2432744 | 0.2137667 |
generationMillenials | -0.0429597 | 0.1881769 | -0.2282944 | 0.8194174 |
generationGeneration Z | 0.3328724 | 0.2302734 | 1.4455529 | 0.1483026 |
HDI | 6.0483561 | 0.7573958 | 7.9857269 | 0.0000000 |
GDP_PP | -0.0000129 | 0.0000022 | -5.7733628 | 0.0000000 |
## Likelihood ratio tests of Negative Binomial Models
##
## Response: suicides
## Model
## 1 offset(log(population)) + country + sex + age + generation + HDI + GDP_PP
## 2 offset(log(population)) + country + sex + age + generation + HDI + year + GDP_PP
## theta Resid. df 2 x log-lik. Test df LR stat. Pr(Chi)
## 1 7.709266 729 -9788.206
## 2 7.814702 699 -9779.695 1 vs 2 30 8.511296 0.9999601
Primero notamos que el modelo con el año tiene devianza ligeramente menor, pero como esta diferencia no es significativa, nos quedaremos con el modelo más simple, sin la variable año.
Por otra parte notamos que la variable generación perdió significancia, por lo que la quitaremos y volveremos a comparar.
Estimate | Std. Error | z value | Pr(>|z|) | |
---|---|---|---|---|
(Intercept) | -17.9520948 | 0.4002971 | -44.846926 | 0.0000000 |
countryUnited States | 0.1906786 | 0.0858580 | 2.220859 | 0.0263605 |
sexmale | 1.5455322 | 0.0284410 | 54.341696 | 0.0000000 |
age15-24 years | 2.2439181 | 0.0497091 | 45.140959 | 0.0000000 |
age25-34 years | 2.2956995 | 0.0497397 | 46.154275 | 0.0000000 |
age35-54 years | 2.3769839 | 0.0496913 | 47.834997 | 0.0000000 |
age55-74 years | 2.3115996 | 0.0499032 | 46.321704 | 0.0000000 |
age75+ years | 2.5950172 | 0.0505784 | 51.306777 | 0.0000000 |
HDI | 6.8870478 | 0.5838346 | 11.796231 | 0.0000000 |
GDP_PP | -0.0000138 | 0.0000021 | -6.625719 | 0.0000000 |
## Likelihood ratio tests of Negative Binomial Models
##
## Response: suicides
## Model
## 1 offset(log(population)) + country + sex + age + HDI + GDP_PP
## 2 offset(log(population)) + country + sex + age + generation + HDI + GDP_PP
## theta Resid. df 2 x log-lik. Test df LR stat. Pr(Chi)
## 1 6.970030 734 -9854.207
## 2 7.709266 729 -9788.206 1 vs 2 5 66.00085 6.947776e-13
En este último modelo todas las variables son significativa, sin embargo, el modelo que contempla la generación es significativamente mejor.
Nos quedaremos con el modelo binomial negativo que contempla las variables país, sexo,edad, generación, IDH y PIB per cápita, y lo compararemos con el ajuste hecho por el modelo poisson.
country | year | sex | age | HDI | GDP_PP | |
---|---|---|---|---|---|---|
6 | Mexico | 1985 | female | 15-24 years | 0.6340 | 2730 |
59 | Mexico | 1989 | male | 5-14 years | 0.6452 | 3125 |
61 | Mexico | 1990 | male | 75+ years | 0.6480 | 3595 |
69 | Mexico | 1990 | female | 35-54 years | 0.6480 | 3595 |
288 | Mexico | 2008 | female | 5-14 years | 0.7364 | 10864 |
367 | Mexico | 2015 | female | 25-34 years | 0.7570 | 10228 |
395 | United States | 1986 | male | 5-14 years | 0.8446 | 20588 |
470 | United States | 1993 | male | 55-74 years | 0.8692 | 28891 |
594 | United States | 2003 | female | 35-54 years | 0.8914 | 42468 |
695 | United States | 2011 | male | 5-14 years | 0.9110 | 53452 |
generation | suicides | rate | ajustados | tasas_ajust | residuos | |
---|---|---|---|---|---|---|
6 | Generation X | 107 | 0.000013 | 110.02278 | 0.0000134 | 3.0227788 |
59 | Generation X | 40 | 0.000004 | 43.36102 | 0.0000041 | 3.3610234 |
61 | G.I. Generation | 87 | 0.000178 | 58.57526 | 0.0001198 | 28.4247365 |
69 | Silent | 58 | 0.000008 | 117.72281 | 0.0000167 | 59.7228112 |
288 | Generation Z | 73 | 0.000006 | 16.98247 | 0.0000015 | 56.0175275 |
367 | Millenials | 267 | 0.000026 | 267.27283 | 0.0000265 | 0.2728293 |
395 | Generation X | 199 | 0.000011 | 264.84000 | 0.0000153 | 65.8399992 |
470 | Silent | 4797 | 0.000264 | 4456.63969 | 0.0002454 | 340.3603141 |
594 | Boomers | 3058 | 0.000071 | 2480.30730 | 0.0000577 | 577.6926953 |
695 | Generation Z | 201 | 0.000009 | 335.87948 | 0.0000159 | 134.8794764 |
Sumaremos los residuos para ver qué modelo logra un mejor ajuste
## [1] 121159.7
## [1] 182434.3
El modelo presenta un error mayor, por lo que nuestro mejor modelo es el modelo Poisson con todas las variables explicativas.