2.7 EXAMPLE 6
- In RStudio, estimate a logistic regression of the form:
\[\begin{equation} \log\bigg(\dfrac{p_i}{1-p_i}\bigg)=\beta_0 + \beta_1 x_i + \beta_2 z_i, \end{equation}\]
where:
\[\begin{align} p_i&=\text{probability that the firm is listed on the stock exchange}\\ x_i&=\text{firm revenue in thousands of kn} \\ z_i&=\text{number of employees in the firm} \end{align}\]
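Solving the logit equation above for \(p_i\) gives the probability directly, which is the quantity the fitted model ultimately returns for each firm:
\[\begin{equation} p_i=\dfrac{\exp(\beta_0 + \beta_1 x_i + \beta_2 z_i)}{1+\exp(\beta_0 + \beta_1 x_i + \beta_2 z_i)}=\dfrac{1}{1+\exp\big(-(\beta_0 + \beta_1 x_i + \beta_2 z_i)\big)}. \end{equation}\]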
- Note that the dependent variable \(y \in \{0,~1\}\) has already been created in the object `mojipodaci` (variable `d3`). Name the new logistic regression `logisticka2`. To estimate the logistic regression model, use the `glm()` command, assuming a binomial distribution and the "logit" link function.
logisticka2 <- glm(d3 ~ prihod + zaposleni, data = mojipodaci, family = binomial(link = "logit"))
summary(logisticka2)
##
## Call:
## glm(formula = d3 ~ prihod + zaposleni, family = binomial(link = "logit"),
## data = mojipodaci)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -3.882e-04 -2.000e-08 -2.000e-08 2.000e-08 4.082e-04
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4653.4 490610.4 -0.009 0.992
## prihod 116.8 10100.2 0.012 0.991
## zaposleni -215.8 21126.7 -0.010 0.992
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 6.8029e+01 on 49 degrees of freedom
## Residual deviance: 3.6457e-07 on 47 degrees of freedom
## AIC: 6
##
## Number of Fisher Scoring iterations: 25
- Interpret the estimated coefficients in concrete terms.
exp(coefficients(logisticka2))
## (Intercept) prihod zaposleni
## 0.000000e+00 5.273650e+50 1.956816e-94
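As a reading aid: exponentiating a logit coefficient gives an odds ratio, that is, the factor by which the odds \(p_i/(1-p_i)\) are multiplied when one regressor rises by one unit and the other is held fixed. For `prihod`, using the rounded estimate from the summary output:
\[\begin{equation} \dfrac{\text{odds}(x_i+1,~z_i)}{\text{odds}(x_i,~z_i)}=e^{\beta_1}, \qquad e^{116.8}\approx 5.3\times 10^{50}, \end{equation}\]
which is in line with the value 5.27e+50 reported for the unrounded coefficient. Odds ratios of this magnitude, together with the very large standard errors, a residual deviance of practically zero and 25 Fisher scoring iterations, are typical symptoms of (quasi-)complete separation. If that is what is happening here, the individual coefficients should be read with caution rather than taken literally.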
- Save the fitted probabilities as a new variable `p2` in the data frame `mojipodaci`.
mojipodaci$p2 <- predict(logisticka2, type = "response")
head(mojipodaci)
## prihod zaduzenost djelatnost kotacija zaposleni reklama rizik d1 d2 d3
## 1 60.53 0.88859 trgovina ne 16 58.80 5 0 1 0
## 2 50.33 0.06934 proizvodnja ne 13 37.27 1 1 0 0
## 3 130.61 0.21144 usluge da 41 40.00 2 0 0 1
## 4 100.67 0.55482 trgovina ne 33 42.98 3 0 1 0
## 5 130.25 0.14767 trgovina da 41 72.00 1 0 1 1
## 6 130.95 0.14211 trgovina da 41 63.07 1 0 1 1
## p p2
## 1 0.0000 2.220446e-16
## 2 0.0000 2.220446e-16
## 3 1.0000 1.000000e+00
## 4 0.3665 7.536640e-08
## 5 1.0000 1.000000e+00
## 6 1.0000 1.000000e+00
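To make the `type = "response"` argument concrete, the following minimal sketch checks that the response-scale prediction is simply the inverse logit of the linear predictor. It runs on simulated toy data, not on `mojipodaci`, and the names `toy`, `fit`, `p_response` and `p_manual` are illustrative assumptions.

```r
set.seed(1)

# Hypothetical toy data (NOT the mojipodaci data set)
toy <- data.frame(x = rnorm(100), z = rnorm(100))
toy$y <- rbinom(100, 1, plogis(0.5 + toy$x - 0.5 * toy$z))

# Same model family and link as in the example
fit <- glm(y ~ x + z, data = toy, family = binomial(link = "logit"))

# Fitted probabilities returned directly by predict()
p_response <- predict(fit, type = "response")

# The same probabilities obtained by hand: inverse logit of the linear predictor
p_manual <- plogis(predict(fit, type = "link"))

# Should report that the two vectors agree up to numerical tolerance
print(all.equal(p_response, p_manual))
```

The `type = "link"` predictions are on the log-odds scale, so they are what the right-hand side of the model equation produces before the transformation to probabilities.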
confusionMatrix(mojipodaci$d3,mojipodaci$p2,threshold=0.5)
1-misClassError(mojipodaci$d3,mojipodaci$p2,threshold=0.5)
## 0 1
## 0 29 0
## 1 0 21
## [1] 1
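The two figures above (the cross-tabulation and the share of correct classifications) can also be reproduced with base R alone. The sketch below uses small made-up vectors rather than the example data, and it classifies scores at or above the threshold as 1, which is an assumption of this sketch and may differ from how the package handles scores exactly at the threshold.

```r
# Hypothetical actual outcomes and predicted probabilities (illustration only)
actual <- c(0, 0, 1, 1, 0, 1)
prob   <- c(0.10, 0.40, 0.80, 0.95, 0.60, 0.30)
threshold <- 0.5

# Classify as 1 when the predicted probability reaches the threshold
predicted <- as.integer(prob >= threshold)

# Cross-tabulate predicted against actual classes
cm <- table(predicted = predicted, actual = actual)
print(cm)

# Accuracy = 1 - misclassification error
accuracy <- mean(predicted == actual)
print(accuracy)
```

Correct classifications sit on the diagonal of `cm`, so the accuracy also equals `sum(diag(cm)) / sum(cm)`.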
After adding the variable `zaposleni`, all firms are classified correctly at a threshold of \(0.5\).
- What would the optimal threshold be?
optimalCutoff(mojipodaci$d3,mojipodaci$p2,returnDiagnostics=TRUE)
## $optimalCutoff
## [1] 0.01
##
## $sensitivityTable
## CUTOFF FPR TPR YOUDENSINDEX SPECIFICITY MISCLASSERROR
## 1 1.000000e+00 0 0.9047619 0.9047619 1 0.04
## 2 9.900000e-01 0 1 1.0000000 1 0.00
## 3 9.800000e-01 0 1 1.0000000 1 0.00
## 4 9.700000e-01 0 1 1.0000000 1 0.00
## 5 9.600000e-01 0 1 1.0000000 1 0.00
## 6 9.500000e-01 0 1 1.0000000 1 0.00
## 7 9.400000e-01 0 1 1.0000000 1 0.00
## 8 9.300000e-01 0 1 1.0000000 1 0.00
## 9 9.200000e-01 0 1 1.0000000 1 0.00
## 10 9.100000e-01 0 1 1.0000000 1 0.00
## 11 9.000000e-01 0 1 1.0000000 1 0.00
## 12 8.900000e-01 0 1 1.0000000 1 0.00
## 13 8.800000e-01 0 1 1.0000000 1 0.00
## 14 8.700000e-01 0 1 1.0000000 1 0.00
## 15 8.600000e-01 0 1 1.0000000 1 0.00
## 16 8.500000e-01 0 1 1.0000000 1 0.00
## 17 8.400000e-01 0 1 1.0000000 1 0.00
## 18 8.300000e-01 0 1 1.0000000 1 0.00
## 19 8.200000e-01 0 1 1.0000000 1 0.00
## 20 8.100000e-01 0 1 1.0000000 1 0.00
## 21 8.000000e-01 0 1 1.0000000 1 0.00
## 22 7.900000e-01 0 1 1.0000000 1 0.00
## 23 7.800000e-01 0 1 1.0000000 1 0.00
## 24 7.700000e-01 0 1 1.0000000 1 0.00
## 25 7.600000e-01 0 1 1.0000000 1 0.00
## 26 7.500000e-01 0 1 1.0000000 1 0.00
## 27 7.400000e-01 0 1 1.0000000 1 0.00
## 28 7.300000e-01 0 1 1.0000000 1 0.00
## 29 7.200000e-01 0 1 1.0000000 1 0.00
## 30 7.100000e-01 0 1 1.0000000 1 0.00
## 31 7.000000e-01 0 1 1.0000000 1 0.00
## 32 6.900000e-01 0 1 1.0000000 1 0.00
## 33 6.800000e-01 0 1 1.0000000 1 0.00
## 34 6.700000e-01 0 1 1.0000000 1 0.00
## 35 6.600000e-01 0 1 1.0000000 1 0.00
## 36 6.500000e-01 0 1 1.0000000 1 0.00
## 37 6.400000e-01 0 1 1.0000000 1 0.00
## 38 6.300000e-01 0 1 1.0000000 1 0.00
## 39 6.200000e-01 0 1 1.0000000 1 0.00
## 40 6.100000e-01 0 1 1.0000000 1 0.00
## 41 6.000000e-01 0 1 1.0000000 1 0.00
## 42 5.900000e-01 0 1 1.0000000 1 0.00
## 43 5.800000e-01 0 1 1.0000000 1 0.00
## 44 5.700000e-01 0 1 1.0000000 1 0.00
## 45 5.600000e-01 0 1 1.0000000 1 0.00
## 46 5.500000e-01 0 1 1.0000000 1 0.00
## 47 5.400000e-01 0 1 1.0000000 1 0.00
## 48 5.300000e-01 0 1 1.0000000 1 0.00
## 49 5.200000e-01 0 1 1.0000000 1 0.00
## 50 5.100000e-01 0 1 1.0000000 1 0.00
## 51 5.000000e-01 0 1 1.0000000 1 0.00
## 52 4.900000e-01 0 1 1.0000000 1 0.00
## 53 4.800000e-01 0 1 1.0000000 1 0.00
## 54 4.700000e-01 0 1 1.0000000 1 0.00
## 55 4.600000e-01 0 1 1.0000000 1 0.00
## 56 4.500000e-01 0 1 1.0000000 1 0.00
## 57 4.400000e-01 0 1 1.0000000 1 0.00
## 58 4.300000e-01 0 1 1.0000000 1 0.00
## 59 4.200000e-01 0 1 1.0000000 1 0.00
## 60 4.100000e-01 0 1 1.0000000 1 0.00
## 61 4.000000e-01 0 1 1.0000000 1 0.00
## 62 3.900000e-01 0 1 1.0000000 1 0.00
## 63 3.800000e-01 0 1 1.0000000 1 0.00
## 64 3.700000e-01 0 1 1.0000000 1 0.00
## 65 3.600000e-01 0 1 1.0000000 1 0.00
## 66 3.500000e-01 0 1 1.0000000 1 0.00
## 67 3.400000e-01 0 1 1.0000000 1 0.00
## 68 3.300000e-01 0 1 1.0000000 1 0.00
## 69 3.200000e-01 0 1 1.0000000 1 0.00
## 70 3.100000e-01 0 1 1.0000000 1 0.00
## 71 3.000000e-01 0 1 1.0000000 1 0.00
## 72 2.900000e-01 0 1 1.0000000 1 0.00
## 73 2.800000e-01 0 1 1.0000000 1 0.00
## 74 2.700000e-01 0 1 1.0000000 1 0.00
## 75 2.600000e-01 0 1 1.0000000 1 0.00
## 76 2.500000e-01 0 1 1.0000000 1 0.00
## 77 2.400000e-01 0 1 1.0000000 1 0.00
## 78 2.300000e-01 0 1 1.0000000 1 0.00
## 79 2.200000e-01 0 1 1.0000000 1 0.00
## 80 2.100000e-01 0 1 1.0000000 1 0.00
## 81 2.000000e-01 0 1 1.0000000 1 0.00
## 82 1.900000e-01 0 1 1.0000000 1 0.00
## 83 1.800000e-01 0 1 1.0000000 1 0.00
## 84 1.700000e-01 0 1 1.0000000 1 0.00
## 85 1.600000e-01 0 1 1.0000000 1 0.00
## 86 1.500000e-01 0 1 1.0000000 1 0.00
## 87 1.400000e-01 0 1 1.0000000 1 0.00
## 88 1.300000e-01 0 1 1.0000000 1 0.00
## 89 1.200000e-01 0 1 1.0000000 1 0.00
## 90 1.100000e-01 0 1 1.0000000 1 0.00
## 91 1.000000e-01 0 1 1.0000000 1 0.00
## 92 9.000000e-02 0 1 1.0000000 1 0.00
## 93 8.000000e-02 0 1 1.0000000 1 0.00
## 94 7.000000e-02 0 1 1.0000000 1 0.00
## 95 6.000000e-02 0 1 1.0000000 1 0.00
## 96 5.000000e-02 0 1 1.0000000 1 0.00
## 97 4.000000e-02 0 1 1.0000000 1 0.00
## 98 3.000000e-02 0 1 1.0000000 1 0.00
## 99 2.000000e-02 0 1 1.0000000 1 0.00
## 100 1.000000e-02 0 1 1.0000000 1 0.00
## 101 2.220446e-16 1 1 0.0000000 0 0.58
##
## $misclassificationError
## [1] 0
##
## $TPR
## [1] 1
##
## $FPR
## [1] 0
##
## $Specificity
## [1] 1
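The columns of the sensitivity table are linked by simple identities, which can be checked against its first row (cut-off 1):
\[\begin{align} \text{YOUDENSINDEX}&=\text{TPR}-\text{FPR}=\text{TPR}+\text{SPECIFICITY}-1\\ 0.9047619&=0.9047619-0=0.9047619+1-1 \end{align}\]
Every cut-off from 0.01 to 0.99 has a Youden index of 1 and a misclassification error of 0, so in this sample all of them separate the two groups equally well, and the reported value of 0.01 is one of many equally good choices.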
- Plot the ROC curve.
plotROC(mojipodaci$d3,mojipodaci$p2)
- Which logistic regression model is more appropriate? Base your conclusion on the likelihood ratio test (LRT).
anova(logisticka,logisticka2,test="LRT")
## Analysis of Deviance Table
##
## Model 1: d3 ~ prihod
## Model 2: d3 ~ prihod + zaposleni
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 48 5.4827
## 2 47 0.0000 1 5.4827 0.01921 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
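The p-value in the table can be reproduced by hand, which shows what the test does: the LR statistic is the drop in residual deviance between the two nested models, compared with a chi-squared distribution whose degrees of freedom equal the number of added parameters. The sketch below uses only the rounded deviances printed above; the variable names are illustrative.

```r
# Residual deviances as printed in the analysis of deviance table (rounded)
dev_model1 <- 5.4827   # d3 ~ prihod
dev_model2 <- 0.0000   # d3 ~ prihod + zaposleni

# LR statistic: reduction in deviance from adding zaposleni
lr_stat <- dev_model1 - dev_model2

# One extra parameter, so one degree of freedom
p_value <- pchisq(lr_stat, df = 1, lower.tail = FALSE)
print(p_value)
```

Since the resulting p-value of about 0.019 is below 0.05, the null hypothesis that the coefficient on `zaposleni` is zero is rejected at the 5% level, which favours the larger model on this criterion.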