# Chapter 12 Inference

The first step is to create a hypothetical dataset.

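set.seed(1)                          # fix the seed for reproducibility
n = 500                              # sample size
x1 = rnorm(n, mean = 500, sd = 50)   # continuous regressor
x2 = rpois(n, lambda = 5)            # count regressor
e = rnorm(n)                         # standard normal error term

beta = c(3.5,5)                      # true slopes of x1 and x2
y = 10 + beta[1]*x1 + beta[2]*x2 + e # the true intercept is 10
X = cbind(1,x1,x2)                   # design matrix with a column of ones
ols = lm(y~ x1 + x2)
summary(ols)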
##
## Call:
## lm(formula = y ~ x1 + x2)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -3.15798 -0.71935  0.00887  0.70818  3.11361
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.6940067  0.4721931   20.53   <2e-16 ***
## x1          3.5002971  0.0009249 3784.62   <2e-16 ***
## x2          5.0274460  0.0212557  236.52   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.044 on 497 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1
## F-statistic: 7.25e+06 on 2 and 497 DF,  p-value: < 2.2e-16
confint(ols,level = 0.95)
##                2.5 %    97.5 %
## (Intercept) 8.766266 10.621748
## x1          3.498480  3.502114
## x2          4.985684  5.069208
# OLS estimator computed by hand: (X'X)^{-1} X'y
B = solve(t(X)%*%X)%*%(t(X)%*%y)
B
##        [,1]
##    9.694007
## x1 3.500297
## x2 5.027446
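
The hand-computed estimator above can be cross-checked against `lm()`. The following is a minimal sketch, assuming the same simulated data as in this chapter; the name `B_alt` and the use of `crossprod()` to solve the normal equations directly are choices made here for illustration, not part of the original code.

```r
# Recreate the simulated data from the start of the chapter
set.seed(1)
n  <- 500
x1 <- rnorm(n, mean = 500, sd = 50)
x2 <- rpois(n, lambda = 5)
e  <- rnorm(n)
y  <- 10 + 3.5 * x1 + 5 * x2 + e
X  <- cbind(1, x1, x2)

# Solve the normal equations (X'X) b = X'y without forming the inverse explicitly
# (B_alt is a hypothetical name used only in this sketch)
B_alt <- solve(crossprod(X), crossprod(X, y))

# Compare with the coefficients reported by lm()
ols <- lm(y ~ x1 + x2)
isTRUE(all.equal(as.numeric(B_alt), as.numeric(coef(ols)), tolerance = 1e-6))
```

Solving the linear system directly avoids computing the inverse explicitly, which is usually the more stable choice numerically; the result should agree with `B` up to rounding.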

Compute the $$T$$ statistic: $t_n(\theta) = \frac{\hat \theta - \theta}{s(\hat \theta)}$

Consider the null hypothesis $$\theta = 0$$.

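We need to compute the standard error $$s(\hat\theta)$$, which is the square root of the corresponding diagonal element of the estimated covariance matrix of the coefficients.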

#y_hat = X%*%B
e_hat = y - X%*%B
k = ncol(X)
sigma_hat = (1/(n-k)) * t(e_hat)%*%e_hat   # residual variance estimate; dividing by n-k corrects the bias
sigma_hat
##          [,1]
## [1,] 1.090167
V = sigma_hat[1,1] * solve(t(X)%*%X)   # estimated covariance matrix of the coefficients
ep = sqrt(diag(V))                     # standard errors
ep
##                      x1          x2
## 0.472193136 0.000924874 0.021255675
t = B / ep   # t statistics for H0: coefficient = 0
t
##          [,1]
##      20.52975
## x1 3784.62042
## x2  236.52253
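
The manual standard errors and t statistics can be compared with what `lm()` stores internally. This is a hedged sketch, assuming the same simulated data; the names `fit`, `coef_table`, `se_vcov`, `sigma2_hat` and `t_manual` are introduced here for illustration only.

```r
# Recreate the simulated data from the start of the chapter
set.seed(1)
n  <- 500
x1 <- rnorm(n, mean = 500, sd = 50)
x2 <- rpois(n, lambda = 5)
e  <- rnorm(n)
y  <- 10 + 3.5 * x1 + 5 * x2 + e
fit <- lm(y ~ x1 + x2)

# Coefficient table with columns "Estimate", "Std. Error", "t value", "Pr(>|t|)"
coef_table <- summary(fit)$coefficients
se_lm <- coef_table[, "Std. Error"]
t_lm  <- coef_table[, "t value"]

# vcov() returns the estimated covariance matrix of the coefficients
se_vcov <- sqrt(diag(vcov(fit)))

# sigma() returns the residual standard error, so its square is the
# bias-corrected variance estimate e'e / (n - k)
sigma2_hat <- sigma(fit)^2

# t statistics for H0: coefficient = 0 are estimate / standard error
t_manual <- coef(fit) / se_vcov
isTRUE(all.equal(t_manual, t_lm))
```

Here `sigma2_hat` plays the role of `sigma_hat` in the manual calculation, and `se_vcov` the role of `ep`.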

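Define the critical value $$c$$ for a $$5\%$$ significance level. The test is two-sided, so $$c$$ is the $$0.975$$ quantile of the $$t_{n-k}$$ distribution.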

1 - (0.05/2)
## [1] 0.975
c = qt(.975,n-k)
c
## [1] 1.964749
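
With the critical value in hand, the two-sided decision rule is to reject the null hypothesis when the absolute t statistic exceeds it. A minimal sketch of that comparison, assuming the same simulated data; `alpha`, `crit` and `reject` are names chosen here for illustration.

```r
# Recreate the simulated data from the start of the chapter
set.seed(1)
n  <- 500
x1 <- rnorm(n, mean = 500, sd = 50)
x2 <- rpois(n, lambda = 5)
e  <- rnorm(n)
y  <- 10 + 3.5 * x1 + 5 * x2 + e
fit <- lm(y ~ x1 + x2)

k       <- length(coef(fit))                       # number of estimated coefficients
t_stats <- summary(fit)$coefficients[, "t value"]  # t statistics for H0: coefficient = 0

alpha <- 0.05                                      # significance level
crit  <- qt(1 - alpha / 2, df = n - k)             # two-sided critical value

# Reject H0 for each coefficient whose |t| exceeds the critical value
reject <- abs(t_stats) > crit
reject
```

Since the t statistics reported above are all far larger than the critical value, every null hypothesis of a zero coefficient is rejected at the 5% level.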

Comparing with the true value of $$\beta_1$$, that is, testing $$H_0: \beta_1 = 3.5$$

t = (B[2,1] - beta[1]) / ep[2]
t
##        x1
## 0.3212361

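P-value

Assuming that, under the null hypothesis, the statistic follows a Student $$t$$ distribution with $$n-k$$ degrees of freedom, $$t_n(\theta) \sim t_{n-k}$$:

2 * (1 - pt(abs(t), df = n-k))   # abs() keeps the two-sided p-value valid for negative t as well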
##        x1
## 0.7481665
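
The two-sided p-value calculation can be wrapped in a small function. This is only a sketch: `two_sided_p` is a hypothetical helper introduced here, and the inputs are the rounded t statistic and the degrees of freedom ($$n-k = 497$$) reported above.

```r
# Hypothetical helper: two-sided p-value of a t statistic.
# Taking abs() makes the result the same for positive and negative statistics,
# and lower.tail = FALSE returns the upper-tail probability directly.
two_sided_p <- function(t_stat, df) {
  2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
}

p_pos <- two_sided_p(0.3212361, df = 497)
p_neg <- two_sided_p(-0.3212361, df = 497)

p_pos
isTRUE(all.equal(p_pos, p_neg))   # the sign of the statistic does not matter
```

Because the p-value is well above 0.05, the test does not reject $$H_0: \beta_1 = 3.5$$, which is consistent with 3.5 being the value used to generate the data.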

#### 12.0.0.1 Confidence Intervals

$C_n = [\hat\theta - c \cdot s(\hat\theta),\ \hat\theta + c \cdot s(\hat\theta)]$

Define $$c$$ for a $$5\%$$ significance level, which gives a $$95\%$$ confidence interval

c = qt(.975,n-k)
c
## [1] 1.964749

Confidence interval for $$\beta_0$$

lower = B[1,1] - c*  ep[1]
upper = B[1,1] + c * ep[1]
print(lower);print(upper)
##
## 8.766266
##
## 10.62175

Confidence interval for $$\beta_1$$

lower = B[2,1] - c*  ep[2]
upper = B[2,1] + c * ep[2]
print(lower);print(upper)
##      x1
## 3.49848
##       x1
## 3.502114

Confidence interval for $$\beta_2$$

lower = B[3,1] - c*  ep[3]
upper = B[3,1] + c * ep[3]
print(lower);print(upper)
##       x2
## 4.985684
##       x2
## 5.069208
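
The three intervals above can also be computed in one vectorized step and checked against `confint()`. A minimal sketch, assuming the same simulated data; `est`, `se`, `crit` and `ci` are names introduced here for illustration.

```r
# Recreate the simulated data from the start of the chapter
set.seed(1)
n  <- 500
x1 <- rnorm(n, mean = 500, sd = 50)
x2 <- rpois(n, lambda = 5)
e  <- rnorm(n)
y  <- 10 + 3.5 * x1 + 5 * x2 + e
ols <- lm(y ~ x1 + x2)

est  <- coef(ols)                          # point estimates
se   <- sqrt(diag(vcov(ols)))              # standard errors
crit <- qt(0.975, df = df.residual(ols))   # critical value for a 95% interval

# Lower and upper bounds for all coefficients at once
ci <- cbind(lower = est - crit * se, upper = est + crit * se)
ci

# The bounds should agree with confint() up to floating-point error
isTRUE(all.equal(unname(ci), unname(confint(ols, level = 0.95))))
```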

Define $$c$$ for a $$1\%$$ significance level, which gives a $$99\%$$ confidence interval

c = qt(.995,n-k)
c
## [1] 2.585758

Confidence interval for $$\beta_0$$

lower = B[1,1] - c*  ep[1]
upper = B[1,1] + c * ep[1]
print(lower);print(upper)
##
## 8.47303
##
## 10.91498

Confidence interval for $$\beta_1$$

lower = B[2,1] - c*  ep[2]
upper = B[2,1] + c * ep[2]
print(lower);print(upper)
##       x1
## 3.497906
##       x1
## 3.502689

Confidence interval for $$\beta_2$$

lower = B[3,1] - c*  ep[3]
upper = B[3,1] + c * ep[3]
print(lower);print(upper)
##       x2
## 4.972484
##       x2
## 5.082408
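
Since only the critical value changes between the two sets of intervals, the calculation can be parameterized by the confidence level. The sketch below assumes the same simulated data; `t_interval` is a hypothetical helper written for this illustration, not a function from any package.

```r
# Recreate the simulated data from the start of the chapter
set.seed(1)
n  <- 500
x1 <- rnorm(n, mean = 500, sd = 50)
x2 <- rpois(n, lambda = 5)
e  <- rnorm(n)
y  <- 10 + 3.5 * x1 + 5 * x2 + e
fit <- lm(y ~ x1 + x2)

# Hypothetical helper: t-based interval for every coefficient at a given level
t_interval <- function(fit, level = 0.95) {
  alpha <- 1 - level
  crit  <- qt(1 - alpha / 2, df = df.residual(fit))
  est   <- coef(fit)
  se    <- sqrt(diag(vcov(fit)))
  cbind(lower = est - crit * se, upper = est + crit * se)
}

ci95 <- t_interval(fit, level = 0.95)
ci99 <- t_interval(fit, level = 0.99)

# Each 99% interval contains the corresponding 95% interval
all(ci99[, "lower"] < ci95[, "lower"] & ci99[, "upper"] > ci95[, "upper"])

# The widths differ by the ratio of the two critical values
width_ratio <- (ci99[, "upper"] - ci99[, "lower"]) /
               (ci95[, "upper"] - ci95[, "lower"])
width_ratio
```

With the critical values reported above (2.585758 and 1.964749), the 99% intervals are roughly 1.32 times as wide as the 95% ones for every coefficient.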