6 Statistics

This note comes from a TP (lab assignment) I wrote as part of my Statistics program. Unfortunately, for legal reasons, I can only publish a small part of it.

times <- c(75, 265, 225, 402, 35, 105, 411, 346, 159, 229, 
           62, 256, 431, 177, 56, 144, 354, 178, 386, 294)

6.1 Determine a plausible probability law for these data.

# Histogram with decile breakpoints
h1 <- hist(times, breaks = quantile(times, seq(0, 1, 0.1)),
           main = "Histogram")

# Cumulative relative frequencies
cumfreq1 <- cumsum(h1$counts) / length(times)

# Empirical distribution function, with the uniform CDF overlaid in red
plot(h1$breaks, c(0, cumfreq1), type = "l",
     main = "Distribution function",
     xlab = "times", ylab = "cumul")
lines(h1$breaks, punif(h1$breaks, min(times), max(times)), col = "red")
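
Equivalently, base R's ecdf() gives the empirical distribution function directly; a minimal sketch of the same comparison:

plot(ecdf(times), main = "Empirical CDF")
curve(punif(x, min(times), max(times)), col = "red", add = TRUE)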

ks.test(times,"punif")
## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  times
## D = 1, p-value < 2.2e-16
## alternative hypothesis: two-sided
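
This rejection is expected: with no extra arguments, punif defaults to the standard Uniform(0,1) law, and every observation exceeds 1. A sketch of the intended test, against a uniform law on \([0, \hat{\theta}]\) with \(\hat{\theta}\) taken as the sample maximum (output omitted; since \(\hat{\theta}\) is estimated from the same data, the p-value is only indicative):

ks.test(times, "punif", min = 0, max = max(times))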

We therefore retain a uniform law on \([0, \theta]\), with density: \[f(x) = \left\{ \begin{array}{rl} \frac{1}{\theta} & \mbox{if } 0 \leq x \leq \theta, \\ 0 & \mbox{otherwise} \end{array} \right.\]

6.2 Estimate the parameter of this law by the method of moments.

The expectation of the uniform law is: \[\mathbb{E}(X) = \frac{\theta}{2}\] \[\Rightarrow \hat{\theta}^{MM} = 2 \bar{X}_n\]

theta_mm <- 2*mean(times); theta_mm
## [1] 459

The method-of-moments estimator is thus twice the empirical mean.

6.3 Estimate this parameter by the maximum likelihood method.

The likelihood is given by: \[\mathcal{L}(x_1, x_2,...,x_n, \theta) = \prod_{i=1}^n f(x_i)\]

\[\mathcal{L}(x_1, x_2,...,x_n, \theta) = \frac{1}{\theta^n}\]

The function \[\theta \rightarrow \mathcal{L}(x_1, x_2,...,x_n, \theta)\] is strictly decreasing, so it attains its maximum at the smallest admissible value of \(\theta\). And since: \[\forall i \in \{1,...,n\}: \theta \geq x_i,\]

it follows that this smallest value is: \[\theta = \max(x_1,...,x_n)\]

So the maximum likelihood estimator of \(\theta\), for an n-sample from this uniform model, is: \[\hat{\theta} = \max(X_1,...,X_n)\]

To determine the probability density of \(\hat{\theta}\), let’s first compute its distribution function. By independence of the \(X_i\): \[F_{\hat{\theta}}(x) = P[\max(X_1,...,X_n) < x] = \prod_{i=1}^n P[X_i < x]\]

\[F_{\hat{\theta}}(x) = \left\{ \begin{array}{rl} 0 & \mbox{if } x < 0, \\ (\frac{x}{\theta})^n & \mbox{if } 0 \leq x \leq \theta, \\ 1 & \mbox{if } x > \theta \end{array} \right.\]

The probability density of \(\hat{\theta}\): \[f_{\hat{\theta}}(x)= \frac{\partial F_{\hat{\theta}}(x)}{\partial x}\]

\[f_{\hat{\theta}}(x) = \left\{ \begin{array}{rl} \frac{n x^{n-1}}{\theta^n} & \mbox{if } x \in ]0,\theta[, \\ 0 & \mbox{if } x \notin ]0,\theta[ \end{array} \right.\]
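
As a quick sanity check, one can simulate maxima of uniform samples and overlay this density (a sketch; the values \(\theta = 450\) and \(n = 20\) are chosen purely for illustration):

set.seed(1)
theta <- 450; n <- 20
# 5000 simulated maxima of n-samples from U(0, theta)
m <- replicate(5000, max(runif(n, 0, theta)))
hist(m, freq = FALSE, main = "Density of the maximum")
# theoretical density: n * x^(n-1) / theta^n
curve(n * x^(n - 1) / theta^n, from = 0, to = theta, col = "red", add = TRUE)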

The expectation of \(\hat{\theta}\) is: \[\mathbb{E}(\hat{\theta}) = \int_{0}^\theta x f_{\hat{\theta}}(x)\, dx\]

\[\mathbb{E}[\hat{\theta}] = n \int_{0}^\theta \frac{x^n}{\theta^n} dx = \frac{n}{n+1}\theta \]

This estimator is therefore biased for any finite \(n\), though asymptotically unbiased.

n <- length(times)
# Bias-corrected estimator: T_n = (n+1)/n * theta_hat (see 6.4)
theta_ll <- function(theta_n){
  ((n + 1) / n) * theta_n
}

theta_ll(100)     # sanity check: (21/20) * 100
## [1] 105
max(times)        # maximum likelihood estimate of theta
## [1] 431
2*mean(times)     # method-of-moments estimate
## [1] 459
mean(times)       # empirical mean
## [1] 229.5

6.4 Compare the quality of these estimators.

The method-of-moments estimator is in fact unbiased: \[\mathbb{E}(\hat{\theta}^{MM}) = 2\,\mathbb{E}(\bar{X}_n) = 2\,\frac{\theta}{2} = \theta\]

The maximum likelihood estimator, on the other hand, is biased: \[\mathbb{E}[\hat{\theta}] = \frac{n}{n+1}\theta \]

We therefore correct it and use the following estimator, which is unbiased: \[T_n = \frac{n+1}{n} \hat{\theta}\] \[\mathbb{E}(T_n) = \frac{n+1}{n}\,\mathbb{E}(\hat{\theta}) = \frac{n+1}{n}\,\frac{n}{n+1}\,\theta = \theta\]

Between the two unbiased estimators, \(T_n\) is preferable: its variance \(\mathrm{Var}(T_n) = \frac{\theta^2}{n(n+2)}\) is smaller than \(\mathrm{Var}(\hat{\theta}^{MM}) = \frac{\theta^2}{3n}\) as soon as \(n \geq 2\).
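
A small simulation sketch comparing the three estimators empirically (the true value \(\theta = 450\) is an arbitrary assumption):

set.seed(1)
theta <- 450; n <- 20; B <- 10000
sims <- replicate(B, {
  x <- runif(n, 0, theta)
  c(mm = 2 * mean(x),                # method of moments
    ml = max(x),                     # maximum likelihood
    tn = ((n + 1) / n) * max(x))     # corrected estimator T_n
})
rowMeans(sims)        # mm and tn are centered on theta; ml underestimates it
apply(sims, 1, var)   # tn has the smallest variance of the two unbiased estimators

On our sample, the corrected estimate is \(T_n = \frac{21}{20} \times 431 = 452.55\).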

6.5 When, on average, will the next particle arrive?

mean(times)   # average waiting time between arrivals, in hours
## [1] 229.5
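
Alternatively, under the fitted uniform model one could report the model mean \(\theta/2\), estimated with the unbiased \(T_n\) (a sketch):

n <- length(times)
((n + 1) / n) * max(times) / 2   # T_n / 2, in hours
## [1] 226.275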

6.5.1 What is the probability that the next particle will arrive within the next 100 hours?

punif(100, min = 0, max = max(times))   # P(X <= 100) under U(0, theta_hat)
## [1] 0.2320186
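
The same probability computed with the bias-corrected estimate \(T_n\) is slightly smaller, since \(T_n\) exceeds the sample maximum (a sketch):

punif(100, min = 0, max = ((n + 1) / n) * max(times))
## [1] 0.2209701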

6.5.2 How many particles can we expect to receive in a year?

(365*24)/mean(times)   # hours in a year / mean waiting time in hours
## [1] 38.16993