# Chapter 6 Sampling

- Parameters versus statistics
- Sampling Distribution
- Example: Exponential Distribution

## 6.1 Parameters versus Statistics

Statistics is about inferring (learning) something about a population *parameter* by using *statistics* constructed from a sample of the population.

For example, the population mean is often denoted by \(\mu\). If we could observe the entire population, then we could calculate \(\mu\) by adding up all the values and dividing by the number of observations.

We do not do this because it is often too expensive (money and/or time) to collect information on everyone in the population.

Instead, we randomly sample a smaller group from the population and calculate the mean for this group, \(\bar{x}\).

The population mean, \(\mu\), is the *parameter* of interest.
The sample mean, \(\bar{x}\), is the sample statistic used to infer \(\mu\).

## 6.2 Parameters are the Truth

The population parameter is the truth. The population parameter itself has no variance. Sounds weird?

**Logical Concept**: If the population were made up of the following 5 numbers (1, 2, 3, 4, and 5), then we could easily find that the population mean is 3. In fact, every single time you use **ALL** of these numbers to calculate the mean, you get the same answer, 3.

The sample statistic is a guess (hopefully a good one) of the population parameter. Suppose you could only select 2 numbers from our population. How many different means would you find?

### 6.2.1 Sample Statistics are Best Guesses

Our samples of two and their means:

- 1,2 (Mean = 1.5)
- 1,3 (Mean = 2)
- 1,4 (Mean = 2.5)
- 1,5 (Mean = 3)
- 2,3 (Mean = 2.5)
- 2,4 (Mean = 3)
- 2,5 (Mean = 3.5)
- 3,4 (Mean = 3.5)
- 3,5 (Mean = 4)
- 4,5 (Mean = 4.5)
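
The ten possible samples above can be enumerated in R with the base function `combn()`, which applies a function to every combination. This is a sketch added to illustrate the list; it is not part of the original notes.

```
pop <- c(1, 2, 3, 4, 5)              # the entire five-number population
sample_means <- combn(pop, 2, mean)  # mean of every possible pair
sample_means
# 1.5 2.0 2.5 3.0 2.5 3.0 3.5 3.5 4.0 4.5
```

Note that the ten means range from 1.5 to 4.5, even though the population mean is always exactly 3 — the statistic varies from sample to sample while the parameter does not.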

## 6.3 Central Limit Theorem

One of the beautiful things about the sample mean is that

- It approximates the population mean well
- It improves in accuracy as the sample size increases
- The distribution of sample means is approximately normal (symmetric) even if the population distribution is not.

This last bullet point is crucial. In reality, we do not know the true distribution of the data. But the central limit theorem protects us from our own ignorance. We can still say things about the data when we are interested in the mean. Even if X follows a very strange distribution, the distribution of the mean of X will be approximately normal, provided the sample size is reasonably large.

### 6.3.1 Example: Exponential Distribution

We will pull random samples from an exponential distribution, which is skewed right (i.e., not symmetric), and show that the sample means are approximately normally distributed.

### 6.3.2 Random samples of size 10 with 500 draws

```
nsim <- 500                 # number of simulated samples
mu <- 1                     # mean of an Exp(rate = 1) distribution
exp_means <- numeric(nsim)
for (i in 1:nsim) { exp_means[i] <- mean(rexp(10, rate = 1)) }
sample_mean <- mean(exp_means)
hist(exp_means, main = "Histogram of 500 sample means (n = 10)", breaks = 25)
abline(v = mu, col = 2, lwd = 2, lty = 2)
abline(v = sample_mean, col = 3, lwd = 1)
legend("topright", c("Expected Mean", "Sample Mean"),
       lty = c(2, 1),
       bty = "n", col = c(2, 3))
```

## 6.4 Sampling Distribution Properties

The expected value of the sample mean is the population mean.
**NOTE**: \(E[a+b]=E[a]+E[b]\)
\[E[\bar{x}]=E[\frac{1}{n}\sum{x_{i}}]=\frac{1}{n}\sum E[x_i]=\frac{1}{n}\sum \mu = \frac{1}{n}n\mu=\mu\]
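
Returning to the five-number population from earlier, unbiasedness can be checked directly in R: averaging all ten possible two-draw sample means recovers the population mean. A quick sketch:

```
pop <- c(1, 2, 3, 4, 5)
mean(combn(pop, 2, mean))  # average of all possible two-draw sample means
# 3
mean(pop)                  # the population mean
# 3
```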

The variance of the sample mean depends on the sample size and the population variance.
**NOTE**: if \(a\) is a constant and \(x\) has a variance of \(\sigma^2\), then \(VAR[a x]=a^2 VAR[x]=a^2 \sigma^2\). Also, for independent observations, \(VAR[x+y]=VAR[x]+VAR[y]\).

\[VAR[\bar{x}]=VAR\left[\frac{1}{n}\sum x_i\right]=\frac{1}{n^2}\sum VAR[x_i]=\frac{1}{n^2}\cdot n\cdot\sigma^2=\frac{\sigma^2}{n}\]

The **standard error (SE)** is the standard deviation of the sample mean.
\[SE = \frac{\sigma}{\sqrt{n}}\]
The **standard error** decreases if the sample size increases or the population standard deviation decreases.
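
The \(\sigma/\sqrt{n}\) formula can be checked by simulation. The sketch below (sample sizes and seed are illustrative choices, not from the original notes) uses the Exp(rate = 1) distribution, which has \(\sigma = 1\), and compares the simulated spread of sample means with the theoretical standard error:

```
set.seed(1)
sigma <- 1                            # population sd of Exp(rate = 1)
for (n in c(10, 100, 1000)) {
  means <- replicate(5000, mean(rexp(n, rate = 1)))
  cat("n =", n,
      " simulated SE =", round(sd(means), 3),
      " theoretical SE =", round(sigma / sqrt(n), 3), "\n")
}
```

Each tenfold increase in the sample size shrinks the standard error by a factor of \(\sqrt{10}\approx 3.16\), matching \(SE=\sigma/\sqrt{n}\).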