16 Day 15

Announcements

Don’t come to class sick
- Don’t come to class sick
  - Don’t come to class sick
If you miss class, review the notes
- If you have questions on the notes please email me
  - Whatever confused you definitely confused someone else, you’re helping me as an instructor more than yourself. There are no stupid questions.
Today is a day about conceptual knowledge
- Just try to focus on understanding the theory rather than the math

Review

A standard normal distribution is a normal distribution with:

$\mu=0$
$\sigma = 1$

We use the letter $Z$ to represent a standard normal random variable (referring to $z$ -score)

The probability that a standard normal random variable $Z$ is between $a$ and $b$ ( $P(a<Z<b)$ ) is equal to the area under the standard normal curve over the interval $[a,b]$

To make inference about the standard Normal distribution, for this class we use the $z$ -table

Reading it isn’t difficult
You can put a reminder on your cheat sheet if you so choose:

$\text{Left Column}=0.0X, \ i.e. 1.2X$

$\text{Top Column}=X.X0, \ i.e. X.X6$

$L+T=1.26$

The difficulty is getting to the point where we read it

Conceptually you need to grasp a couple things:

$0.8413-0.1586=0.6827$

$1-0.6827=0.3173$

A tool for your toolbox

$1-0.1586=0.8414 \newline 0.8414-0.1586=0.6828$

What have we done?

Non-standard Normal Distributions

If $X$ is a normal random variable with mean $\mu$ and standard deviation $\sigma$ we write $X \sim N(\mu,\sigma^2)$

$\sim$ means “is distributed (as)”
$N()$ refers to the normal distribution
- Together we’re saying “X is distributed normal with mean $\mu$ and variance $\sigma^2$
So if $\mu=100$ and $\sigma=5$ we’ll write $X\sim (100,5^2)$ or $X \sim (100,25)$
If we write $X \sim (16,5)$ then $\mu=16$ and $\sigma=\sqrt{5}$

We’ve seen that the standard normal distribution is well understood and we can find probabilities, percentiles, etc. “easily” using a $z$ -table

If we want to easily learn these things about non-standard random variables it would be convenient if we could transform them into standard normal random variables

Fortunately we can

$\text{if} \ X \sim N(\mu,\sigma^2), \ \text{then} \ Z={X-\mu \over \sigma} \sim N(0,1)$

$X \sim N(100,10) \Rightarrow Z={X-100 \over 10} \sim N(0,1)$

$z={x-\mu \over \sigma}$

What we’ve done is convert the non-standard normal distribution into a set of $z$ -scores

What do these $z$ -scores measure?

The process we’ve performed in making this conversion is called standardizing a normal random variable

We can do this moving forward to find out information from a non-standard normal r.v. using the $z$ -tables

$P(X\leq x)=P\left({X-\mu \over \sigma}\leq{x-\mu \over \sigma}\right)=P(Z\leq z)$

Non-standard Normal Examples

Suppose that the heights of American men ( $20$ years and older) are approximately normal with a mean of $70$ inches and a standard deviation of $4$ inches.

What proportion of American men are less than $6$ feet tall?

( $6$ ’ $=$ $72$ ”)
$X \sim N(70,4^2)$

$P(X\leq 72)=P\left(Z\leq{72-70 \over 4}\right)=P(Z\leq 0.5)$

What proportion of American men are between 5’ and 6’ 8”tall?

( $5$ ’ $=$ $60$ ” and $6$ ’ $8$ ” $=$ $80$ ”)

$P(60<X<80)=P\left({60-70 \over 4}<Z<{80-70 \over 4}\right)$

We know that for any $z$ -score, the area to the left of the negative is exactly equal to the area to the right of the positive:

So:

$=P(-2.50<Z<2.50) =2\times P(Z<-2.50)$

$=1-2(\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ )=$

Given our height example ( $X \sim N(70,4^2)$ ), how tall would you have to be so that you are taller than $90\%$ of American men?

Refer to your $z$ -table
- Find the closest value to $0.90$
- “un-standardize” the value:

$x=\sigma z + \mu=4( \ \ \ \ \ \ \ \ \ \ \ \ \ ) + 70 =$

What is an interval of two heights that contains approximately $50\%$ of American men?

Find a value on your $z$ -table that is approximately $0.25$ or $0.75$
- Why these values?
We should end up with $0.67$ at $0.75$
- at $0.25$ it should be $-0.67$

$x=\sigma (-z) + \mu=4(-0.67) + 70 =67.32$

$x=\sigma z + \mu=4(0.67) + 70 =72.68$

Sampling Distribution of Sample Mean & Central Limit

Let’s remember some core vocabulary:

Population: The entire collection of individuals we’re seeking information from
Sample: A subset of a population of which we can gather real observations from
Parameter: A value derived from a population
Statistic: A value derived from a sample

Realistically we will never quantify a parameter directly from a population

The major goal of the statistical sciences is to make inference about a population and its parameters by gathering a sample and deriving statistics

In practice:

Start with a research question
- “How effective are seasonal Influenza vaccine campaigns in Kansas?”

$\begin{array}{|c|c|c|c|} \hline Population & Parameter & Sample & Statistic \\ \hline \text{Kansas Residents} & p_V & 10 \ \text{Kansas Towns} & \hat{p}_V\\ \hline \\ \hline \\ \hline \\ \hline \end{array}$

Business Week reported on the cost per treatment of Herceptin, a drug used to treat breast cancer. Typical treatment costs (in dollars) for Herceptin are provided by a simple random sample of 5 patients.

$\begin{array}{|c|c|c|c|c|} \hline 4376 & 5578 & 2717 & 4920 & 4495 \\ \hline \end{array}$

Find a number that can be used as an estimate of the mean cost per treatment with Herceptin.

Suppose we are interested in determining the average time (in minutes) it takes K-State students to travel to their hometowns.
We take a simple random sample of 100 K-State students, ask each selected student how long it takes to travel home, and then compute the sample mean:

$\bar{x} = 91.34$

Suppose we take another sample of 100 K-State students. This time our sample mean is:

$\bar{x} = 89.63$

If we view taking a random sample as an experiment, then the sample mean $\bar{x}$ is a numerical value assigned to each outcome of the experiment.

We’ve discussed this previously, $\bar{x}$ , our sample mean, is a random variable

When our value is arising from a sample, a limited subset of the population, it’s value with vary each time our sample changes
So all statistics derived from a sample are random variables

This is a fundamental concept to grasp for all of statistics:

All random variables have a random probability distribution
- As all statistics are random variables:
  - All statistics arise from a random probability distribution

We refer to the probability distribution of a sample statistic as the sampling distribution

We’re going to look at this through the lens of the sample mean

Let $\bar{x}$ be the mean of a random sample of size $n$ , drawn from a population with mean $\mu$ and standard deviation $\sigma$

Since $\bar{x}$ is a random variable, it has the mean and the standard deviation

The mean of $\bar{x}$ is $\mu$ . That is,

$\mu_{\bar{x}} = \mu = \text{population mean}$

The standard deviation of $\bar{x}$ is $\sigma / \sqrt{n}$ . That is,

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\text{population std. deviation}}{\sqrt{\text{sample size}}}$

a. A population has mean $\mu = 6$ and standard deviation $\sigma = 4$ . Find $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$ for a sample size of $n = 25$

b. A population has mean $\mu = 17$ and standard deviation $\sigma = 20$ . Find $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$ for a sample size of $n = 100$

The mean and standard deviation of the sample mean $\bar{x}$ are

$\mu_{\bar{x}} = \mu$

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

This is true even when the true values of $\mu$ and $\sigma$ are unknown
This is how we make inference about population parameters with only sample statistics
We know the values of two parameters associated with the sampling distribution of $\bar{x}$
- To fully understand its distribution, we also need to know its shape
- Accessing all of this information is typically done through something called an exploratory analysis

Don’t come to class sick
- Go away