16 Day 15
Announcements
Don’t come to class sick
Don’t come to class sick
- Don’t come to class sick
If you miss class, review the notes
If you have questions on the notes please email me
- Whatever confused you almost certainly confused someone else, so by asking you’re helping me as an instructor more than yourself. There are no stupid questions.
Today is a day about conceptual knowledge
- Just try to focus on understanding the theory rather than the math
Review
A standard normal distribution is a normal distribution with:
$\mu = 0$
$\sigma = 1$
We use the letter $Z$ to represent a standard normal random variable (referring to $z$-score)
The probability that a standard normal random variable $Z$ is between $a$ and $b$ ($P(a < Z < b)$) is equal to the area under the standard normal curve over the interval $[a, b]$
To make inference about the standard Normal distribution, for this class we use the $z$-table
Reading it isn’t difficult
You can put a reminder on your cheat sheet if you so choose:
$\text{Left Column} = 0.0X, \text{ i.e. } 1.2X$
$\text{Top Column} = X.X0, \text{ i.e. } X.X6$
$L + T = 1.26$
The difficulty is getting to the point where we read it
- Conceptually you need to grasp a couple things:
$$0.8413 - 0.1586 = 0.6827$$
$$1 - 0.6827 = 0.3173$$
A tool for your toolbox
$$1 - 0.1586 = 0.8414$$
$$0.8414 - 0.1586 = 0.6828$$
What have we done?
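The table arithmetic above can be checked numerically. The sketch below assumes Python's standard-library `statistics.NormalDist` as a stand-in for the printed $z$-table, so the last digit may differ slightly from the table values used in class. The variable names are illustrative only.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
Z = NormalDist(mu=0, sigma=1)

# Areas to the left of z = 1 and z = -1 (what a z-table reports).
left_of_pos1 = Z.cdf(1.0)
left_of_neg1 = Z.cdf(-1.0)

# Area between -1 and 1: subtract the two left-tail areas.
between = left_of_pos1 - left_of_neg1

# Area outside that interval: the complement rule.
outside = 1 - between

# Symmetry tool: the area right of +1 equals the area left of -1.
right_of_pos1 = 1 - left_of_pos1

print(round(between, 4), round(outside, 4))
```

The two routes to the middle area (subtracting left tails directly, or using the complement of one tail first) agree up to rounding, which is why the notes show 0.6827 one way and 0.6828 the other.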
Non-standard Normal Distributions
If $X$ is a normal random variable with mean $\mu$ and standard deviation $\sigma$ we write $X \sim N(\mu, \sigma^2)$
$\sim$ means “is distributed (as)”
$N()$ refers to the normal distribution
- Together we’re saying “$X$ is distributed normal with mean $\mu$ and variance $\sigma^2$”
So if $\mu = 100$ and $\sigma = 5$ we’ll write $X \sim N(100, 5^2)$ or $X \sim N(100, 25)$
If we write $X \sim N(16, 5)$ then $\mu = 16$ and $\sigma = \sqrt{5}$
We’ve seen that the standard normal distribution is well understood and we can find probabilities, percentiles, etc. “easily” using a $z$-table
If we want to easily learn these things about non-standard random variables it would be convenient if we could transform them into standard normal random variables
- Fortunately we can
$$\text{if } X \sim N(\mu, \sigma^2), \text{ then } Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$
$$X \sim N(100, 10^2) \Rightarrow Z = \frac{X - 100}{10} \sim N(0, 1)$$
$$z = \frac{x - \mu}{\sigma}$$
What we’ve done is convert the non-standard normal distribution into a set of $z$-scores
- What do these $z$-scores measure?
The process we’ve performed in making this conversion is called standardizing a normal random variable
- We can do this moving forward to find out information from a non-standard normal r.v. using the $z$-tables
$$P(X \le x) = P\left(\frac{X - \mu}{\sigma} \le \frac{x - \mu}{\sigma}\right) = P(Z \le z)$$
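As a rough check that standardizing does not change the probability, the sketch below compares $P(X \le x)$ computed directly with $P(Z \le z)$ computed after standardizing. It assumes `statistics.NormalDist` in place of the $z$-table; the helper `standardize` and the values $\mu = 100$, $\sigma = 10$, $x = 115$ are made up for illustration.

```python
from statistics import NormalDist

def standardize(x, mu, sigma):
    """Return the z-score of x for a normal r.v. with mean mu and std. deviation sigma."""
    return (x - mu) / sigma

# Illustrative values (assumed, not from the notes): X ~ N(100, 10^2), x = 115.
mu, sigma, x = 100, 10, 115
z = standardize(x, mu, sigma)

# P(X <= x) on the original scale.
p_direct = NormalDist(mu, sigma).cdf(x)

# P(Z <= z) on the standard normal scale.
p_standard = NormalDist(0, 1).cdf(z)

print(z, round(p_direct, 4), round(p_standard, 4))
```

The two probabilities match, which is the point of the identity above: once a value is converted to a $z$-score, a single standard normal table is enough.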
Non-standard Normal Examples
Suppose that the heights of American men ($20$ years and older) are approximately normal with a mean of $70$ inches and a standard deviation of $4$ inches.
- What proportion of American men are less than $6$ feet tall?
($6$’ $=$ $72$”)
$$X \sim N(70, 4^2)$$
$$P(X \le 72) = P\left(Z \le \frac{72 - 70}{4}\right) = P(Z \le 0.5)$$
- What proportion of American men are between 5’ and 6’8” tall?
- ($5$’ $=$ $60$” and $6$’$8$” $=$ $80$”)
$$P(60 < X < 80) = P\left(\frac{60 - 70}{4} < Z < \frac{80 - 70}{4}\right)$$
We know that for any $z$-score, the area to the left of the negative is exactly equal to the area to the right of the positive:
So:
$$= P(-2.50 < Z < 2.50) = 1 - 2 \times P(Z < -2.50)$$
$$= 1 - 2(\quad) =$$
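The two height questions can be worked the same way in code. This is a minimal sketch that assumes `statistics.NormalDist` as a substitute for the $z$-table. It standardizes each height and then looks up left-tail areas, using the symmetry of the standard normal curve for the second question.

```python
from statistics import NormalDist

mu, sigma = 70, 4       # heights in inches, from the example
Z = NormalDist(0, 1)    # standard normal, stand-in for the z-table

# P(X <= 72): standardize 72 inches (6 feet), then take the left-tail area.
z_72 = (72 - mu) / sigma
p_under_6ft = Z.cdf(z_72)

# P(60 < X < 80): standardize both endpoints.
z_lo = (60 - mu) / sigma
z_hi = (80 - mu) / sigma

# By symmetry the two tails are equal, so the middle is 1 minus twice one tail.
p_between = 1 - 2 * Z.cdf(z_lo)

# The same area as a difference of two left-tail areas.
p_between_direct = Z.cdf(z_hi) - Z.cdf(z_lo)

print(round(p_under_6ft, 4), round(p_between, 4))
```

Both ways of computing the middle area give the same result, so either can be used depending on which table values are easiest to read.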
Given our height example ($X \sim N(70, 4^2)$), how tall would you have to be so that you are taller than $90\%$ of American men?
Refer to your $z$-table
Find the closest value to $0.90$
“un-standardize” the value:
$$x = \sigma z + \mu = 4(\quad) + 70 =$$
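Reading the table "backwards" corresponds to an inverse lookup. The sketch below assumes `statistics.NormalDist.inv_cdf` plays the role of finding the closest table entry to $0.90$. Because it does not round $z$ to two decimals, the result can differ slightly from a hand calculation with the table.

```python
from statistics import NormalDist

mu, sigma = 70, 4

# z-score with 90% of the area to its left (inverse of the table lookup).
z_90 = NormalDist(0, 1).inv_cdf(0.90)

# "Un-standardize": x = sigma * z + mu
x_90 = sigma * z_90 + mu

# Sanity check: about 90% of heights should fall below x_90.
check = NormalDist(mu, sigma).cdf(x_90)

print(round(z_90, 2), round(x_90, 2), round(check, 2))
```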
What is an interval of two heights that contains approximately $50\%$ of American men?
Find a value on your $z$-table that is approximately $0.25$ or $0.75$
- Why these values?
We should end up with $0.67$ at $0.75$
- at $0.25$ it should be $-0.67$
$$x = \sigma(-z) + \mu = 4(-0.67) + 70 = 67.32$$
$$x = \sigma z + \mu = 4(0.67) + 70 = 72.68$$
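The middle-50% interval can be sketched the same way, again assuming `statistics.NormalDist` in place of the $z$-table. The endpoints come out very close to the hand-computed values; the small difference is because the table rounds $z$ to $0.67$.

```python
from statistics import NormalDist

mu, sigma = 70, 4
Z = NormalDist(0, 1)

# z-scores cutting off the lowest 25% and the lowest 75% of the area.
z_25 = Z.inv_cdf(0.25)
z_75 = Z.inv_cdf(0.75)

# Un-standardize both z-scores to get heights in inches.
lower = sigma * z_25 + mu
upper = sigma * z_75 + mu

# Check that the interval really holds about half of the distribution.
heights = NormalDist(mu, sigma)
coverage = heights.cdf(upper) - heights.cdf(lower)

print(round(lower, 2), round(upper, 2), round(coverage, 2))
```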
Sampling Distribution of Sample Mean & Central Limit
Let’s remember some core vocabulary:
Population: The entire collection of individuals we’re seeking information from
Sample: A subset of a population from which we can gather real observations
Parameter: A value derived from a population
Statistic: A value derived from a sample
Realistically we will never quantify a parameter directly from a population
- The major goal of the statistical sciences is to make inference about a population and its parameters by gathering a sample and deriving statistics
In practice:
Start with a research question
- “How effective are seasonal Influenza vaccine campaigns in Kansas?”
| Population | Parameter | Sample | Statistic |
| --- | --- | --- | --- |
| Kansas Residents | $p_V$ | 10 Kansas Towns | $\hat{p}_V$ |
Business Week reported on the cost per treatment of Herceptin, a drug used to treat breast cancer. Typical treatment costs (in dollars) for Herceptin are provided by a simple random sample of 5 patients.
4376, 5578, 2717, 4920, 4495
Find a number that can be used as an estimate of the mean cost per treatment with Herceptin.
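A natural estimate here is the sample mean. As a small sketch using only the five costs listed above (the variable names are illustrative):

```python
from statistics import mean

# Treatment costs in dollars for the simple random sample of 5 patients.
costs = [4376, 5578, 2717, 4920, 4495]

# The sample mean is a point estimate of the population mean cost.
x_bar = mean(costs)
print(x_bar)
```

A different sample of 5 patients would almost certainly give a different value, which is the idea the rest of this section builds on.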
Suppose we are interested in determining the average time (in minutes) it takes K-State students to travel to their hometowns.
We take a simple random sample of 100 K-State students, ask each selected student how long it takes to travel home, and then compute the sample mean:
$$\bar{x} = 91.34$$
- Suppose we take another sample of 100 K-State students. This time our sample mean is:
$$\bar{x} = 89.63$$
- If we view taking a random sample as an experiment, then the sample mean $\bar{x}$ is a numerical value assigned to each outcome of the experiment.
We’ve discussed this previously: $\bar{x}$, our sample mean, is a random variable
When a value arises from a sample, a limited subset of the population, its value will vary each time our sample changes
So all statistics derived from a sample are random variables
This is a fundamental concept to grasp for all of statistics:
All random variables have a probability distribution
As all statistics are random variables:
- All statistics arise from a probability distribution
We refer to the probability distribution of a sample statistic as the sampling distribution
- We’re going to look at this through the lens of the sample mean
Let $\bar{x}$ be the mean of a random sample of size $n$, drawn from a population with mean $\mu$ and standard deviation $\sigma$
Since $\bar{x}$ is a random variable, it has a mean and a standard deviation
- The mean of $\bar{x}$ is $\mu$. That is,
$$\mu_{\bar{x}} = \mu = \text{population mean}$$
- The standard deviation of $\bar{x}$ is $\sigma/\sqrt{n}$. That is,
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\text{population std. deviation}}{\sqrt{\text{sample size}}}$$
a. A population has mean $\mu = 6$ and standard deviation $\sigma = 4$. Find $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$ for a sample size of $n = 25$
b. A population has mean $\mu = 17$ and standard deviation $\sigma = 20$. Find $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$ for a sample size of $n = 100$
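Exercises a and b only need the two formulas above. A minimal sketch, where the helper name `sampling_mean_sd` is hypothetical and introduced just for this example:

```python
from math import sqrt

def sampling_mean_sd(mu, sigma, n):
    """Return (mean, std. deviation) of the sample mean for samples of size n."""
    return mu, sigma / sqrt(n)

# a. mu = 6, sigma = 4, n = 25
a = sampling_mean_sd(6, 4, 25)

# b. mu = 17, sigma = 20, n = 100
b = sampling_mean_sd(17, 20, 100)

print(a, b)
```

Notice that the mean of $\bar{x}$ never changes with $n$, while its standard deviation shrinks as the sample size grows.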
The mean and standard deviation of the sample mean ˉx¯x are
$$\mu_{\bar{x}} = \mu$$
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
This is true even when the true values of $\mu$ and $\sigma$ are unknown
This is how we make inference about population parameters with only sample statistics
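One way to see these two facts in action is a small simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be normal with made-up values $\mu = 90$ and $\sigma = 30$, and we pretend we can draw as many samples of size $n = 100$ as we like, which is not possible in practice. The formulas themselves do not depend on the population being normal.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population parameters, chosen only for illustration.
mu, sigma, n = 90, 30, 100
num_samples = 2000

# Draw many samples of size n and record the sample mean of each one.
sample_means = []
for _ in range(num_samples):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(sum(sample) / n)

# The sample means vary from sample to sample...
first_two = sample_means[:2]

# ...but their average is close to mu and their spread is close to sigma / sqrt(n).
mean_of_means = mean(sample_means)
sd_of_means = stdev(sample_means)

print(first_two)
print(round(mean_of_means, 2), round(sd_of_means, 2), sigma / sqrt(n))
```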
We know the values of two parameters associated with the sampling distribution of $\bar{x}$
To fully understand its distribution, we also need to know its shape
Accessing all of this information is typically done through something called an exploratory analysis
Don’t come to class sick
- Go away