16 Day 15

Announcements

  • Don’t come to class sick

    • Don’t come to class sick

      • Don’t come to class sick
  • If you miss class, review the notes

    • If you have questions on the notes, please email me

      • Whatever confused you definitely confused someone else; you’re helping me as an instructor more than yourself. There are no stupid questions.
  • Today is a day about conceptual knowledge

    • Just try to focus on understanding the theory rather than the math

Review

A standard normal distribution is a normal distribution with:

  • $\mu = 0$

  • $\sigma = 1$


We use the letter $Z$ to represent a standard normal random variable (a reference to the $z$-score)

The probability that a standard normal random variable $Z$ is between $a$ and $b$, written $P(a < Z < b)$, is equal to the area under the standard normal curve over the interval $[a, b]$
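Outside of class, the same area can be computed from the standard normal CDF, $\Phi(z) = P(Z \le z)$, which can be written with the error function: $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$. The Python sketch below is only a way to check table lookups; the function names `std_normal_cdf` and `prob_between` are hypothetical, chosen for illustration.

```python
import math


def std_normal_cdf(z: float) -> float:
    """P(Z <= z) for a standard normal Z, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_between(a: float, b: float) -> float:
    """P(a < Z < b): the area under the standard normal curve over [a, b]."""
    if a > b:
        raise ValueError("need a <= b")
    return std_normal_cdf(b) - std_normal_cdf(a)


print(round(prob_between(-1.0, 1.0), 4))  # 0.6827
```

Because $Z$ is continuous, $P(a < Z < b)$ and $P(a \le Z \le b)$ are the same number, so the sketch does not distinguish between them.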


To find probabilities for the standard normal distribution, in this class we use the $z$-table

  • Reading it isn’t difficult

  • You can put a reminder on your cheat sheet if you so choose:

$$\text{Left column} = X.X, \text{ i.e. } 1.2X$$

$$\text{Top row} = 0.0X, \text{ i.e. } X.X6$$

$$L + T = 1.2 + 0.06 = 1.26$$
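To see the row-plus-column rule in action, the sketch below mimics a left-tail $z$-table (the kind that gives $P(Z \le z)$) using Python's standard-library `statistics.NormalDist`. The helper name `table_lookup` is hypothetical, and rounding to four decimals is an assumption meant to match how printed tables usually report areas.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1


def table_lookup(row: float, col: float) -> float:
    """Combine a row label (ones and tenths) with a column label (hundredths)
    and return the left-tail area P(Z <= z), rounded like a printed table."""
    z = round(row + col, 2)      # e.g. 1.2 + 0.06 -> 1.26
    return round(Z.cdf(z), 4)    # area to the left of z


print(table_lookup(1.2, 0.06))   # 0.8962
```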


The difficulty is getting to the point where we read it

  • Conceptually, you need to grasp a couple of things:

$$0.8413 - 0.1586 = 0.6827$$

$$1 - 0.6827 = 0.3173$$


A tool for your toolbox

$$1 - 0.1586 = 0.8414$$

$$0.8414 - 0.1586 = 0.6828$$
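The toolbox trick above relies on the symmetry of the standard normal curve: the area to the left of $-z$ equals the area to the right of $z$, so $P(Z < z) = 1 - P(Z < -z)$. A short check in Python with the standard library's `statistics.NormalDist` is sketched below. A printed table usually lists $P(Z < -1.00)$ as 0.1587, so values computed this way can differ from a hand calculation in the fourth decimal place because of rounding.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

left_of_neg1 = Z.cdf(-1.0)        # area to the left of -1
right_of_pos1 = 1 - Z.cdf(1.0)    # area to the right of +1
print(round(left_of_neg1, 4), round(right_of_pos1, 4))  # 0.1587 0.1587

# Toolbox trick: P(Z < 1) = 1 - P(Z < -1), so one lookup gives both tails
left_of_pos1 = 1 - left_of_neg1
between = left_of_pos1 - left_of_neg1   # P(-1 < Z < 1)
outside = 1 - between                   # P(Z < -1 or Z > 1)
print(round(between, 4), round(outside, 4))  # 0.6827 0.3173
```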

What have we done?

Non-standard Normal Distributions

If $X$ is a normal random variable with mean $\mu$ and standard deviation $\sigma$, we write $X \sim N(\mu, \sigma^2)$

  • $\sim$ means “is distributed (as)”

  • $N(\,)$ refers to the normal distribution

    • Together we’re saying “$X$ is distributed normal with mean $\mu$ and variance $\sigma^2$”
  • So if $\mu = 100$ and $\sigma = 5$, we’ll write $X \sim N(100, 5^2)$ or $X \sim N(100, 25)$

  • If we write $X \sim N(16, 5)$, then $\mu = 16$ and $\sigma = \sqrt{5}$


We’ve seen that the standard normal distribution is well understood and we can find probabilities, percentiles, etc. “easily” using a $z$-table

If we want to easily learn these things about non-standard normal random variables, it would be convenient if we could transform them into standard normal random variables

  • Fortunately we can

$$\text{if } X \sim N(\mu, \sigma^2), \text{ then } Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$$

$$X \sim N(100, 10) \implies Z = \frac{X - 100}{\sqrt{10}} \sim N(0, 1)$$


$$z = \frac{x - \mu}{\sigma}$$
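Standardizing is a single subtraction and division, and undoing it (what the percentile examples in these notes call “un-standardizing”) runs the steps in reverse. A minimal Python sketch using the $N(100, 5^2)$ example from above; the function names `standardize` and `unstandardize` are hypothetical:

```python
def standardize(x: float, mu: float, sigma: float) -> float:
    """Convert a value x from N(mu, sigma^2) into a z-score."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


def unstandardize(z: float, mu: float, sigma: float) -> float:
    """x = sigma * z + mu: turn a z-score back into the original units."""
    return sigma * z + mu


print(standardize(110, 100, 5))     # 2.0
print(unstandardize(2.0, 100, 5))   # 110.0
```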

What we’ve done is convert values from a non-standard normal distribution into $z$-scores

  • What do these $z$-scores measure?


The process we’ve performed in making this conversion is called standardizing a normal random variable

  • From here on, we can do this to find information about any non-standard normal r.v. using the $z$-table

$$P(X \le x) = P\left(\frac{X - \mu}{\sigma} \le \frac{x - \mu}{\sigma}\right) = P(Z \le z)$$
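The identity above says that standardizing does not change the probability. A quick numerical check in Python, using the standard library's `statistics.NormalDist` and the $N(100, 5^2)$ example from earlier; the cutoff $x = 108$ is hypothetical, picked only for illustration. One detail worth noticing: `NormalDist(mu, sigma)` takes the standard deviation, not the variance, so $N(100, 25)$ is written `NormalDist(100, 5)`.

```python
from statistics import NormalDist

mu, sigma = 100, 5           # X ~ N(100, 5^2)
x = 108                      # hypothetical cutoff

X = NormalDist(mu, sigma)    # second argument is sigma, NOT sigma^2
Z = NormalDist()             # standard normal

z = (x - mu) / sigma         # standardized cutoff
p_direct = X.cdf(x)          # P(X <= x) on the original scale
p_via_z = Z.cdf(z)           # P(Z <= z) after standardizing
print(round(p_direct, 4), round(p_via_z, 4))
```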

Non-standard Normal Examples

Suppose that the heights of American men (20 years and older) are approximately normal with a mean of 70 inches and a standard deviation of 4 inches.

  1. What proportion of American men are less than 6 feet tall?
  • (6′ = 72″)

  • $X \sim N(70, 4^2)$

$$P(X \le 72) = P\left(Z \le \frac{72 - 70}{4}\right) = P(Z \le 0.5)$$
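Once you have $P(Z \le 0.5)$ from the table, the sketch below is one way to check it in Python (standard library only): it repeats the standardization and then asks `NormalDist` for the same area directly on the inch scale.

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=4)    # X ~ N(70, 4^2), in inches
Z = NormalDist()                        # standard normal

x = 72                                  # 6 feet = 72 inches
z = (x - heights.mean) / heights.stdev  # (72 - 70) / 4
print(z)                                # 0.5
print(round(Z.cdf(z), 4), round(heights.cdf(x), 4))  # 0.6915 0.6915
```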

  2. What proportion of American men are between 5′ and 6′ 8″ tall?
  • (5′ = 60″ and 6′ 8″ = 80″)

$$P(60 < X < 80) = P\left(\frac{60 - 70}{4} < Z < \frac{80 - 70}{4}\right)$$

We know that for any $z$-score, the area to the left of $-z$ is exactly equal to the area to the right of $z$:


So:

$$= P(-2.50 < Z < 2.50) = 1 - 2 \times P(Z < -2.50)$$

$$= 1 - 2\left(\underline{\qquad\qquad}\right) = \underline{\qquad\qquad}$$
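By symmetry, $P(-2.50 < Z < 2.50) = 1 - 2 \times P(Z < -2.50)$. The Python sketch below (standard library only) computes the answer both ways so you can check your table work; it is a verification aid, not a replacement for the hand calculation.

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=4)   # X ~ N(70, 4^2)
Z = NormalDist()                       # standard normal

# Direct: P(60 < X < 80) = P(X < 80) - P(X < 60)
direct = heights.cdf(80) - heights.cdf(60)

# Symmetry route: 1 - 2 * P(Z < -2.50)
via_symmetry = 1 - 2 * Z.cdf(-2.50)

print(round(direct, 4), round(via_symmetry, 4))
```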



Given our height example ($X \sim N(70, 4^2)$), how tall would you have to be so that you are taller than 90% of American men?

  • Refer to your $z$-table

    • Find the closest value to 0.90

    • “un-standardize” the value:

$$x = \sigma z + \mu = 4\left(\underline{\qquad\qquad}\right) + 70 = \underline{\qquad\qquad}$$
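A table only lets you pick the closest listed area, so a hand answer rounds $z$ to two decimals. As a check, `NormalDist.inv_cdf` (Python standard library) returns the $z$ with exactly 90% of the area to its left; expect the un-standardized height to differ slightly from the table-based answer because of that rounding.

```python
from statistics import NormalDist

Z = NormalDist()                 # standard normal
mu, sigma = 70, 4                # X ~ N(70, 4^2)

z_90 = Z.inv_cdf(0.90)           # z with 90% of the area to its left
x_90 = sigma * z_90 + mu         # un-standardize: x = sigma * z + mu
print(round(z_90, 2), round(x_90, 2))

# One-step equivalent working on the original scale
x_90_direct = NormalDist(mu, sigma).inv_cdf(0.90)
```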


What is an interval of two heights that contains approximately 50% of American men?

  • Find a value on your $z$-table that is approximately 0.25 or 0.75

    • Why these values?
  • We should end up with $z = 0.67$ at 0.75

    • At 0.25 it should be $z = -0.67$

$$x = \sigma(-z) + \mu = 4(-0.67) + 70 = 67.32$$

$$x = \sigma z + \mu = 4(0.67) + 70 = 72.68$$
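The same quartile idea can be checked with `NormalDist.inv_cdf` from Python's standard library. The exact quartile $z$-values are about $\pm 0.6745$, so the computed endpoints differ from the table-based 67.32 and 72.68 in the second decimal place; the area between them is 50% by construction.

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=4)   # X ~ N(70, 4^2)

lower = heights.inv_cdf(0.25)          # 25th percentile (first quartile)
upper = heights.inv_cdf(0.75)          # 75th percentile (third quartile)
middle = heights.cdf(upper) - heights.cdf(lower)  # area between the two heights

print(round(lower, 2), round(upper, 2), round(middle, 2))
```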



Sampling Distribution of Sample Mean & Central Limit

Let’s remember some core vocabulary:

  • Population: The entire collection of individuals we’re seeking information from

  • Sample: A subset of a population from which we can gather real observations

  • Parameter: A value derived from a population

  • Statistic: A value derived from a sample


Realistically, we will never quantify a parameter directly from a population

  • The major goal of the statistical sciences is to make inference about a population and its parameters by gathering a sample and deriving statistics

In practice:

  • Start with a research question

    • “How effective are seasonal Influenza vaccine campaigns in Kansas?”

Population         Parameter    Sample             Statistic
Kansas Residents   $p_V$        10 Kansas Towns    $\hat{p}_V$



Business Week reported on the cost per treatment of Herceptin, a drug used to treat breast cancer. Typical treatment costs (in dollars) for Herceptin, from a simple random sample of 5 patients, are:

4376, 5578, 2717, 4920, 4495

Find a number that can be used as an estimate of the mean cost per treatment with Herceptin.
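A natural estimate of the population mean is the sample mean of the five observed costs. In Python's standard library this is `statistics.mean`; the sketch below simply applies it to the data listed above.

```python
from statistics import mean

costs = [4376, 5578, 2717, 4920, 4495]   # cost per treatment, dollars (n = 5)
x_bar = mean(costs)                       # sample mean, used to estimate mu
print(x_bar)
```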


  • Suppose we are interested in determining the average time (in minutes) it takes K-State students to travel to their hometowns.

  • We take a simple random sample of 100 K-State students, ask each selected student how long it takes to travel home, and then compute the sample mean:

$$\bar{x} = 91.34$$

  • Suppose we take another sample of 100 K-State students. This time our sample mean is:

$$\bar{x} = 89.63$$

  • If we view taking a random sample as an experiment, then the sample mean $\bar{x}$ is a numerical value assigned to each outcome of the experiment.


As we’ve discussed previously, $\bar{x}$, our sample mean, is a random variable

  • Because its value arises from a sample, a limited subset of the population, it will vary each time our sample changes

  • So all statistics derived from a sample are random variables
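A small simulation makes the point concrete. The population below is entirely hypothetical (20,000 made-up travel times drawn from an exponential distribution with mean 90 minutes); it is not K-State data. Each call draws a fresh simple random sample of 100 and returns its mean, and the two calls give two different values of $\bar{x}$. The helper name `sample_mean` is hypothetical.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so the run is reproducible

# Hypothetical population of travel times in minutes (NOT real data)
population = [random.expovariate(1 / 90) for _ in range(20_000)]
mu = mean(population)  # the parameter; unknown in practice


def sample_mean(pop: list, n: int = 100) -> float:
    """Draw a simple random sample of size n (without replacement) and return x-bar."""
    return mean(random.sample(pop, n))


first = sample_mean(population)
second = sample_mean(population)
print(round(mu, 2), round(first, 2), round(second, 2))
```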


This is a fundamental concept to grasp for all of statistics:

  • All random variables have a probability distribution

    • As all statistics are random variables:

      • All statistics have a probability distribution


We refer to the probability distribution of a sample statistic as the sampling distribution

  • We’re going to look at this through the lens of the sample mean


Let $\bar{x}$ be the mean of a random sample of size $n$, drawn from a population with mean $\mu$ and standard deviation $\sigma$

Since $\bar{x}$ is a random variable, it has a mean and a standard deviation

  • The mean of $\bar{x}$ is $\mu$. That is,

$$\mu_{\bar{x}} = \mu = \text{population mean}$$

  • The standard deviation of $\bar{x}$ is $\sigma/\sqrt{n}$. That is,

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\text{population std. deviation}}{\sqrt{\text{sample size}}}$$


a. A population has mean $\mu = 6$ and standard deviation $\sigma = 4$. Find $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$ for a sample size of $n = 25$



b. A population has mean $\mu = 17$ and standard deviation $\sigma = 20$. Find $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$ for a sample size of $n = 100$
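The two formulas translate directly into code. A minimal Python sketch for checking parts (a) and (b); the function name `xbar_mean_and_sd` is hypothetical:

```python
import math
from typing import Tuple


def xbar_mean_and_sd(mu: float, sigma: float, n: int) -> Tuple[float, float]:
    """Return (mu_xbar, sigma_xbar) = (mu, sigma / sqrt(n))."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return mu, sigma / math.sqrt(n)


print(xbar_mean_and_sd(6, 4, 25))     # part a
print(xbar_mean_and_sd(17, 20, 100))  # part b
```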



The mean and standard deviation of the sample mean $\bar{x}$ are

$$\mu_{\bar{x}} = \mu$$

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

  • This is true even when the true values of $\mu$ and $\sigma$ are unknown

  • This is how we make inference about population parameters with only sample statistics

  • We know the values of two parameters associated with the sampling distribution of $\bar{x}$

    • To fully understand its distribution, we also need to know its shape

    • Accessing all of this information is typically done through something called an exploratory analysis
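The formulas for $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$ can be checked by simulation: draw many samples, compute $\bar{x}$ for each, and look at the mean and standard deviation of those $\bar{x}$ values. The sketch below uses the numbers from part (a) and assumes, only for convenience, a normally distributed population; the number of repetitions is an arbitrary choice.

```python
import math
import random
from statistics import mean, pstdev

random.seed(2024)  # reproducible run

mu, sigma, n = 6, 4, 25  # population mean, population sd, sample size (part a)
reps = 20_000            # number of simulated samples (arbitrary choice)

# One x-bar per simulated sample of size n
x_bars = [sum(random.gauss(mu, sigma) for _ in range(n)) / n for _ in range(reps)]

print(round(mean(x_bars), 2), "vs mu =", mu)
print(round(pstdev(x_bars), 3), "vs sigma/sqrt(n) =", sigma / math.sqrt(n))
```

The two summaries do not depend on the population being normal; the shape of the distribution of `x_bars` is a separate question.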


  • Don’t come to class sick

    • Go away