16 Day 15

Announcements

  • Don’t come to class sick

    • Don’t come to class sick

      • Don’t come to class sick
  • If you miss class, review the notes

    • If you have questions on the notes please email me

      • Whatever confused you definitely confused someone else, so by asking you’re helping me as an instructor more than yourself. There are no stupid questions.
  • Today is a day about conceptual knowledge

    • Just try to focus on understanding the theory rather than the math

Review

A standard normal distribution is a normal distribution with:

  • $\mu = 0$

  • $\sigma = 1$


We use the letter Z to represent a standard normal random variable (referring to z-score)

The probability that a standard normal random variable Z is between a and b (P(a<Z<b)) is equal to the area under the standard normal curve over the interval [a,b]


To make inference about the standard Normal distribution, for this class we use the z-table

  • Reading it isn’t difficult

  • You can put a reminder on your cheat sheet if you so choose:

Left column = ones and tenths digits of z, e.g. 1.2X

Top row = hundredths digit of z, e.g. X.X6

L + T = 1.26
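If you want to check a table lookup outside of class, the same area can be computed in software. This is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` as a stand-in for the printed z-table; `z_table` is a hypothetical helper name, not something from the course.

```python
from statistics import NormalDist

def z_table(z):
    """Area to the left of z under the standard normal curve,
    rounded to four decimals like a printed z-table."""
    return round(NormalDist().cdf(z), 4)

# Left column 1.2 plus top row 0.06 gives z = 1.26
area = z_table(1.2 + 0.06)
print(area)
```

On exams you still need the table itself; this is only a way to confirm that you read the correct row and column.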


The difficulty is getting to the point where we read it

  • Conceptually you need to grasp a couple things:

$0.8413 - 0.1586 = 0.6827$

$1 - 0.6827 = 0.3173$
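The two subtractions above can be reproduced numerically. This is a sketch, assuming Python's standard-library `statistics.NormalDist` in place of the z-table. The variable names are illustrative, and software values may differ from table values in the last digit.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mu = 0, sigma = 1

left_of_1 = Z.cdf(1)          # area to the left of z = 1
left_of_minus_1 = Z.cdf(-1)   # area to the left of z = -1

between = left_of_1 - left_of_minus_1   # P(-1 < Z < 1)
outside = 1 - between                   # both tails combined

# Symmetry: the area right of -1 equals the area left of +1
right_of_minus_1 = 1 - left_of_minus_1

print(round(between, 4), round(outside, 4))
```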


A tool for your toolbox

$1 - 0.1586 = 0.8414$

$0.8414 - 0.1586 = 0.6828$

What have we done?

Non-standard Normal Distributions

If X is a normal random variable with mean μ and standard deviation σ we write $X \sim N(\mu, \sigma^2)$

  • $\sim$ means “is distributed (as)”

  • $N()$ refers to the normal distribution

    • Together we’re saying “X is distributed normal with mean μ and variance $\sigma^2$”
  • So if μ=100 and σ=5 we’ll write $X \sim N(100, 5^2)$ or $X \sim N(100, 25)$

  • If we write $X \sim N(16, 5)$ then μ=16 and $\sigma = \sqrt{5}$, because the second entry is the variance
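The variance-versus-standard-deviation distinction matters as soon as you use software. A small sketch, assuming Python's `statistics.NormalDist`, which takes the standard deviation (not the variance) as its second argument:

```python
from math import sqrt
from statistics import NormalDist

# N(100, 25): the second entry is the variance, so sigma = sqrt(25) = 5
X = NormalDist(mu=100, sigma=sqrt(25))

# N(16, 5): the variance is 5, so sigma = sqrt(5)
Y = NormalDist(mu=16, sigma=sqrt(5))

print(X.stdev, X.variance)
print(Y.stdev, Y.variance)
```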


We’ve seen that the standard normal distribution is well understood and we can find probabilities, percentiles, etc. “easily” using a z-table

If we want to easily learn these things about non-standard random variables it would be convenient if we could transform them into standard normal random variables

  • Fortunately we can

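If $X \sim N(\mu, \sigma^2)$, then $Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$

$X \sim N(100, 10) \Rightarrow Z = \frac{X - 100}{\sqrt{10}} \sim N(0, 1)$


$z = \frac{x - \mu}{\sigma}$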

What we’ve done is convert the non-standard normal distribution into a set of z-scores

  • What do these z-scores measure?


The process we’ve performed in making this conversion is called standardizing a normal random variable

  • We can do this moving forward to find out information from a non-standard normal r.v. using the z-tables

$P(X \le x) = P\left(\frac{X - \mu}{\sigma} \le \frac{x - \mu}{\sigma}\right) = P(Z \le z)$
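To see that standardizing does not change the probability, here is a minimal sketch assuming Python's `statistics.NormalDist`. The helper `standardize` and the observed value `x = 104` are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def standardize(x, mu, sigma):
    """Convert an observed value x into a z-score."""
    return (x - mu) / sigma

mu, sigma = 100, sqrt(10)   # X ~ N(100, 10), so the variance is 10
x = 104                     # hypothetical observed value

z = standardize(x, mu, sigma)

p_direct = NormalDist(mu, sigma).cdf(x)   # P(X <= x) on the original scale
p_standard = NormalDist().cdf(z)          # P(Z <= z) on the standard scale

print(round(z, 2), round(p_direct, 4), round(p_standard, 4))
```

The two probabilities agree, which is why a single z-table is enough for every normal distribution.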

Non-standard Normal Examples

Suppose that the heights of American men (20 years and older) are approximately normal with a mean of 70 inches and a standard deviation of 4 inches.

  1. What proportion of American men are less than 6 feet tall?
  • (6’ = 72”)

  • $X \sim N(70, 4^2)$

$P(X \le 72) = P\left(Z \le \frac{72 - 70}{4}\right) = P(Z \le 0.5)$

  2. What proportion of American men are between 5’ and 6’8” tall?
  • (5’ = 60” and 6’8” = 80”)

$P(60 < X < 80) = P\left(\frac{60 - 70}{4} < Z < \frac{80 - 70}{4}\right)$

We know that for any z-score, the area to the left of the negative is exactly equal to the area to the right of the positive:


So:

$= P(-2.50 < Z < 2.50) = 1 - 2 \times P(Z < -2.50)$

= 1 − 2(                     ) =
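After working both height questions with the z-table, you can check your answers with a short script. This is a sketch, assuming Python's `statistics.NormalDist` in place of the table; the variable names are illustrative, and software values may differ from table values in the last digit.

```python
from statistics import NormalDist

mu, sigma = 70, 4   # heights: X ~ N(70, 4^2)
Z = NormalDist()    # standard normal

# Question 1: P(X <= 72)
z1 = (72 - mu) / sigma
p_under_6ft = Z.cdf(z1)

# Question 2: P(60 < X < 80), using the symmetry of the two tails
z_low = (60 - mu) / sigma    # -2.5
z_high = (80 - mu) / sigma   # +2.5
p_between = 1 - 2 * Z.cdf(z_low)

# Same area by direct subtraction, as a cross-check
p_between_check = Z.cdf(z_high) - Z.cdf(z_low)

print(round(p_under_6ft, 4), round(p_between, 4))
```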



Given our height example ($X \sim N(70, 4^2)$), how tall would you have to be so that you are taller than 90% of American men?

  • Refer to your z-table

    • Find the closest value to 0.90

    • “un-standardize” the value:

x=σz+μ=4(             )+70=
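Working backwards from an area to a z-score is the inverse of a table lookup. A minimal sketch, assuming Python's `statistics.NormalDist.inv_cdf` plays the role of "find the closest value to 0.90" in the table (the software z-score is not rounded to two decimals, so the result can differ slightly from a table-based answer):

```python
from statistics import NormalDist

mu, sigma = 70, 4

# z-score with 90% of the area to its left
z_90 = NormalDist().inv_cdf(0.90)

# "Un-standardize": x = sigma * z + mu
height_90 = sigma * z_90 + mu

print(round(z_90, 2), round(height_90, 2))
```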


What is an interval of two heights that contains approximately 50% of American men?

  • Find a value on your z-table that is approximately 0.25 or 0.75

    • Why these values?
  • We should end up with 0.67 at 0.75

    • at 0.25 it should be −0.67

$x = \sigma(-z) + \mu = 4(-0.67) + 70 = 67.32$

$x = \sigma z + \mu = 4(0.67) + 70 = 72.68$
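The same interval can be checked in software. This is a sketch, assuming Python's `statistics.NormalDist`; because it does not round z to two decimals, the endpoints come out slightly different from the table-based 67.32 and 72.68.

```python
from statistics import NormalDist

mu, sigma = 70, 4
Z = NormalDist()

z_25 = Z.inv_cdf(0.25)   # 25% of the area lies to the left
z_75 = Z.inv_cdf(0.75)   # 75% of the area lies to the left

lower = sigma * z_25 + mu
upper = sigma * z_75 + mu

# Confirm the interval really holds about half of the distribution
coverage = NormalDist(mu, sigma).cdf(upper) - NormalDist(mu, sigma).cdf(lower)

print(round(lower, 2), round(upper, 2), round(coverage, 2))
```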



Sampling Distribution of Sample Mean & Central Limit

Let’s remember some core vocabulary:

  • Population: The entire collection of individuals we’re seeking information from

  • Sample: A subset of a population from which we can gather real observations

  • Parameter: A value derived from a population

  • Statistic: A value derived from a sample


Realistically we will never quantify a parameter directly from a population

  • The major goal of the statistical sciences is to make inference about a population and its parameters by gathering a sample and deriving statistics

In practice:

  • Start with a research question

    • “How effective are seasonal Influenza vaccine campaigns in Kansas?”

Population          Parameter    Sample             Statistic
Kansas Residents    $p_V$        10 Kansas Towns    $\hat{p}_V$



Business Week reported on the cost per treatment of Herceptin, a drug used to treat breast cancer. Typical treatment costs (in dollars) for Herceptin are provided by a simple random sample of 5 patients.

4376    5578    2717    4920    4495

Find a number that can be used as an estimate of the mean cost per treatment with Herceptin.
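The natural estimate here is the sample mean. A minimal sketch of the arithmetic, assuming Python's standard-library `statistics.mean`:

```python
from statistics import mean

# Cost per treatment (dollars) for the 5 sampled patients
costs = [4376, 5578, 2717, 4920, 4495]

x_bar = mean(costs)   # sum of the costs divided by the sample size
print(x_bar)
```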


  • Suppose we are interested in determining the average time (in minutes) it takes K-State students to travel to their hometowns.

  • We take a simple random sample of 100 K-State students, ask each selected student how long it takes to travel home, and then compute the sample mean:

$\bar{x} = 91.34$

  • Suppose we take another sample of 100 K-State students. This time our sample mean is:

$\bar{x} = 89.63$

  • If we view taking a random sample as an experiment, then the sample mean $\bar{x}$ is a numerical value assigned to each outcome of the experiment.


We’ve discussed this previously: $\bar{x}$, our sample mean, is a random variable

  • When our value arises from a sample, a limited subset of the population, its value will vary each time our sample changes

  • So all statistics derived from a sample are random variables


This is a fundamental concept to grasp for all of statistics:

  • All random variables have a probability distribution

    • As all statistics are random variables:

      • All statistics arise from a probability distribution


We refer to the probability distribution of a sample statistic as the sampling distribution

  • We’re going to look at this through the lens of the sample mean
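A simulation can make the idea of a sampling distribution concrete. The sketch below uses a made-up population of travel times (the uniform 0 to 180 minute range is purely an assumption for illustration, not K-State data), draws many samples of 100, and records each sample mean.

```python
import random
from statistics import mean, stdev

random.seed(15)  # fixed seed so the sketch is repeatable

# Hypothetical population of travel times in minutes (made-up numbers)
population = [random.uniform(0, 180) for _ in range(20_000)]

def sample_mean(n):
    """Draw a simple random sample of size n and return its mean."""
    return mean(random.sample(population, n))

# Repeat the "experiment" many times; each run yields a different x-bar
sample_means = [sample_mean(100) for _ in range(1_000)]

print([round(m, 2) for m in sample_means[:2]])   # two different sample means
print(round(mean(sample_means), 2), round(mean(population), 2))
print(round(stdev(sample_means), 2))             # spread of the sample means
```

The collection of values in `sample_means` is an approximation of the sampling distribution of the sample mean for this hypothetical population.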


Let $\bar{x}$ be the mean of a random sample of size n, drawn from a population with mean μ and standard deviation σ

Since $\bar{x}$ is a random variable, it has a mean and a standard deviation

  • The mean of $\bar{x}$ is μ. That is,

$\mu_{\bar{x}} = \mu = \text{population mean}$

  • The standard deviation of $\bar{x}$ is $\sigma/\sqrt{n}$. That is,

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{\text{population std. deviation}}{\sqrt{\text{sample size}}}$


a. A population has mean μ=6 and standard deviation σ=4. Find $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$ for a sample size of n=25



b. A population has mean μ=17 and standard deviation σ=20. Find $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$ for a sample size of n=100
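Once you have tried parts a and b by hand, the arithmetic can be checked with a few lines of code. This is a sketch; `sampling_mean_sd` is a hypothetical helper name that simply applies the two formulas for the mean and standard deviation of the sample mean.

```python
from math import sqrt

def sampling_mean_sd(mu, sigma, n):
    """Return (mean, standard deviation) of the sample mean
    for samples of size n from a population with mean mu and sd sigma."""
    return mu, sigma / sqrt(n)

a = sampling_mean_sd(mu=6, sigma=4, n=25)
b = sampling_mean_sd(mu=17, sigma=20, n=100)

print(a)
print(b)
```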



The mean and standard deviation of the sample mean $\bar{x}$ are

$\mu_{\bar{x}} = \mu$

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

  • This is true even when the true values of μ and σ are unknown

  • This is how we make inference about population parameters with only sample statistics

  • We know the values of two parameters associated with the sampling distribution of $\bar{x}$

    • To fully understand its distribution, we also need to know its shape

    • Accessing all of this information is typically done through something called an exploratory analysis


  • Don’t come to class sick

    • Go away