Chapter 4 M4: Inference Basics
In this module, we’re building up the basic principles of inference: making guesses or estimates about a population based on what we observe in a sample. First we explore the idea that different samples will give different observed results – the question is, can we describe how those observed results vary? Then we get more technical by introducing the Central Limit Theorem and using it to create confidence intervals and perform hypothesis tests. Some key skills are:
- Write the notation for a normal distribution, and identify what each part represents
- Sketch a normal distribution and label its mean and standard deviation
- Calculate a z-score and interpret it in context
- Interpret word problems about tail (or other) probabilities with a normal distribution
    - Although it’s important to learn how to use software for this, I won’t ask you to generate code on an Assessment. I might ask you to read code or interpret R’s output. I might also give you some examples of code and ask you to identify the parts or tell me if something’s wrong with it.
    - You should be able to draw a picture to represent the problem you’re trying to solve, as well as write it in formal probability notation.
- Write the 68-95-99.7 rule, and use it to get approximate answers to normal probability questions
- Explain the difference between a population parameter and a point estimate (sometimes called a sample statistic)
- Explain the concept of sampling variability and a sampling distribution
    - We’re mostly focusing on the case where our population parameter of interest is a proportion, at least in terms of mathematical details. But you should be aware that parameters and statistics can be other types of things, like a mean (we’ll get even more options later!).
    - You should also know when to use the term standard error.
- Write the Central Limit Theorem (for proportions) and explain what it tells you about results from different samples
    - I absolutely do not expect you to prove the CLT or explain why it works. That is a job for a different course :)
    - You should be able to check the conditions for the CLT (independence and success/failure).
    - The most important part is to relate the CLT to the behavior of different kinds of samples. What happens to the sampling distribution of \(\hat{p}\)’s from different samples if the true population \(p\) is larger or smaller? What happens if the size of each sample, \(n\), is larger or smaller? The pictures in OIS section 5.1.5 are helpful here :)
- Use terms like population parameter, point estimate, and sample statistic, and identify them in context
- State and check the conditions for doing inference for one proportion (hint: think about the conditions for the CLT!)
- Write the formula for a confidence interval (for one proportion) and explain what the pieces mean
- Match this formula up with information given to you in a problem, to actually create a confidence interval
- Interpret a confidence interval in context
- Discuss what happens to confidence intervals if \(p\) and/or \(n\) changes
- Use terms like null/alternative hypothesis, statistical/practical significance, etc. and relate them to context
- Identify the null hypothesis based on a problem’s context
    - For example, you might be asked to write out the null and alternative hypotheses in formal notation, based on a description in a word problem
- Give the formal definition of a p-value, and interpret it in context
- Describe the relationship between confidence intervals and hypothesis tests (i.e., how you can use a confidence interval as a shortcut to perform a hypothesis test)
- Define a confidence level
    - Explain how the confidence level affects confidence intervals and hypothesis test results
    - Explain how the choice of confidence level relates to context (when might you need higher confidence, and when might you not?)
- Define Type I and Type II decision errors and identify them in the context of a problem
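For reference, the main formulas behind the skills listed above can be written compactly. This is a summary sketch in standard textbook notation, not a replacement for the readings: the second argument of \(N(\cdot,\cdot)\) is taken to be the standard deviation, the cutoff of 10 in the success/failure condition is the usual textbook convention, and \(z^{\star}\) is the normal critical value for the chosen confidence level (about 1.96 for 95%).

\[
X \sim N(\mu, \sigma), \qquad z = \frac{x - \mu}{\sigma}
\]

\[
\hat{p} \;\overset{\text{approx.}}{\sim}\; N\!\left(p,\ \sqrt{\frac{p(1-p)}{n}}\right) \quad \text{when observations are independent, } np \ge 10 \text{ and } n(1-p) \ge 10
\]

\[
\hat{p} \pm z^{\star} \times SE, \qquad SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
\]

The second line is the Central Limit Theorem for proportions: the spread of \(\hat{p}\) across samples shrinks as \(n\) grows and is widest when \(p\) is near 0.5. The third line is the confidence interval for one proportion, where the standard error plugs \(\hat{p}\) in for the unknown \(p\).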
Doing the calculations for this material is important, but it’s also primarily a job for computers. On an Assessment, I wouldn’t expect you to come up with final numerical results (unless the problem is doable by hand, for example with the 68-95-99.7 rule!). You might be asked to set up your calculation in formal math/probability notation or draw a picture. Again, I wouldn’t ask you to write code on an Assessment, though I might ask you to interpret code, interpret results, or choose among different code options that I give you.
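As a rough illustration of the kind of calculation that gets handed to software, here is a minimal sketch. The course itself uses R; this example uses only Python's standard library, purely for illustration. The function names are hypothetical, the sample values (`n = 1000`, `p_hat = 0.60`) are made up, and the success/failure cutoff of 10 is assumed to be the usual textbook convention.

```python
from math import sqrt
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0, sigma=1)


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


def within_k_sd(k):
    """Probability that a normal variable falls within k standard deviations of its mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


def success_failure_ok(p, n, threshold=10):
    """Success/failure check: at least `threshold` expected successes and failures."""
    return n * p >= threshold and n * (1 - p) >= threshold


def proportion_ci(p_hat, n, confidence=0.95):
    """Confidence interval for one proportion: p_hat +/- z_star * SE."""
    z_star = std_normal.inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    se = sqrt(p_hat * (1 - p_hat) / n)                      # standard error of p_hat
    return p_hat - z_star * se, p_hat + z_star * se


# The 68-95-99.7 rule, recovered from the normal distribution itself.
for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.4f}")

# Hypothetical sample: 600 successes out of 1000 observations (made-up numbers).
n = 1000
p_hat = 0.60
if success_failure_ok(p_hat, n):
    lower, upper = proportion_ci(p_hat, n, confidence=0.95)
    print(f"95% CI for p: ({lower:.4f}, {upper:.4f})")
```

Raising `confidence` makes `z_star` larger and the interval wider, while raising `n` makes the standard error smaller and the interval narrower.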