4.3 Inference framework II: CIs and reporting

4.3.1 Where were we?

So we set up a research question, and formulated some statistical questions about it. Then we got some good data and tried out a hypothesis test, although we didn’t learn a whole lot from it. Let’s head to the next step in the inference framework:

Create confidence interval

4.3.2 Confidence intervals in general

So I did this hypothesis test, and I failed to reject the null hypothesis that the average blood pressure of US adults is 120. That means I can’t really say whether the true average blood pressure is 120 (or more precisely, whether the true average log blood pressure is \(\log(120)\)). What about…125? 114 and a half?

The thing about hypothesis tests is that they only tell you about a single null value. We want more! This is where confidence intervals come in.

A confidence interval gives you a range of plausible values for the population parameter. In fact, it tells you all the null values that you wouldn’t reject (at the matching significance level: a 95% CI pairs with \(\alpha = 0.05\)).

It’s centered at your estimate of the parameter, and then goes out to either side. The more confident you want to be, the farther out you have to go.

Okay, there is one extra issue: if you get a confidence interval on a transformed variable (like our log of blood pressure) and then transform it back into the original scale, it will no longer be centered at the “point estimate” of the parameter. There are various ways of dealing with this that we may or may not worry about, but keep it in mind on your data project if you’re getting weird results.
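
Here’s a tiny illustration with made-up numbers (nothing to do with our blood pressure data): suppose an interval on the log scale runs from 1.5 to 2.5, centered at the estimate 2. Exponentiate everything back to the original scale and the interval is no longer centered at the back-transformed estimate:

exp(2)                  # back-transformed point estimate: about 7.4
exp(c(1.5, 2.5))        # back-transformed interval: about (4.5, 12.2)
mean(exp(c(1.5, 2.5)))  # the interval's midpoint is about 8.3, not 7.4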

4.3.3 CIs and sampling distributions

Here’s one way to think about confidence intervals. Given appropriate conditions, the CLT gives us the sampling distribution of sample means: \[\overline{y}\sim N(\mu, \frac{\sigma}{\sqrt{n}})\]
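
If you’d like to see this in action, here’s a quick simulation sketch with entirely made-up data (not NHANES): the population is Exponential with mean 10 and SD 10, which is decidedly not Normal, but the means of samples of size 50 behave just as the CLT promises.

set.seed(1)
sample_means = replicate(5000, mean(rexp(50, rate = 1/10)))
mean(sample_means)   # close to the population mean, 10
sd(sample_means)     # close to sigma/sqrt(n) = 10/sqrt(50), about 1.4
hist(sample_means)   # looks quite Normal, even though the population is badly skewed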

The fact that this is a Normal distribution is about to pay off. We know how Normal distributions behave! For example, if I draw a value from a Normal distribution, there’s a 68% chance that it’s within 1 standard deviation of the mean, a 95% chance that it’s within two standard deviations of the mean – well, more like 1.96 – and so on.
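
You can check those facts in R with pnorm(), which gives the probability that a standard Normal draw lands below a given number of SDs from the mean:

pnorm(1) - pnorm(-1)        # about 0.68
pnorm(1.96) - pnorm(-1.96)  # about 0.95
pnorm(3) - pnorm(-3)        # about 0.997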

Well, we have a random sample. And our own personal sample mean can be considered as a random draw from the distribution of all sample means. Which tells us that there’s, say, a 95% chance that it’s within 1.96 standard deviations of the center, or in other words, a 95% chance that it’s within \[1.96*\frac{\sigma}{\sqrt{n}}\] of…\(\mu\), the true population mean.

So, hey! If there’s a 95% chance that my \(\overline{y}\) is within that range of \(\mu\), then if I start at \(\overline{y}\) and go out that far to either side, there’s a 95% chance that I hit \(\mu\)!

That is: \[\overline{y} \pm 1.96 \frac{\sigma}{\sqrt{n}}\] gives me an interval that has a 95% chance of covering the true value of \(\mu\).

Keep in mind here that \(\mu\) isn’t random. It is whatever it is, written on its secret mountain. What’s uncertain here is whether my sample \(\overline{y}\) happened to be within 1.96 SD of it. But, hey, most samples’ \(\overline{y}\)’s are – in fact, 95% of them! So I’m 95% confident that my sample “works” – that my sample \(\overline{y}\) is indeed within 1.96 SD of this true \(\mu\).
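
If you want to see what that “95% of samples” claim looks like in practice, here’s a small simulation sketch (made-up population, and pretending we know \(\sigma\)): build the interval from each of many samples and count how often it captures the true \(\mu\).

set.seed(2)
mu = 5; sigma = 2; n = 100
covers = replicate(10000, {
  ybar = mean(rnorm(n, mean = mu, sd = sigma))
  (ybar - 1.96 * sigma / sqrt(n) < mu) & (mu < ybar + 1.96 * sigma / sqrt(n))
})
mean(covers)   # should be close to 0.95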

That 1.96 is called a critical value. 95% of samples have \(\overline{y}\)’s within this many SDs of \(\mu\). But 99.7% of sample \(\overline{y}\)’s are within 3 SDs of \(\mu\), so if I went out 3 SDs from my own sample \(\overline{y}\), I’d be 99.7% confident that I’d reach \(\mu\). And of course these are not my only options; I can get a critical value for any confidence level I want.
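
In R, qnorm() will hand you the Normal critical value for any confidence level: for 95% confidence you leave 2.5% in each tail, so you ask for the 97.5th percentile, and so on.

qnorm(0.975)   # about 1.96: the critical value for 95% confidence
qnorm(0.995)   # about 2.58: for 99% confidence
qnorm(0.9985)  # about 2.97: for 99.7% confidence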

Now we do have a problem here: we still don’t know what \(\sigma\) is – that’s the population standard deviation. Well, fine. We substitute the sample standard deviation, \(s\). But this requires us to compensate by using the \(t\) distribution instead of the Normal distribution. You can find critical values from a \(t\) distribution just as you can from a Normal distribution; we write them as \[t^*_{df, \alpha/2}\]

When we’re interested in means, the degrees of freedom (df) is \(n-1\).
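
Critical values from the \(t\) distribution come from qt() in the same way; you just have to supply the degrees of freedom. They’re a bit bigger than the Normal ones, especially when \(n\) is small.

qt(0.975, df = 9)    # about 2.26, for a 95% CI with n = 10
qt(0.975, df = 249)  # about 1.97, for a 95% CI with n = 250 (nearly the Normal 1.96)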

4.3.4 Back to the inference framework

Okay, so now I have this idea for a confidence interval: I start at my own \(\overline{y}\) and I go out some number of standard deviations to either side, like so: \[\overline{y} \pm t^*_{n-1,\alpha/2}*\frac{s}{\sqrt{n}}\]

More generally, the formula for a confidence interval is: \[\text{estimate} \pm CV_{\alpha}*SE(\text{estimate})\] \(CV_{\alpha}\) is that critical value – it depends on the distribution of your estimate, and on your required confidence level.

I could calculate all this stuff manually too, but instead, let’s just let R do its thing:

my_ttest = t.test(x = sam_NHANES$logBPSysAve, alternative = "two.sided",
       mu = log(120), conf.level = 0.99)
my_ttest
## 
##  One Sample t-test
## 
## data:  sam_NHANES$logBPSysAve
## t = -2.4275, df = 249, p-value = 0.01591
## alternative hypothesis: true mean is not equal to 4.787492
## 99 percent confidence interval:
##  4.743569 4.788962
## sample estimates:
## mean of x 
##  4.766266
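
If you want to peek under the hood, here’s the same 99% interval assembled by hand from the formula (just a check, using the same sample), plus how to pull individual pieces out of the saved t.test object when you’re reporting:

y = sam_NHANES$logBPSysAve
y = y[!is.na(y)]    # t.test() quietly drops missing values, so do the same here
n = length(y)
mean(y) + c(-1, 1) * qt(0.995, df = n - 1) * sd(y) / sqrt(n)  # should match the interval above
my_ttest$conf.int   # the confidence interval, extracted from the t.test object
my_ttest$p.value    # the p-value, likewise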

R, helpfully, both conducts a hypothesis test and provides a confidence interval. And so should you! Which brings us to the last step:

Report and interpret

Look, there are a lot of steps in this process that R can do. But there are some things that it is your job, as the human, to do: formulating an appropriate statistical question given the context, deciding on a confidence level, thinking about independence of observations. And now, interpreting what you found out, and helping other people understand it.

This process requires judgment, and it depends on your audience. Generally, you won’t want to hand over every single detail of your analysis (or at least, you’ll put it in an appendix). But you also don’t want to just holler “rejected \(H_0\), we win!” and leave.

As a general rule, you should always report:

  • Your reject/FTR decision about \(H_0\) (if you were doing a hypothesis test at all)
  • The associated p-value
  • A confidence interval if at all possible

…and then, you should make sure to put it in context. I can tell you “We failed to reject, the p-value was 0.016, and the confidence interval was (4.74, 4.79),” but what does that mean about blood pressure? Whether you’re working directly with a client or publishing something for a wider readership, you always want to translate your results into the language of your audience. That audience might include statisticians, but most often, it also includes people who aren’t.

Response moment: How would you report these results? There’s an extra calculation step you should probably do – what is it?