## 4.1 Intro to the *t* distribution

### 4.1.1 What’s the goal here?

Getting really deep into the mathematics of the \(t\) distribution is a whole adventure. But as an introduction, I do want to say a little bit about *why* we sometimes have to use it, and more or less what it looks like.

### 4.1.2 Why not Normal?

If you’ve done any sort of sampling or inference work, you may have previously noticed a bit of a problem. We keep talking about this \(\sigma\), the population standard deviation. It shows up in the sampling distribution of the sample mean \(\overline{y}\), for example:
\[\overline{y}\sim N(\mu, \frac{\sigma}{\sqrt{n}})\]
…thanks to the CLT. If we want to talk about how much different samples’ \(\overline{y}\)’s *vary* – their *spread* – then we need this standard deviation \(\frac{\sigma}{\sqrt{n}}\), and we just don’t know what it is.

One way of dealing with this is to *pretend* we know what \(\sigma\) is, by estimating it with our sample standard deviation, \(s\). I mean, that is kind of our best guess, so it’s not a bad idea.

The problem is this: \(s\) *isn’t* equal to \(\sigma\). We only have a sample, so our estimate is going to be a bit wrong. Interestingly, we know something about *how* it tends to be wrong: it tends to be too small.

This means that if we say \[\overline{y}\sim N(\mu, \frac{s}{\sqrt{n}})\quad \mbox{<- this is wrong!!}\]

we’ll be underestimating the spread a bit. We’re actually more likely to get extreme \(\overline{y}\) values, far from \(\mu\), than this would indicate.

If we standardize \(\overline{y}\) by subtracting the mean and dividing by \(\frac{s}{\sqrt{n}}\), we *don’t* get a standard Normal distribution. We get something else – something that’s more likely to have extreme positive or negative values.

### 4.1.3 What does the \(t\) do?

Yes, Guinness as in the beer! There’s a long and storied history of statisticians working in industry to help with quality assurance, forecasting, and more.

Gosset was working before computers were a thing, which made his job a lot more tedious than yours :)

It turns out, we actually *know* what this other distribution is – this thing that’s “sort of like a Normal but with more chance of extreme values.” This was worked out, by hand, by a dude named William Gosset; he worked for Guinness in the early days of industrial statistical consulting. It’s a great story for another time.

Gosset called this “Normal with more weird stuff” the \(t\) distribution. He published under the pseudonym “Student,” so it’s often called the *Student’s* \(t\) distribution.

It looks a bit like this:

```
data.frame(x = c(-5, 5)) %>%
ggplot(aes(x)) +
stat_function(fun = dt, n=202, args = list(df = 3),
color = "red") +
stat_function(fun = dnorm, n=202, args = list(mean = 0, sd = 1),
color = "black")
```

Here the \(t\) distribution is drawn in red (or gray depending on your settings), and the standard Normal is drawn in black for comparison. Notice how the \(t\) distribution has what’s called *heavier tails* – there’s more probability of getting values far from 0.

There’s actually a whole family of \(t\) distributions, defined by their *degrees of freedom* (or “df”). Degrees of freedom is a story for another time, but what you should know right now is that the more degrees of freedom a \(t\) has, the “better behaved” it is – fewer extreme values. A \(t\) distribution with infinity degrees of freedom is just a Normal! Actually, infinity starts around 30 – a \(t_{30}\) is effectively the same as a Normal for most purposes.

Here are the \(t\) distributions with 2, 5, and 20 df, plus the Normal:

```
data.frame(x = c(-5, 5)) %>%
ggplot(aes(x)) +
stat_function(fun = dt, n=202, args = list(df = 2),
color = "orange") +
stat_function(fun = dt, n=202, args = list(df = 5),
color = "magenta") +
stat_function(fun = dt, n=202, args = list(df = 20),
color = "blue") +
stat_function(fun = dnorm, n=202, args = list(mean = 0, sd = 1),
color = "black") +
annotate("text", x = -3.3, y = .3, label = "t_2, 5, 20, & infinity")
```

We’ll see the \(t\) distribution pop up a lot – particularly when we are using a sample standard deviation in place of a population standard deviation. Do not be alarmed: you can get probabilities and quantiles and whatnot out of it just like you can with a Normal. It’s just a way of mathematically compensating for the fact that \(s\) is not a perfect estimate of \(\sigma\).