# 13 Some common classical hypothesis tests

Please keep the following in mind:

If the experiment is improperly designed, the data improperly collected or recorded, or an inappropriate test used, your results will be unreliable or worse.

If you do a literature search on a test, different sources will give you different prerequisites. For example, some authors require normality of the data, some accept approximately normal data (whatever "approximately normal" means), some specify a minimum sample size of 31, some of 50, some of more than 50 but less than 10% of the population size, etc. Some rules of thumb are holdovers from a time before powerful computers, some have been shown to work well in practice, and some are just urban myths.

It is better to err on the side of caution. Standards do evolve; if your job - or someone's life - depends on your analysis, it is a good idea to check current best practices before choosing a test.

- In a perfect world, the question comes first, then experiment design, then data collection, data clean-up, then the analysis. In real life, people often collect data "just to have it" and think about what to do with it later. Even then, you should not look at the data first and formulate a question to fit. Come up with your question(s) first, or you might miss something (remember the survival factors in the titanic data set).

As you work through these examples, check whether the confidence intervals given support your conclusions as well.

## 13.1 t-test for single mean

Assumptions:

- Data are collected at the interval or ratio level of measurement
- Data are continuous
- Data are a random sample
- Data are independent from each other
- Data are from a normal distribution with unknown mean \(\mu\) and unknown standard deviation \(\sigma\)

Note: Many practitioners will use a t-test if the data are approximately normally distributed, the sample size is more than thirty, or both. Thirty is a commonly cited minimum sample size, but it is often too small.

\(\small H_o\) : \(\mu = \mu_0\)

\(\small H_a\) : \(\mu \ne \mu_0\), \(\mu < \mu_0\), or \(\mu > \mu_0\)

Test statistic \(ts\): \[ts = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}}\] Under the null hypothesis, the test statistic \(ts\) has a t-distribution with \(n-1\) degrees of freedom.
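The formula is easy to verify by hand against R's built-in test. A quick sanity check (the data and seed below are my own hypothetical choices):

```
# Hypothetical sample; seed chosen arbitrarily for reproducibility
set.seed(1)
x <- rnorm(25, mean = 0.2, sd = 1)

mu0 <- 0
ts  <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
# two-sided p-value from the t-distribution with n-1 degrees of freedom
p   <- 2 * pt(-abs(ts), df = length(x) - 1)

# both should match t.test(x, mu = 0)
all.equal(unname(t.test(x, mu = mu0)$statistic), ts)
all.equal(t.test(x, mu = mu0)$p.value, p)
```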

**Example:** We generate 100 values from a standard normal distribution and test the claim that the population mean is \(\mu = 0.3\).

\(\small H_o\) : \(\mu = 0.3\)

\(\small H_a\) : \(\mu \ne 0.3\)

```
test_data <- rnorm(100,0,1)
t.test(test_data, mu=0.3, alternative="two.sided")
#>
#> One Sample t-test
#>
#> data: test_data
#> t = -3.3075, df = 99, p-value = 0.001313
#> alternative hypothesis: true mean is not equal to 0.3
#> 95 percent confidence interval:
#> -0.2372055 0.1656598
#> sample estimates:
#> mean of x
#> -0.03577288
```

Based on the p-value, we reject the claim that the mean of `test_data` is 0.3 (no surprise there, the data were generated from a normal distribution with mean 0).

**Example:** We generate 100 standard normal values, store them in a variable `x`, and test the claim that the population mean is less than 0.1.

\(\small H_o\) : \(\mu = 0.1\)

\(\small H_a\) : \(\mu < 0.1\)

```
x <- rnorm(100,0,1)
t.test(x, mu=0.1, alternative="less",conf.level=0.99)
#>
#> One Sample t-test
#>
#> data: x
#> t = -0.50079, df = 99, p-value = 0.3088
#> alternative hypothesis: true mean is less than 0.1
#> 99 percent confidence interval:
#> -Inf 0.2928974
#> sample estimates:
#> mean of x
#> 0.04817059
```

Based on the p-value, what is your conclusion?

**Note** You have probably heard about the z-test for means. The problem is that the z-test assumes we know the population standard deviation \(\sigma\). Think about it. We are running the test because we don’t know the mean \(\mu\). How realistic is it really to know the standard deviation \(\sigma\) if we don’t know the mean \(\mu\)? Not very.

**Your turn** Run the code below to generate some data. Test the claims that

- The population mean \(\mu\) is 0.8
- The population mean \(\mu\) is less than 2
- The population mean \(\mu\) is greater than 0.5

In each case, make sure you interpret the p-value.
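The data-generating code appears to be missing here. As a stand-in, the following sketch (the distribution and its parameters are my assumption, not the original) lets you practice the three tests:

```
# Hypothetical data; the original code block is not shown in the source
set.seed(42)
my_data <- rnorm(100, mean = 1, sd = 2)

t.test(my_data, mu = 0.8, alternative = "two.sided")  # claim: mu = 0.8
t.test(my_data, mu = 2,   alternative = "less")       # claim: mu < 2
t.test(my_data, mu = 0.5, alternative = "greater")    # claim: mu > 0.5
```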

## 13.2 t-test for two means

Use if you are testing a claim about the means of two different populations. Assumptions:

- Data are collected at the interval or ratio level of measurement
- Data are continuous
- Data are independent, i.e. measurements for one observation do not affect measurements for any other observation
- Randomly sampled from two normal populations

A. Assuming equal variances

\(H_o\) : difference between the means = 0 (mean(x) = mean(y))

\(H_a\) : difference between the means < 0 (mean(x) < mean(y))

Here the test statistic is

\[ ts = \frac{\bar x_1 - \bar x_2 }{s_p \sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}, \] where \(\bar x_1\) and \(\bar x_2\) are the sample means of the two samples, \(n_1\) and \(n_2\) are the sample sizes (these can be different), and \(s_p\) is the pooled standard deviation, computed as \[s_p = \sqrt{\frac{(n_1-1)s_1^2 +(n_2-1)s_2^2}{n_1+n_2-2}}\] Here \(s_1\) and \(s_2\) are the sample standard deviations of the two samples. Under \(H_o\), \(ts\) has a t-distribution with \(n_1+n_2-2\) degrees of freedom.
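The pooled formula can be checked against `t.test(..., var.equal = TRUE)`. A small sketch with hypothetical equal-variance samples (seed chosen arbitrarily):

```
# Two hypothetical samples with the same true variance
set.seed(7)
x <- rnorm(30, 5, 2)
y <- rnorm(20, 5, 2)
n1 <- length(x); n2 <- length(y)

# pooled standard deviation, denominator n1 + n2 - 2
sp <- sqrt(((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2))
ts <- (mean(x) - mean(y)) / (sp * sqrt(1/n1 + 1/n2))

# should agree with R's equal-variance two-sample t-test
all.equal(unname(t.test(x, y, var.equal = TRUE)$statistic), ts)
```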

Example: As x and y below are drawn from the same distribution, the equal-variance assumption holds. Please note that in practice we do not know where the data came from.

```
x <- runif(100,0,1)
y <- runif(49, 0, 1)
t.test(x,y,alternative="less",var.equal=TRUE)
#>
#> Two Sample t-test
#>
#> data: x and y
#> t = 1.1249, df = 147, p-value = 0.8688
#> alternative hypothesis: true difference in means is less than 0
#> 95 percent confidence interval:
#> -Inf 0.1409794
#> sample estimates:
#> mean of x mean of y
#> 0.5546963 0.4976535
```

B. Assuming unequal variances (Welch's t-test), which is appropriate here as x and y come from distributions with different standard deviations:

\(H_o\) : mean(x) = mean(y)

\(H_a\) : mean(x) \(\ne\) mean(y)

The test statistic is \[ ts = \frac{\bar x_1 - \bar x_2}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}}\] Under \(H_o\), \(ts\) has approximately a t-distribution whose degrees of freedom are estimated from the data (the Welch-Satterthwaite approximation).
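Both the statistic and the estimated degrees of freedom can be reproduced by hand (hypothetical samples, seed chosen arbitrarily):

```
# Two hypothetical samples with different variances
set.seed(11)
x <- rnorm(40, 0, 1)
y <- rnorm(35, 0.5, 2)
v1 <- var(x) / length(x)
v2 <- var(y) / length(y)

ts <- (mean(x) - mean(y)) / sqrt(v1 + v2)
# Welch-Satterthwaite degrees of freedom (what t.test reports as df)
df <- (v1 + v2)^2 / (v1^2 / (length(x) - 1) + v2^2 / (length(y) - 1))

welch <- t.test(x, y)        # var.equal = FALSE is the default
all.equal(unname(welch$statistic), ts)
all.equal(unname(welch$parameter), df)
```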

```
x <- rnorm(100,0,1)
y <- rnorm(100, 0.5, 2)
t.test(x,y,alternative="two.sided")
#>
#> Welch Two Sample t-test
#>
#> data: x and y
#> t = -1.7975, df = 135.72, p-value = 0.07448
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -0.89376176 0.04264211
#> sample estimates:
#> mean of x mean of y
#> 0.009662887 0.435222711
```

## 13.3 Paired t-tests

Use when the data are paired, such as data taken from the same location before and after some event.

- The differences between the paired measurements should be normally distributed
- Subjects (pairs) must be independent from the other subjects (pairs)
- The two observations in a pair must be from the same subject

In this scenario, we are basically testing claims about the difference in means. For example:

mean 1 = mean 2 \(\leftrightarrow\) difference in means = 0

mean 1 > mean 2 \(\leftrightarrow\) mean 1 - mean 2 > 0, etc.

Thus, the test statistic is similar to the one for a one sample t-test:

\[ts = \frac{\bar{x}_{differences} - 0}{\frac{s}{\sqrt{n}} } = \frac{\bar{x}_{differences}}{\frac{s}{\sqrt{n}} }\]
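In other words, a paired t-test is exactly a one-sample t-test applied to the differences. A quick check, using data generated like the example below (seed added for reproducibility):

```
set.seed(3)
before <- rep(1:10, 25)
after  <- before - rnorm(250, 1.5, 0.3)

paired  <- t.test(before, after, paired = TRUE)
one_smp <- t.test(before - after, mu = 0)   # same test on the differences

all.equal(paired$statistic, one_smp$statistic)
all.equal(paired$p.value, one_smp$p.value)
```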

```
before <- rep(c(1:10), 25)
after <- before - rnorm(250, 1.5, 0.3)
t.test(before, after,alternative="greater",paired=TRUE)
#>
#> Paired t-test
#>
#> data: before and after
#> t = 78.296, df = 249, p-value < 2.2e-16
#> alternative hypothesis: true mean difference is greater than 0
#> 95 percent confidence interval:
#> 1.46635 Inf
#> sample estimates:
#> mean difference
#> 1.497936
```

## 13.4 Testing a single proportion

### 13.4.1 Exact test using the binomial distribution

Assume you want to test if someone is psychic. You test them by thinking of one of five pre-determined symbols (say a rabbit, a triangle, 7, a flower, and a car). You record if they can read your mind. Let’s say you repeat this ten times.
This setup can be modeled with a binomial distribution where \(n=10\), \(x\) is the number of correct guesses, and \(p=\frac{1}{5}=0.2\). Is someone with 5 correct guesses psychic, i.e. is guessing 5 or more out of ten an unusually large number?
We compute the probability of 5 or more correct guesses, given that the probability of success is 0.2, using the built-in `dbinom` function:

```
x <- c(5:10)
y <- dbinom(x, 10, 0.2)
print(paste("The probability of guessing 5+ out of 10 with p = 0.2 is", sum(y)))
#> [1] "The probability of guessing 5+ out of 10 with p = 0.2 is 0.0327934976"
```
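Equivalently, the upper-tail probability can be read directly off the binomial CDF with `pbinom`, without summing `dbinom` values by hand:

```
# P(X >= 5) for X ~ Binomial(10, 0.2), via the upper tail of the CDF
pbinom(4, size = 10, prob = 0.2, lower.tail = FALSE)
#> [1] 0.0327935
```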

At the usual 5% significance level, a probability of about 3% is small enough to reject the claim that guessing 5+ out of 10 was due to chance alone.

In R we can also just use `binom.test` to perform the hypothesis test. Here

\(H_o\): the probability of success = p

\(H_a\): the probability of success \(\ne\) p

```
binom.test(5, 10, p=0.2, alternative="two.sided")
#>
#> Exact binomial test
#>
#> data: 5 and 10
#> number of successes = 5, number of trials = 10,
#> p-value = 0.03279
#> alternative hypothesis: true probability of success is not equal to 0.2
#> 95 percent confidence interval:
#> 0.187086 0.812914
#> sample estimates:
#> probability of success
#> 0.5
```

Your turn: Repeat the above for 7 successes. Can someone who guessed 7 out of 10 reasonably claim to be psychic?

### 13.4.2 Proportion test using the normal distribution

You may have heard that you can approximate the binomial distribution with a normal distribution. A binomial distribution has mean \(\small np\) and standard deviation \(\small \sqrt{np(1-p)}\). If \(\small n\) is large enough, it can be approximated with a normal distribution with mean \(\small np\) and standard deviation \(\small \sqrt{np(1-p)}\). Below I try it for \(\small n=10\) and \(\small p=0.5\).

```
n <- 10
p <- 0.5
q <- 1-p
x <- c(0:n)
ynorm <- dnorm(x,mean=n*p, sd = sqrt(n*p*q))
ybinom <- dbinom(x, n, p)
plot(x, ynorm)
lines(x, ybinom, col="red")
```

Your turn: Change the number of trials \(n\) and the probability of success \(p\). You should see that the “goodness of fit” depends on both. In general, it is better for large \(n\) and \(p\) closer to 0.5.

You may also know that the formula for binomial probabilities contains several \(n!\) factors, and \(n!\) gets really large really fast. Because of that, binomial probabilities are hard to compute (not so much for R, but historically). So, if one is testing a hypothesis about \(p\) and

- the conditions for a binomial distribution are given, i.e.
- the n samples are mutually independent,
- the probability of a given outcome is the same for all n samples,
- there are exactly two groups / outcomes,
- the sample is random, and
- \(np_o\) and \(n(1-p_o)\) are both large enough (a common rule of thumb is at least 5),

then one can use the normal approximation to the binomial distribution and the test statistic \[z=\frac{\hat p-p_o}{\sqrt{\frac{p_o (1-p_o)}{n}}}\] Here \(p_o\) is the hypothesized population proportion, and \(\hat p\) is the observed sample proportion.
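The X-squared value that `prop.test` reports is, without continuity correction, exactly \(z^2\). A quick check using the 100-out-of-400 example:

```
# Manual z statistic for 100 successes in 400 trials, null p0 = 0.2
n <- 400; phat <- 100 / n; p0 <- 0.2
z <- (phat - p0) / sqrt(p0 * (1 - p0) / n)
z^2
#> [1] 6.25

# prop.test without continuity correction reports the same value as X-squared
unname(prop.test(100, 400, p = 0.2, correct = FALSE)$statistic)
#> [1] 6.25
```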

However, as you saw above, the normal approximation is not exact for small \(n\), so R offers a continuity correction. Have a look at the tests below; you should see the effect of the correction when you compare the exact `binom.test` to `prop.test`, which uses the normal approximation.

```
binom.test(100, 400, p=0.2)
#>
#> Exact binomial test
#>
#> data: 100 and 400
#> number of successes = 100, number of trials = 400,
#> p-value = 0.01466
#> alternative hypothesis: true probability of success is not equal to 0.2
#> 95 percent confidence interval:
#> 0.2083015 0.2954417
#> sample estimates:
#> probability of success
#> 0.25
prop.test(100, 400, p=0.2, correct=TRUE)
#>
#> 1-sample proportions test with continuity correction
#>
#> data: 100 out of 400, null probability 0.2
#> X-squared = 5.9414, df = 1, p-value = 0.01479
#> alternative hypothesis: true p is not equal to 0.2
#> 95 percent confidence interval:
#> 0.2089107 0.2959846
#> sample estimates:
#> p
#> 0.25
prop.test(100, 400, p=0.2, correct=FALSE)
#>
#> 1-sample proportions test without continuity
#> correction
#>
#> data: 100 out of 400, null probability 0.2
#> X-squared = 6.25, df = 1, p-value = 0.01242
#> alternative hypothesis: true p is not equal to 0.2
#> 95 percent confidence interval:
#> 0.2100790 0.2946771
#> sample estimates:
#> p
#> 0.25
binom.test(10, 40, p=0.2)
#>
#> Exact binomial test
#>
#> data: 10 and 40
#> number of successes = 10, number of trials = 40,
#> p-value = 0.4296
#> alternative hypothesis: true probability of success is not equal to 0.2
#> 95 percent confidence interval:
#> 0.1269148 0.4119620
#> sample estimates:
#> probability of success
#> 0.25
prop.test(10, 40, p=0.2, correct=TRUE)
#>
#> 1-sample proportions test with continuity correction
#>
#> data: 10 out of 40, null probability 0.2
#> X-squared = 0.35156, df = 1, p-value = 0.5532
#> alternative hypothesis: true p is not equal to 0.2
#> 95 percent confidence interval:
#> 0.1324509 0.4152042
#> sample estimates:
#> p
#> 0.25
prop.test(10, 40, p=0.2, correct=FALSE)
#>
#> 1-sample proportions test without continuity
#> correction
#>
#> data: 10 out of 40, null probability 0.2
#> X-squared = 0.625, df = 1, p-value = 0.4292
#> alternative hypothesis: true p is not equal to 0.2
#> 95 percent confidence interval:
#> 0.1418712 0.4019396
#> sample estimates:
#> p
#> 0.25
```

The exact binomial test `binom.test` makes the following assumptions:

- the n samples are mutually independent,
- the probability of a given outcome is the same for all n samples,
- there are exactly two groups / outcomes.

Basically, this says that we are assuming we have a binomial random variable.

Note that there is no assumption about sample size. Obviously, more data is better. However, unlike the proportion z-test you may know from elementary statistics, this is an *exact* test and does not use the normal distribution as an approximation.

Syntax: `binom.test(x, n, p, alternative = c("two.sided", "less", "greater"), conf.level = 0.95)`

Suppose you took a sample of 205 uniform[0,1] values, and 36 of them were above 0.8. Test the claim that the proportion of values above 0.8 is 0.2, as it should be.

\(\small H_o\): \(p = 0.2\)

\(\small H_a\): \(p \ne 0.2\)

```
binom.test(36,205,0.2,alternative="two.sided")
#>
#> Exact binomial test
#>
#> data: 36 and 205
#> number of successes = 36, number of trials = 205,
#> p-value = 0.432
#> alternative hypothesis: true probability of success is not equal to 0.2
#> 95 percent confidence interval:
#> 0.1261296 0.2347330
#> sample estimates:
#> probability of success
#> 0.1756098
```

## 13.5 Testing equality of Variances, F-test

Use this when you need to test the equality of two variances. Important: this test is very sensitive to non-normal data. You need to check whether your data are normally distributed first!

Example: Let's demonstrate this using the before and after scores. First, we check normality with a Q-Q plot. This is not the best way to do this, but it is the only way we have so far:

```
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.3.2
score.before <- rep(c(1:10), 25)
score.after <- score.before - rnorm(250, 1.5, 0.3)
data <- data.frame(score.before, score.after)
ggplot(data, aes(sample=score.before))+
stat_qq(distribution = qnorm, dparams=list(mean(score.before),sd(score.before)))+
geom_abline(intercept=0, slope=1,color="red")
```

```
ggplot(data, aes(sample=score.after))+
stat_qq(distribution = qnorm, dparams=list(mean(score.after),sd(score.after)))+
geom_abline(intercept=0, slope=1,color="blue")
```

Next, we run the F-test to check for equality of variance:

\(\small H_o\): \(\frac{\sigma_{score.before}}{\sigma_{score.after}} = 1\), that is \(\sigma_{score.before}=\sigma_{score.after}\)

\(\small H_a\): \(\frac{\sigma_{score.before}}{\sigma_{score.after}} \ne 1\), that is \(\sigma_{score.before} \ne\sigma_{score.after}\)

```
var.test(x=score.before, y=score.after, alternative="two.sided")
#>
#> F test to compare two variances
#>
#> data: score.before and score.after
#> F = 0.99916, num df = 249, denom df = 249, p-value =
#> 0.9947
#> alternative hypothesis: true ratio of variances is not equal to 1
#> 95 percent confidence interval:
#> 0.7789401 1.2816508
#> sample estimates:
#> ratio of variances
#> 0.9991643
```

Looking at either the p-value or the confidence interval, we find that there is not enough evidence to reject equality of variances. They may - or may not - be equal.
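The F statistic is simply the ratio of the two sample variances; under \(H_o\) it follows an F distribution with \(n_1-1\) and \(n_2-1\) degrees of freedom. A manual check with hypothetical normal samples (seed chosen arbitrarily):

```
set.seed(5)
x <- rnorm(50, 0, 1)
y <- rnorm(40, 0, 1)

f  <- var(x) / var(y)          # the F statistic
ft <- var.test(x, y)
all.equal(unname(ft$statistic), f)

# two-sided p-value: twice the smaller tail of the F distribution
p <- 2 * min(pf(f, 49, 39), pf(f, 49, 39, lower.tail = FALSE))
all.equal(ft$p.value, p)
```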

## 13.6 Assignment

Show me that you can find and use tests on your own. Look up how to perform a two proportions test. I want you to produce a write up that includes:

- The pre-requisites
- The test statistic
- An explanation of the R function to use
- A detailed example. Make sure you state \(H_o\) and \(H_a\).

For the example, use the titanic data from earlier. Formulate a hypothesis and test it.