## 13.4 Correlation: `cor.test()`

Argument | Description |
---|---|
`formula` | A formula in the form `~ x + y`, where x and y are the names of the two variables you are testing. These variables should be two separate columns in a dataframe. |
`data` | The dataframe containing the variables x and y. |
`alternative` | A string specifying the alternative hypothesis. Can be `"two.sided"` indicating a two-tailed test, or `"greater"` or `"less"` for a one-tailed test. |
`method` | A string indicating which correlation coefficient to calculate and test. `"pearson"` (the default) stands for Pearson, while `"kendall"` and `"spearman"` stand for Kendall and Spearman correlations respectively. |
`subset` | A vector specifying a subset of observations to use, e.g. `subset = sex == "female"`. |

Next we’ll cover two-sample correlation tests. In a correlation test, you are assessing the relationship between two variables on a ratio or interval scale: like height and weight, or income and beard length. The test statistic in a correlation test is called a *correlation coefficient* and is represented by the letter r. The coefficient can range from -1 to +1, with -1 meaning a perfect negative relationship, and +1 meaning a perfect positive relationship. The null hypothesis in a correlation test is a correlation of 0, which means no linear relationship at all.
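To make the coefficient a little more concrete, here is a minimal sketch that calculates Pearson's r by hand and compares it to R's built-in `cor()` function. The `height` and `weight` vectors are simulated, hypothetical values created only for this illustration; they are not part of the `pirates` data.

```r
# Simulated data (hypothetical values, for illustration only)
set.seed(100)
height <- rnorm(n = 50, mean = 170, sd = 10)
weight <- 0.5 * height + rnorm(n = 50, mean = 0, sd = 8)

# Pearson's r "by hand": the covariance of the two variables
# divided by the product of their standard deviations
r.manual <- cov(height, weight) / (sd(height) * sd(weight))

# The built-in cor() function (Pearson by default)
r.builtin <- cor(height, weight)

# Do the two calculations agree?
all.equal(r.manual, r.builtin)
## [1] TRUE
```

Because the covariance is scaled by both standard deviations, the result always falls between -1 and +1, no matter what units the two variables are measured in.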

To run a correlation test between two variables x and y, use the `cor.test()` function. You can do this in one of two ways: if x and y are columns in a dataframe, use the formula notation (`formula = ~ x + y`). If x and y are separate vectors (not in a dataframe), use the vector notation (`x, y`):

```
# Correlation Test
# Correlating two variables x and y
# Method 1: Formula notation
## Use if x and y are in a dataframe
cor.test(formula = ~ x + y,
         data = df)
# Method 2: Vector notation
## Use if x and y are separate vectors
cor.test(x = x,
         y = y)
```
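The template above assumes that `x`, `y` and `df` already exist. As a rough, runnable sketch, the version below creates some simulated data first (the values and the object names `test.formula` and `test.vector` are made up for this illustration) and then checks that the two notations agree:

```r
# Simulated data (hypothetical values, for illustration only)
set.seed(42)
x <- rnorm(n = 100)
y <- 0.3 * x + rnorm(n = 100)
df <- data.frame(x = x, y = y)

# Method 1: Formula notation (x and y are columns in df)
test.formula <- cor.test(formula = ~ x + y,
                         data = df)

# Method 2: Vector notation (x and y are separate vectors)
test.vector <- cor.test(x = x,
                        y = y)

# Both notations give the same estimate and p-value
all.equal(test.formula$estimate, test.vector$estimate)
## [1] TRUE
all.equal(test.formula$p.value, test.vector$p.value)
## [1] TRUE
```

The only difference you should see when printing the two objects is the `data:` line, which reports the variable names in whatever form you supplied them.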

Let’s conduct a correlation test on the `pirates` dataset to see if there is a relationship between a pirate’s age and number of parrots they’ve had in their lifetime. Because the variables (age and parrots) are in a dataframe, we can do this in formula notation:

```
# Is there a correlation between a pirate's age and
# the number of parrots (s)he's owned?
# Method 1: Formula notation
age.parrots.ctest <- cor.test(formula = ~ age + parrots,
                              data = pirates)
# Print result
age.parrots.ctest
##
## Pearson's product-moment correlation
##
## data: age and parrots
## t = 6, df = 1000, p-value = 1e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.13 0.25
## sample estimates:
## cor
## 0.19
```

We can also do the same thing using vector notation. The results will be exactly the same; only the `data:` label in the printout changes:

```
# Method 2: Vector notation
age.parrots.ctest <- cor.test(x = pirates$age,
                              y = pirates$parrots)
# Print result
age.parrots.ctest
##
## Pearson's product-moment correlation
##
## data: pirates$age and pirates$parrots
## t = 6, df = 1000, p-value = 1e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.13 0.25
## sample estimates:
## cor
## 0.19
```

Looks like we have a positive correlation of 0.19 and a very small p-value. To see what information we can extract for this test, let’s run the command `names()` on the test object:

```
names(age.parrots.ctest)
## [1] "statistic" "parameter" "p.value" "estimate" "null.value"
## [6] "alternative" "method" "data.name" "conf.int"
```

Looks like we’ve got a lot of information in this test object. As an example, let’s look at the confidence interval for the population correlation coefficient:

```
# 95% confidence interval of the correlation
# coefficient
age.parrots.ctest$conf.int
## [1] 0.13 0.25
## attr(,"conf.level")
## [1] 0.95
```
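The other elements can be pulled out with `$` in the same way. The sketch below uses a small simulated dataframe as a stand-in (the values and the name `my.ctest` are hypothetical, chosen only for this example), so the numbers it produces will not match the `pirates` results above:

```r
# Simulated stand-in dataframe (hypothetical values)
set.seed(1)
df <- data.frame(x = rnorm(n = 200))
df$y <- 0.2 * df$x + rnorm(n = 200)

my.ctest <- cor.test(formula = ~ x + y,
                     data = df)

my.ctest$estimate   # Sample correlation coefficient (r)
my.ctest$statistic  # Test statistic (t for a Pearson test)
my.ctest$parameter  # Degrees of freedom (n - 2 for a Pearson test)
my.ctest$p.value    # p-value

# unname() drops the labels ("cor", "t", "df") if you
# just want plain numbers
r <- unname(my.ctest$estimate)
df.test <- unname(my.ctest$parameter)

# For a Pearson test, t can be recovered from r and df
r * sqrt(df.test) / sqrt(1 - r^2)
```

The last line should reproduce the value stored in `my.ctest$statistic`, which shows how the t statistic in the printout relates to the correlation coefficient and the sample size.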

Just like the `t.test()` function, we can use the `subset` argument in the `cor.test()` function to conduct a test on a subset of the entire dataframe. For example, to run the same correlation test between a pirate’s age and the number of parrots she’s owned, but *only* for female pirates, I can add the `subset = sex == "female"` argument:

```
# Is there a correlation between age and
# number parrots ONLY for female pirates?
cor.test(formula = ~ age + parrots,
         data = pirates,
         subset = sex == "female")
##
## Pearson's product-moment correlation
##
## data: age and parrots
## t = 5, df = 500, p-value = 4e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.12 0.30
## sample estimates:
## cor
## 0.21
```

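The results look very similar: the correlation is 0.21 for female pirates compared to 0.19 for all pirates, and the two confidence intervals overlap heavily. In other words, the strength of the relationship between a pirate’s age and the number of parrots they’ve owned is pretty much the same for female pirates and pirates in general.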
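All of the tests so far used the default settings for the `alternative` and `method` arguments from the table at the start of this section. As a rough sketch of how the other settings are passed, the example below runs a Spearman test and a one-tailed Pearson test on simulated vectors. The data and the object names are hypothetical and only meant to show the syntax:

```r
# Simulated data (hypothetical values, for illustration only)
set.seed(7)
x <- rnorm(n = 80)
y <- 0.4 * x + rnorm(n = 80)

# Spearman rank correlation, two-tailed (the default alternative)
spearman.test <- cor.test(x = x,
                          y = y,
                          method = "spearman")

# Pearson correlation, one-tailed:
# is the true correlation greater than 0?
greater.test <- cor.test(x = x,
                         y = y,
                         alternative = "greater")

# Default two-tailed Pearson test, for comparison
twosided.test <- cor.test(x = x,
                          y = y)

# Check which test was actually run
spearman.test$method
greater.test$alternative

# Compare the one-tailed and two-tailed p-values
greater.test$p.value
twosided.test$p.value
```

When the sample correlation is positive, the `"greater"` p-value is half of the two-tailed p-value. Note that if your data contain tied values (as count variables like `parrots` usually do), R will warn that it cannot compute an exact p-value for the Spearman and Kendall tests.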