13 Covariance

Covariation, or covariance, is a measure of how two variables change together. If the values of one variable increase as the values of the other variable increase, the covariance is positive, indicating a positive relationship. If the values of one variable decrease as the values of the other variable increase, the covariance is negative, indicating a negative relationship. If there is no apparent pattern between the variables, the covariance will be close to 0, indicating no relationship.

Mathematically, the covariance between two variables X and Y, with n observations each, can be calculated using the formula:

Cov(X, Y) = Σ((Xi - X_mean) * (Yi - Y_mean)) / (n - 1)

where:

Xi and Yi are the individual data points X_mean and Y_mean are the means of the X and Y variables, respectively Σ denotes the sum over all data points n is the number of data points In a scatterplot, the covariance can be visually inferred by the direction and the closeness of the points. When the points are tightly clustered around a positive slope, the covariance is positive. If they are clustered around a negative slope, the covariance is negative. If the points are scattered randomly with no clear pattern, the covariance is close to 0.

We can use R inbuilt cov() function to calculate covariance.

library(readr)
#> Warning: package 'readr' was built under R version 4.2.2
# Read the CSV file into a data frame
covdata <- read_csv("exampledata/Ch10_kidspets.csv")
#> Rows: 5 Columns: 2
#> ── Column specification ────────────────────────────────────
#> Delimiter: ","
#> dbl (2): Children, Pets
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Calculate the covariance between two variables (e.g., var1 and var2)
covariance <- cov(covdata$Children, covdata$Pets)
print(covariance)
#> [1] 4.5

A covariance of 4.5 means that there is a positive relationship between the two variables. As one variable increases, the other variable also tends to increase. However, it is important to note that covariance alone does not provide information about the strength of this relationship, as the scale of the covariance depends on the scales of the two variables.

To better understand the strength and direction of the relationship between two variables, you can calculate the correlation coefficient (e.g., Pearson’s correlation coefficient), which is a standardized measure of association that ranges from -1 to 1.