Summarizing Data
R has built-in functions for the most common summary statistics. The examples below use the 'data' object and the 'scores' matrix.
data$score1 # 'score1' is a continuous variable
## [1] 35 23 14 17 23 35 27 33 32 31 34 27 51 36 39 45 31 40 25 32
mean(data$score1) # returns the mean of 'score1'
## [1] 31.5
median(data$score1) # returns the median of 'score1'
## [1] 32
var(data$score1) # returns the sample variance of 'score1'
## [1] 78.36842
sd(data$score1) # returns the sample standard deviation of 'score1'
## [1] 8.852594
max(data$score1) # returns the maximum value of 'score1'
## [1] 51
min(data$score1) # returns the minimum value of 'score1'
## [1] 14
range(data$score1) # returns the minimum and maximum of 'score1'
## [1] 14 51
summary(data$score1) # returns the minimum, quartiles, median, mean, and maximum of 'score1'
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   14.00   26.50   32.00   31.50   35.25   51.00
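As a quick check on how these statistics relate to one another, the following sketch rebuilds the values printed above as a standalone vector (holding them in a free-standing `score1` object is an assumption made for illustration). It confirms that sd() is the square root of var(), that range() returns the minimum and maximum, and that quantile() with its default settings gives the quartiles reported by summary().

```r
# score1 values copied from the printed output above
score1 <- c(35, 23, 14, 17, 23, 35, 27, 33, 32, 31,
            34, 27, 51, 36, 39, 45, 31, 40, 25, 32)

# sd() is the square root of var()
all.equal(sd(score1), sqrt(var(score1)))

# range() returns c(min, max); diff() turns it into a single spread value
identical(range(score1), c(min(score1), max(score1)))
spread <- diff(range(score1))  # 51 - 14 = 37

# quantile() with default settings reproduces the 1st and 3rd quartiles
quartiles <- quantile(score1, probs = c(0.25, 0.75))
```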
data$group # 'group' is a categorical variable
## [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
table(data$group) # counts the number of observations for each 'group'
##
##  1  2
## 10 10
cov(scores) # covariance matrix of the 'scores' data matrix (score1, score2)
##          score1    score2
## score1 78.36842  51.31579
## score2 51.31579 151.68421
cor(scores) # correlation matrix of the 'scores' data matrix (score1, score2)
##           score1    score2
## score1 1.0000000 0.4706632
## score2 0.4706632 1.0000000
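The two matrices are linked: a correlation is a covariance divided by the product of the two standard deviations. The sketch below checks this by hand. Note that the text never prints score2 directly; the values used here are an assumption, reconstructed as (row sum minus score1) from the row sums that apply(scores, 1, sum) reports in this section.

```r
# score1 values copied from the printed output
score1 <- c(35, 23, 14, 17, 23, 35, 27, 33, 32, 31,
            34, 27, 51, 36, 39, 45, 31, 40, 25, 32)

# ASSUMPTION: score2 is reconstructed from the printed row sums
# (row sum - score1); it is not shown directly in the text
score2 <- c(45, 14, 26, 25, 27, 47, 37, 50, 15, 37,
            48, 16, 45, 26, 37, 41, 25, 17, 15, 27)

scores <- cbind(score1, score2)

# correlation = covariance / (sd of x * sd of y)
r_manual <- cov(score1, score2) / (sd(score1) * sd(score2))

# compare with the off-diagonal entry of the correlation matrix
all.equal(r_manual, cor(scores)["score1", "score2"])
```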
The apply() function applies a function, such as sum, mean, or sd, to every row or every column of a matrix.
# apply(matrix, margin, function): margin = 1 applies the function to each row; margin = 2, to each column
apply(scores, 1, sum) # returns the row sums of the 'scores' matrix
## [1] 80 37 40 42 50 82 64 83 47 68 82 43 96 62 76 86 56 57 40 59
apply(scores, 2, mean) # returns the column means of the 'scores' matrix
## score1 score2
##   31.5   31.0
apply(scores, 2, sd) # returns the standard deviations for each column
##    score1    score2
##  8.852594 12.316014
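To make the margin argument concrete, here is a minimal sketch on a small made-up matrix (the object `m` is hypothetical and not part of the original data). It also shows that base R's rowSums() and colMeans() give the same results as the corresponding apply() calls for these two common cases.

```r
# hypothetical 2 x 3 matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2)

# margin = 1: one result per row
row_totals <- apply(m, 1, sum)   # 9 12

# margin = 2: one result per column
col_means <- apply(m, 2, mean)   # 1.5 3.5 5.5

# dedicated base R helpers for the same calculations
all.equal(row_totals, rowSums(m))
all.equal(col_means, colMeans(m))
```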
To compute a summary statistic of one variable within each category of a categorical variable, use the tapply() function. It takes a vector, an index (one or more factors), and the function to apply. The vector and each factor must have the same length.
# tapply(vector object, index, function)
tapply(data$score1, as.factor(data$group), mean) # returns the mean of 'score1' for each 'group'
##  1  2
## 27 36
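One way to see what tapply() does is to reproduce its result by subsetting the vector one group at a time. The sketch below rebuilds 'data' from the values printed earlier (constructing it with data.frame() is an assumption, since the text does not show how 'data' was created). Because tapply() coerces its index to a factor itself, the as.factor() call is optional.

```r
# rebuild the two variables from the printed output
data <- data.frame(
  score1 = c(35, 23, 14, 17, 23, 35, 27, 33, 32, 31,
             34, 27, 51, 36, 39, 45, 31, 40, 25, 32),
  group  = rep(1:2, each = 10)
)

# group means in one call; the index is coerced to a factor internally
group_means <- tapply(data$score1, data$group, mean)

# the same means, computed by subsetting each group by hand
mean_g1 <- mean(data$score1[data$group == 1])  # 27
mean_g2 <- mean(data$score1[data$group == 2])  # 36

# any function that takes a vector can be used, for example sd
group_sds <- tapply(data$score1, data$group, sd)
```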
To avoid typing data$ in front of every variable, you can use the with() function, which evaluates an expression using the variables in 'data'. For example:
# with(data, expression)
with(data, mean(score1)) # returns the mean of 'score1' in 'data'
## [1] 31.5
with(data, tapply(score1, group, mean)) # returns the mean of 'score1' for each 'group' in 'data'
##  1  2
## 27 36
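The following sketch checks that with() is only a shorthand: the result is identical to the explicit data$ form. As before, 'data' is rebuilt here from the printed values, which is an assumption about how the original object was constructed.

```r
# rebuild the two variables from the printed output
data <- data.frame(
  score1 = c(35, 23, 14, 17, 23, 35, 27, 33, 32, 31,
             34, 27, 51, 36, 39, 45, 31, 40, 25, 32),
  group  = rep(1:2, each = 10)
)

# explicit form and with() form give the same value
m1 <- mean(data$score1)
m2 <- with(data, mean(score1))
identical(m1, m2)

# with() pays off when an expression mentions a variable several times
spread <- with(data, max(score1) - min(score1))  # 51 - 14 = 37
```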