## Summarizing and Processing Data

In R, you can easily obtain summary statistics of the data. Below are some examples.

```r
data$score             # note that 'score' is a continuous variable
## [1] 35 23 14 17 23 35 27 33 32 31 34 27 51 36 39 45 31 40 25 32
mean(data$score)       # returns the mean of 'score'
## [1] 31.5
median(data$score)     # returns the median of 'score'
## [1] 32
var(data$score)        # returns the sample variance of 'score'
## [1] 78.36842
sd(data$score)         # returns the standard deviation of 'score'
## [1] 8.852594
range(data$score)      # returns the minimum and maximum of 'score'
## [1] 14 51
summary(data$score)    # minimum, quartiles, mean, and maximum
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   14.00   26.50   32.00   31.50   35.25   51.00
```
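To see what `var()` and `sd()` actually compute, the sketch below recreates the 'score' values printed above as a standalone vector (the name `score` is used only for illustration) and applies the sample-variance formula by hand. `var()` divides the sum of squared deviations by n - 1, not n. The quartiles reported by `summary()` come from `quantile()`.

```r
# Recreate the 'score' values printed above as a standalone vector
# (assumption for illustration; in the tutorial they live in data$score)
score <- c(35, 23, 14, 17, 23, 35, 27, 33, 32, 31,
           34, 27, 51, 36, 39, 45, 31, 40, 25, 32)

n <- length(score)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
var_by_hand <- sum((score - mean(score))^2) / (n - 1)
sd_by_hand  <- sqrt(var_by_hand)

all.equal(var_by_hand, var(score))   # TRUE
all.equal(sd_by_hand, sd(score))     # TRUE

# The 1st Qu. and 3rd Qu. columns of summary() are these quantiles
quantile(score, c(0.25, 0.75))
```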
```r
data$method            # note that 'method' is a categorical variable, coded as 1 and 2
## [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
table(data$method)     # counts the number of observations for each 'method'
## 
##  1  2 
## 10 10
summary(data$method)   # 'method' is stored as numbers, so summary() treats it as continuous
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     1.0     1.0     1.5     1.5     2.0     2.0
```
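Because 'method' is stored as numbers, the quartiles and mean that `summary()` reports for it are not meaningful for a grouping variable. A minimal sketch, assuming `data` is a data frame shaped like the one above (rebuilt here from the printed values), converts the column with `factor()` so that `summary()` reports counts per category, as `table()` does.

```r
# Rebuild a data frame like the one used above (values copied from the output;
# the construction itself is an assumption for illustration)
data <- data.frame(
  score  = c(35, 23, 14, 17, 23, 35, 27, 33, 32, 31,
             34, 27, 51, 36, 39, 45, 31, 40, 25, 32),
  method = rep(c(1, 2), each = 10)
)

class(data$method)       # "numeric"

# Store 'method' as a factor so R treats it as categorical
data$method <- factor(data$method)

levels(data$method)      # "1" "2"
summary(data$method)     # counts per level (10 and 10), same as table()
```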

Summary statistics of the 'score' variable for each category of 'method' can also be produced with the `tapply()` function. `tapply()` takes a vector, an index (i.e., one or more factors), and a function to apply to each group. Note that the vector and the factor(s) must have the same length.

```r
# tapply(vector, index, function)
tapply(data$score, as.factor(data$method), mean)  # returns the mean of 'score' for each 'method'
##  1  2 
## 27 36
```
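The function argument of `tapply()` is not limited to `mean`. As a sketch (the vectors are rebuilt from the printed values, an assumption for illustration), the code below passes `sd` and then an anonymous function that returns several statistics at once. When the function returns more than one value, `tapply()` returns a list with one element per group.

```r
# Standalone copies of the columns shown above (assumption for illustration)
score  <- c(35, 23, 14, 17, 23, 35, 27, 33, 32, 31,
            34, 27, 51, 36, 39, 45, 31, 40, 25, 32)
method <- factor(rep(c(1, 2), each = 10))

# Any function of a vector can be applied per group
tapply(score, method, sd)

# A function returning several values gives a list, one element per group
stats <- tapply(score, method, function(x) c(mean = mean(x), median = median(x)))
stats[[1]]   # method 1: mean 27, median 29
stats[[2]]   # method 2: mean 36, median 35
```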

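To write code more concisely, you can use the `with()` function, which evaluates an expression using the columns of a data frame as variables, so you do not have to repeat `data$`. For example: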

```r
# with(data, expression)
with(data, mean(score))                    # returns the mean of 'score' in 'data'
## [1] 31.5
with(data, tapply(score, method, mean))    # returns the mean of 'score' for each 'method' in 'data'
##  1  2 
## 27 36
```
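`with()` only changes where R looks up variable names; it does not change the result. The sketch below (data frame rebuilt from the printed values, an assumption for illustration) checks that the `with()` call matches the version that spells out `data$` each time. It also shows that several expressions can be wrapped in braces, in which case `with()` returns the value of the last one.

```r
# Rebuild a data frame like the one used above (assumption for illustration)
data <- data.frame(
  score  = c(35, 23, 14, 17, 23, 35, 27, 33, 32, 31,
             34, 27, 51, 36, 39, 45, 31, 40, 25, 32),
  method = rep(c(1, 2), each = 10)
)

by_dollar <- tapply(data$score, data$method, mean)
by_with   <- with(data, tapply(score, method, mean))
identical(by_dollar, by_with)   # TRUE

# Several expressions in braces; with() returns the value of the last one
group_offsets <- with(data, {
  centered <- score - mean(score)          # deviations from the overall mean
  round(tapply(centered, method, mean), 2)
})
group_offsets   # -4.5 for method 1, 4.5 for method 2
```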