# Chapter 12 Summary Statistics

Now that we have the `psych` package loaded, we can use the `describe()` function. The `summary()` function is in the `base` package, so we could have used that already.

`summary()` is a generic function, so its output depends on the type of data you give it. It is a handy way to get a feel for your data. Try it out on some of the different data frames you’ve created today.

`summary(exp)`

```
## ID dose effect
## Min. : 1.0 Placebo :5 Min. :1.000
## 1st Qu.: 4.5 Low_dose :5 1st Qu.:2.000
## Median : 8.0 High_dose:5 Median :3.000
## Mean : 8.0 Mean :3.467
## 3rd Qu.:11.5 3rd Qu.:4.500
## Max. :15.0 Max. :7.000
```
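To see how the output changes with the type of data, here is a minimal sketch that calls `summary()` on a numeric vector and on a factor. The names `scores` and `group` and their values are hypothetical, made up purely for illustration.

```r
# Hypothetical stand-in variables; the values are made up for illustration.
scores <- c(1, 2, 3, 4, 7)                     # numeric variable
group  <- factor(c("a", "a", "b", "b", "b"))   # factor variable

# For a numeric vector, summary() reports the minimum, quartiles, mean and maximum.
num_summary <- summary(scores)
print(num_summary)

# For a factor, summary() reports the number of observations in each level.
fac_summary <- summary(group)
print(fac_summary)
```

This mirrors the output above, where the numeric columns get quartiles and a mean while the factor column gets counts per level.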

`describe()` is a little different, and it is mostly intended for data measured on an interval or ratio scale. It calculates the same descriptive statistics for any type of variable you give it. Try it out!

`describe(exp)`

```
## vars n mean sd median trimmed mad min max range skew kurtosis
## ID 1 15 8.00 4.47 8 8.00 5.93 1 15 14 0.00 -1.44
## dose* 2 15 2.00 0.85 2 2.00 1.48 1 3 2 0.00 -1.69
## effect 3 15 3.47 1.77 3 3.38 1.48 1 7 6 0.34 -0.97
## se
## ID 1.15
## dose* 0.22
## effect 0.46
```

Did you notice a `*` beside any of the variable names in your output? The `describe()` function converts factors and logical variables to numbers in order to do the calculations. These variables are marked with a `*`, and their output generally won’t make much sense.
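A rough way to see why those numbers are not meaningful: when a factor is turned into numbers, each level is replaced by its integer code. The sketch below uses base R's `as.numeric()` on a hypothetical factor named `dose`. It illustrates the idea only and is not necessarily how `describe()` performs the conversion internally.

```r
# Hypothetical factor with the same three levels as the dose variable above.
dose <- factor(c("Placebo", "Low_dose", "High_dose"),
               levels = c("Placebo", "Low_dose", "High_dose"))

# Converting a factor to numeric returns the integer code of each level,
# in level order: Placebo = 1, Low_dose = 2, High_dose = 3.
dose_codes <- as.numeric(dose)
print(dose_codes)

# A "mean dose" computed from these codes is arithmetic on labels,
# not on measured quantities.
print(mean(dose_codes))
```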

What if you wanted to see descriptive statistics by group? This is pretty easy to do in `R`. Instead of `describe()`, you just use the `describeBy()` function. These two functions are very similar, but in `describeBy()` you need to specify your `group` variable. Let’s try it out:

`describeBy(exp$effect, group = exp$dose)`

```
##
## Descriptive statistics by group
## group: Placebo
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 5 2.2 1.3 2 2.2 1.48 1 4 3 0.26 -1.96 0.58
## --------------------------------------------------------
## group: Low_dose
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 5 3.2 1.3 3 3.2 1.48 2 5 3 0.26 -1.96 0.58
## --------------------------------------------------------
## group: High_dose
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 5 5 1.58 5 5 1.48 3 7 4 0 -1.91 0.71
```

Now you have separate descriptive statistics for each of the dose groups.

There are other ways of doing this, such as by using the `by()` and `aggregate()` functions. We won’t be covering these today.
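Although these functions are outside today's scope, a minimal sketch may be a useful pointer for later. It assumes a small hypothetical data frame called `demo` with made-up values, and uses only base R.

```r
# Hypothetical data frame; the values are made up for illustration.
demo <- data.frame(
  dose   = factor(rep(c("Placebo", "Low_dose"), each = 3),
                  levels = c("Placebo", "Low_dose")),
  effect = c(1, 2, 3, 4, 5, 6)
)

# aggregate() applies one function per group and returns a data frame.
group_means <- aggregate(effect ~ dose, data = demo, FUN = mean)
print(group_means)

# by() applies a function to each group's values and returns one result per group.
group_summaries <- by(demo$effect, demo$dose, summary)
print(group_summaries)
```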

## 12.1 Data Cleaning

A tedious part of data analysis is addressing the problem of miscoded data that need to be converted to `NA` or some other value.

In our `exp_lowscore` example, we can use the `scrub()` function to change the values of `7` to `NA` for us:

```
library(psych)

# In column 3, replace any value below 1 or above 6 (such as 7) with NA
clean_lowscore <- scrub(exp_lowscore, where = 3, min = rep(1, 9), max = rep(6, 9))
```

This function can be very helpful when working with large data sets, and can be applied to full data frames or selected columns.
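To make this kind of range-based cleaning concrete, here is a minimal base R sketch of the same idea: values outside an allowed range are replaced with `NA`. The data frame `demo_scores` and its values are hypothetical, and the code is an illustration rather than a description of how `scrub()` is implemented.

```r
# Hypothetical data frame; valid effect scores are assumed to lie between 1 and 6.
demo_scores <- data.frame(
  ID     = 1:5,
  effect = c(2, 7, 4, 6, 7)
)

cleaned <- demo_scores

# Flag non-missing values that fall outside the allowed range.
out_of_range <- !is.na(cleaned$effect) &
  (cleaned$effect < 1 | cleaned$effect > 6)

# Replace the flagged values with NA, leaving everything else untouched.
cleaned$effect[out_of_range] <- NA
print(cleaned)
```

Working on a copy (`cleaned`) keeps the original data available, so you can check how many values were changed.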