# Chapter 4 Descriptive Statistics

## 4.1 Uni-Variate

### 4.1.1 statistics

• summary(DF) gives a descriptive summary of all the variables in the data frame (use DF$VARIABLE for a specific variable)
• describe(DF) from the package Hmisc does a similar job
• desc <- data.frame(describe(DF)) creates a desc object that holds the descriptive statistics as a data frame that is easy to edit and export (this works with describe() from the package psych, which returns a data frame; the Hmisc version returns a list)
• freq(DF) from the package summarytools provides detailed frequency tables for all the variables in the data frame (use DF$VARIABLE for a specific variable)
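A minimal base-R sketch of these summaries, assuming the built-in `mtcars` data set stands in for `DF`. `table()` and `prop.table()` are used as a base-R stand-in for summarytools' `freq()` (they give counts and proportions, not the full freq() report), and the hand-built `desc` data frame is only an illustration of an exportable summary, not the output of `describe()`.

```r
# Built-in data standing in for DF (assumption: any data frame works the same way)
DF <- mtcars
DF$cyl <- factor(DF$cyl)  # treat number of cylinders as a categorical variable

# Descriptive summary of every variable, then of a single variable
summary(DF)
summary(DF$mpg)

# Frequencies of a categorical variable (base-R stand-in for freq())
cyl_counts <- table(DF$cyl)
cyl_counts
prop.table(cyl_counts)  # relative frequencies

# Illustrative per-variable summary stored as an editable data frame
num_vars <- sapply(DF, is.numeric)
desc <- data.frame(
  variable  = names(DF)[num_vars],
  n         = sapply(DF[num_vars], function(x) sum(!is.na(x))),
  mean      = sapply(DF[num_vars], mean, na.rm = TRUE),
  sd        = sapply(DF[num_vars], sd, na.rm = TRUE),
  row.names = NULL
)
desc
# write.csv(desc, "desc.csv", row.names = FALSE)  # hypothetical export path
```

Because `desc` is an ordinary data frame, it can be filtered, rounded or exported like any other data.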

### 4.1.2 graphs

• hist(DF$VARIABLE) provides a histogram of a numeric variable
• plot(DF$VARIABLE) provides a scatter (index) plot for a numeric variable or a bar chart for a categorical (factor) variable
• boxplot(DF) provides box plots for all the numeric variables in the data frame (use DF$VARIABLE for a specific variable)
• barchart(DF$VARIABLE) from the package lattice provides a bar chart of a categorical variable
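The univariate graphs above can be sketched in base R as follows, again assuming `mtcars` stands in for `DF`. `barplot(table(...))` is shown as a base-R alternative to lattice's `barchart()`; the `pdf(NULL)` line only sends plots to a null device when the script runs non-interactively.

```r
if (!interactive()) pdf(NULL)  # no plot window in batch mode; omit interactively

DF <- mtcars
DF$cyl <- factor(DF$cyl)

# Histogram of a numeric variable; the returned object holds the bin counts
h <- hist(DF$mpg, main = "Histogram of mpg", xlab = "mpg")

# plot() chooses the chart from the variable type
plot(DF$mpg)  # numeric -> index (scatter) plot of the values in row order
plot(DF$cyl)  # factor  -> bar chart of the level counts

# Box plots of all numeric columns, then of a single variable
boxplot(DF[sapply(DF, is.numeric)])
b <- boxplot(DF$mpg, main = "mpg")

# Base-R bar chart of a categorical variable
barplot(table(DF$cyl), xlab = "cyl", ylab = "count")
```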

## 4.2 Bi-Variate

### 4.2.1 statistics

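• TBL <- table(DF$FACTOR1, DF$FACTOR2) creates a contingency table of the two factors (a 2x2 table when each factor has two levels)
• chisq.test(TBL) tests whether the two factors are associated (chi-squared test of independence); it is part of base R's stats package, so no extra package is required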
• describeBy(DF$VARIABLE_num, DF$VARIABLE_cat) from the package psych provides detailed descriptive statistics of a numeric variable by the levels of the categorical variable
• cor(DF, use = "pairwise.complete.obs") returns a correlation matrix of the (numeric) variables, with use specifying pairwise deletion of missing cases
• t.test(), aov(), lm() and glm() can be used to model and test statistical effects (see Chapter 5)
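A base-R sketch of the bivariate statistics above, assuming `mtcars` stands in for `DF` with `am` and `vs` recoded as factors. `tapply()` is used as a base-R stand-in for psych's `describeBy()`, and two missing values are injected into `mpg` purely to show pairwise deletion.

```r
DF <- mtcars
DF$am <- factor(DF$am, labels = c("automatic", "manual"))
DF$vs <- factor(DF$vs, labels = c("V-shaped", "straight"))

# Contingency table of two factors (2x2 here because each has two levels)
TBL <- table(DF$am, DF$vs)
TBL

# Chi-squared test of independence; for a 2x2 table R applies
# Yates' continuity correction by default
chi <- chisq.test(TBL)
chi

# Descriptives of a numeric variable by factor level (stand-in for describeBy)
tapply(DF$mpg, DF$am, summary)
grp_means <- tapply(DF$mpg, DF$am, mean)

# Welch two-sample t-test of mpg between transmission types
t.test(mpg ~ am, data = DF)

# Correlation matrix with pairwise deletion of missing cases
num <- DF[sapply(DF, is.numeric)]
num$mpg[c(1, 5)] <- NA  # inject missing values (illustration only)
COR <- cor(num, use = "pairwise.complete.obs")
round(COR[1:3, 1:3], 2)
```

With `use = "pairwise.complete.obs"`, each correlation uses every row where both variables are observed, so the two missing `mpg` values only affect the correlations involving `mpg`.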

### 4.2.2 graphs

• boxplot(DF$VARIABLE_num ~ DF$VARIABLE_cat) provides box plots of a numeric variable by the levels of the categorical variable
• plotmeans(DF$VARIABLE_num ~ DF$VARIABLE_cat, mean.labels=TRUE) from the package gplots provides a chart of the means (with confidence intervals) of a numeric variable by the levels of the categorical variable
• plot(DF$VARIABLE_num1 ~ DF$VARIABLE_num2) provides a scatter plot of one numeric variable against another
• plot(lm(DF$VARIABLE_num1 ~ DF$VARIABLE_num2)) produces the set of diagnostic plots for the model fitted by lm()
• COR <- cor(DF, use = "pairwise.complete.obs") stores the correlation matrix and corrplot(COR) from the package corrplot draws it as a visual matrix (the arguments method, type and diag allow substantial configuration of the plot)
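A sketch of the bivariate graphs above, assuming `mtcars` stands in for `DF`. It uses the `y ~ x, data = DF` formula form, which is equivalent to the `DF$...` form but keeps axis labels short. gplots and corrplot are optional here, so their calls run only if the packages are installed; the `method`, `type` and `diag` values are just one possible configuration.

```r
if (!interactive()) pdf(NULL)  # no plot window in batch mode; omit interactively

DF <- mtcars
DF$am <- factor(DF$am, labels = c("automatic", "manual"))

# Box plots of a numeric variable by the levels of a factor
boxplot(mpg ~ am, data = DF, ylab = "mpg")

# Means with confidence intervals by factor level (gplots, if installed)
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::plotmeans(mpg ~ am, data = DF, mean.labels = TRUE)
}

# Scatter plot of one numeric variable against another, with a fitted line
fit <- lm(mpg ~ wt, data = DF)
plot(mpg ~ wt, data = DF)
abline(fit)

# The four default diagnostic plots of the lm() fit on one page
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# Visual correlation matrix (corrplot, if installed)
COR <- cor(DF[sapply(DF, is.numeric)], use = "pairwise.complete.obs")
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(COR, method = "circle", type = "upper", diag = FALSE)
}
```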