Chapter 4 Descriptive Statistics

4.1 Uni-Variate

4.1.1 statistics

  • summary(DF) gives a descriptive summary of all the variables in the data frame (use DF$VARIABLE for a specific variable)
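  • describe(DF) from the package Hmisc does a similar job
    • desc <- data.frame(describe(DF)) creates a desc object, a data frame of the descriptive statistics that is easy to edit and export (this conversion is known to work with the describe() function of the psych package, which returns a data-frame-like object)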
  • freq(DF) from the package summarytools provides detailed frequencies for all the variables in the data frame (use DF$VARIABLE for a specific variable)
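
As a minimal sketch of the calls above, the following uses the built-in mtcars data set as a stand-in for DF (an assumption; any data frame would do). The table() and prop.table() lines are a base-R alternative for frequencies rather than the summarytools output, and the psych part only runs if that package is installed. The file name descriptives.csv is hypothetical.

```r
# Example data: the built-in mtcars data set stands in for DF (an assumption).
DF <- mtcars
DF$cyl <- factor(DF$cyl)            # treat the number of cylinders as categorical

summary(DF)                         # descriptive summary of every variable
summary(DF$mpg)                     # summary of one numeric variable

# Base-R frequencies of a categorical variable (not the summarytools output)
freq_cyl <- table(DF$cyl)
freq_cyl
round(prop.table(freq_cyl), 2)      # the same frequencies as proportions

# Optional: descriptive data frame via psych, skipped if psych is not installed
if (requireNamespace("psych", quietly = TRUE)) {
  desc <- data.frame(psych::describe(mtcars))
  # Export to a temporary folder; the file name is only an illustration
  write.csv(desc, file.path(tempdir(), "descriptives.csv"))
}
```

Calling functions as psych::describe() avoids a name clash when both Hmisc and psych are loaded, since both packages export a function called describe().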

4.1.2 graphs

  • hist(DF$VARIABLE) provides a histogram for a specific numeric variable
  • plot(DF$VARIABLE) provides a scatter plot or a bar chart, depending on whether the variable is numeric or categorical (a factor)
  • boxplot(DF) provides a box plot for all the variables in the data frame (use DF$VARIABLE for specific variables)
  • barchart(DF$VARIABLE) from the package lattice provides a bar chart for a categorical variable
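
The base-graphics calls above can be tried out as follows. This is a sketch under assumptions: mtcars stands in for DF, and the plots are written to a temporary PDF (a hypothetical file name) so that the script also runs non-interactively. Drop the pdf() and dev.off() lines to draw on screen instead.

```r
# Example data: mtcars stands in for DF (an assumption).
DF <- mtcars
DF$cyl <- factor(DF$cyl)            # categorical version of cyl

out <- file.path(tempdir(), "univariate_plots.pdf")  # hypothetical output file
pdf(out)                            # open a PDF device, one plot per page

hist(DF$mpg)                        # histogram of a numeric variable
plot(DF$mpg)                        # numeric: values plotted against their index
plot(DF$cyl)                        # factor: bar chart of the level counts
boxplot(DF$mpg)                     # box plot of a single variable
boxplot(DF[, c("mpg", "qsec")])     # box plots of several numeric variables

dev.off()                           # close the device and write the file
```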

4.2 Bi-Variate

4.2.1 statistics

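  • TBL <- table(DF$FACTOR1, DF$FACTOR2) creates a two-way contingency table (2x2 when each factor has two levels)
    • chisq.test(TBL) can be used to test whether the two factors are associated (it is part of the stats package that ships with R, so no extra package is required)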
  • describeBy(DF$VARIABLE_num, DF$VARIABLE_cat) from the package psych provides detailed descriptive statistics of a numeric variable by the levels of the categorical variable
  • cor(DF, use = "pairwise.complete.obs") returns a correlation matrix for the numeric variables of the data frame, with use specifying pairwise deletion of missing cases
  • t.test(), aov(), lm() and glm() functions can be used to represent and test the statistical effects, see more in Chapter 5
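
A minimal sketch of these bi-variate statistics, assuming mtcars as a stand-in for DF and recoding two of its 0/1 columns as factors. The tapply() line is a base-R alternative that only gives group means, not the full describeBy() output, and the psych call is skipped if the package is not installed.

```r
# Example data: mtcars stands in for DF (an assumption).
DF <- mtcars
DF$am <- factor(DF$am, labels = c("automatic", "manual"))
DF$vs <- factor(DF$vs, labels = c("v-shaped", "straight"))

# Contingency table: 2 x 2 here because each factor has two levels
TBL <- table(DF$am, DF$vs)
TBL
chi <- chisq.test(TBL)              # from the stats package, loaded by default
chi$p.value

# Numeric variable by the levels of a categorical variable
group_means <- tapply(DF$mpg, DF$am, mean)   # base R: group means only
group_means
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describeBy(DF$mpg, DF$am))    # detailed statistics per group
}

# Correlation matrix of numeric columns with pairwise deletion of missing cases
COR <- cor(DF[, c("mpg", "hp", "wt")], use = "pairwise.complete.obs")
round(COR, 2)

# Two-group comparison of a numeric variable
tt <- t.test(mpg ~ am, data = DF)
tt$p.value
```

The formula with data = DF is equivalent to writing DF$mpg ~ DF$am, and keeps the variable names readable in the printed output.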

4.2.2 graphs

  • boxplot(DF$VARIABLE_num ~ DF$VARIABLE_cat) provides box plots of a numeric variable by the levels of the categorical variable
  • plotmeans(DF$VARIABLE_num ~ DF$VARIABLE_cat, mean.labels=TRUE) from the package gplots provides a chart with the averages of a numeric variable by the levels of the categorical variable
  • plot(DF$VARIABLE_num1 ~ DF$VARIABLE_num2) provides a scatter plot of one numeric variable against another numeric variable
  • plot(lm(DF$VARIABLE_num1 ~ DF$VARIABLE_num2)) provides the set of diagnostic graphs that plot() produces for a model fitted with lm()
  • COR <- cor(DF, use = "pairwise.complete.obs") followed by corrplot(COR) from the corrplot package draws a visual correlation matrix for the COR object (the arguments method, type and diag allow for substantial configuration of the plot)
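
The plots above can be sketched in one script. As before, mtcars stands in for DF and the temporary PDF file name is hypothetical (both assumptions). The gplots and corrplot calls are skipped when those packages are not installed, and the values chosen for method, type and diag are only one possible configuration.

```r
# Example data: mtcars stands in for DF (an assumption).
DF <- mtcars
DF$cyl <- factor(DF$cyl)

out <- file.path(tempdir(), "bivariate_plots.pdf")   # hypothetical output file
pdf(out)                            # drop pdf()/dev.off() to draw on screen

boxplot(mpg ~ cyl, data = DF)       # numeric variable by levels of a factor
plot(mpg ~ wt, data = DF)           # scatter plot of two numeric variables

fit <- lm(mpg ~ wt, data = DF)      # simple linear regression
abline(fit)                         # add the fitted line to the scatter plot
par(mfrow = c(2, 2))                # 2 x 2 grid for the diagnostic plots
plot(fit)                           # four diagnostic plots of the fitted model
par(mfrow = c(1, 1))                # back to one plot per page

# Means of a numeric variable by group, only if gplots is installed
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::plotmeans(mpg ~ cyl, data = DF, mean.labels = TRUE)
}

# Visual correlation matrix, only if corrplot is installed
COR <- cor(DF[, c("mpg", "hp", "wt")], use = "pairwise.complete.obs")
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(COR, method = "circle", type = "upper", diag = FALSE)
}

dev.off()                           # close the device and write the file
```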

4.3 Data visualization [in preparation]