Descriptive Statistics
Uni-Variate
statistics
summary(DF)
descriptive summary for all the variables in the data frame (use DF$VARIABLE
for specific variables)
describe(DF)
from the package Hmisc
does a similar job
desc <- data.frame(describe(DF2))
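creates a desc
object, a descriptive data frame that is easy to edit and export (this conversion works with describe() from the package psych, which returns a data-frame-like object; the Hmisc version returns a list and may not convert directly)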
freq(DF)
from the package summarytools
provides detailed frequencies for all the variables in the data frame (use DF$VARIABLE
for specific variables)
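As a minimal sketch of the commands above, assuming the built-in iris data set stands in for DF (an assumption made for illustration; these notes do not name a data set), the base R calls can be tried as follows. The package functions are shown only as comments because Hmisc and summarytools may not be installed.

```r
# Illustrative only: iris (shipped with R) stands in for DF.
DF <- iris

# Summary for every variable, then for a single variable
print(summary(DF))
print(summary(DF$Sepal.Length))

# Base R frequency table for one categorical variable
species_counts <- table(DF$Species)
print(species_counts)

# With the packages installed, the equivalents from the notes would be:
#   Hmisc::describe(DF)
#   summarytools::freq(DF$Species)
```

The table() call is a plain base R stand-in for freq(); it gives counts only, without the percentages that summarytools reports.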
graphs
hist(DF$VARIABLE)
provides a histogram for a specific numeric variable
plot(DF$VARIABLE)
provides a scatter plot (values against their index) or a bar chart, depending on whether the variable is numeric or categorical
boxplot(DF)
provides a box plot for all the variables in the data frame (use DF$VARIABLE
for specific variables)
barchart(DF$VARIABLE)
from the package lattice
provides a bar chart for a categorical variable
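The base graphics calls above can be sketched as follows, again assuming iris as a stand-in for DF (an illustrative choice, not part of the original notes). The null PDF device is used here only so the sketch runs without opening a window; remove those two lines to see the plots.

```r
# Illustrative only: iris (shipped with R) stands in for DF.
DF <- iris

pdf(NULL)  # null device: nothing is written or shown; drop to view plots

h <- hist(DF$Sepal.Length)   # histogram of a numeric variable
plot(DF$Sepal.Length)        # numeric: values plotted against their index
plot(DF$Species)             # factor: bar chart of the level counts
b <- boxplot(DF[, 1:4])      # one box per numeric column

invisible(dev.off())         # close the device
```

boxplot() is given only the four numeric columns here, since a box plot of the Species factor would not be meaningful.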
Bi-Variate
statistics
TBL <- table(DF$FACTOR1, DF$FACTOR2)
creates a two-way contingency table (2x2 when both factors have two levels)
chisq.test(TBL)
can be used to test whether the two factors are associated (the function is in the stats
package that ships with R; MASS is not required)
describeBy(DF$VARIABLE_num, DF$VARIABLE_cat)
from the package psych
provides detailed descriptive statistics of a numeric variable by the levels of the categorical variable
cor(DF, use = "pairwise.complete.obs")
prints a correlation matrix for numeric variables, with use
specifying pairwise deletion of missing cases
t.test()
, aov()
, lm()
and glm()
functions can be used to represent and test statistical effects, see more in 5
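A minimal sketch of these bivariate commands, assuming the built-in mtcars data set stands in for DF (an assumption for illustration only). aggregate() is used as a base R counterpart of describeBy(), since psych may not be installed.

```r
# Illustrative only: mtcars (shipped with R) stands in for DF.
DF <- mtcars
DF$am  <- factor(DF$am, labels = c("automatic", "manual"))
DF$cyl <- factor(DF$cyl)

# Two-way contingency table (2 x 3 here) and chi-squared test
TBL <- table(DF$am, DF$cyl)
print(TBL)
# Small expected counts can trigger a warning; it is silenced in this sketch
chi <- suppressWarnings(chisq.test(TBL))
print(chi)

# Mean of a numeric variable by the levels of a categorical one
means_by_am <- aggregate(mpg ~ am, data = DF, FUN = mean)
print(means_by_am)

# Correlation matrix restricted to numeric columns
COR <- cor(DF[, c("mpg", "hp", "wt")], use = "pairwise.complete.obs")
print(round(COR, 2))

# Two-sample t test of mpg between the two transmission types
tt <- t.test(mpg ~ am, data = DF)
print(tt)
```

Selecting the numeric columns before calling cor() avoids the error that cor() raises when the data frame contains factors.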
graphs
boxplot(DF$VARIABLE_num ~ DF$VARIABLE_cat)
provides box plots of a numeric variable by the levels of the categorical variable
plotmeans(DF$VARIABLE_num ~ DF$VARIABLE_cat, mean.labels=TRUE)
from the package gplots
, provides a chart with the means of a numeric variable by the levels of the categorical variable
plot(DF$VARIABLE_num1 ~ DF$VARIABLE_num2)
provides a scatter plot of one numeric variable against another numeric variable
plot(lm(DF$VARIABLE_num1 ~ DF$VARIABLE_num2))
provides the set of diagnostic graphs for a model fitted with the lm()
function
COR <- cor(DF, use = "pairwise.complete.obs")
and corrplot(COR)
from the corrplot
package create a visual correlation matrix from the COR
object (the arguments method
, type
and diag
allow for substantial configuration of the plot)
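The base graphics commands from this list can be sketched as follows, assuming mtcars as a stand-in for DF (an illustrative assumption). The sketch uses the data = DF form of the formula interface rather than DF$VARIABLE, which is equivalent here and gives cleaner axis labels; plotmeans() and corrplot() are left as comments because gplots and corrplot may not be installed.

```r
# Illustrative only: mtcars (shipped with R) stands in for DF.
DF <- mtcars
DF$am <- factor(DF$am, labels = c("automatic", "manual"))

pdf(NULL)  # null device: nothing is written or shown; drop to view plots

boxplot(mpg ~ am, data = DF)     # numeric variable by factor levels
plot(mpg ~ wt, data = DF)        # scatter plot of two numeric variables

fit <- lm(mpg ~ wt, data = DF)   # simple linear regression
abline(fit)                      # add the fitted line to the scatter plot

par(mfrow = c(2, 2))             # 2 x 2 grid for the diagnostic plots
plot(fit)                        # residual and leverage diagnostics

invisible(dev.off())             # close the device

# With the packages installed, the equivalents from the notes would be:
#   gplots::plotmeans(mpg ~ am, data = DF, mean.labels = TRUE)
#   corrplot::corrplot(cor(DF[, c("mpg", "hp", "wt")]))
```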
Data visualization [in preparation]
Online resources on data visualization