Chapter 10 Upcoming topics
10.1 FAQs
10.1.0.1 Testing for Normality
Testing a variable for Normality means checking whether the data is consistent with a Normal distribution. The most common approach is visual inspection, either with a histogram of the data or with a Q-Q (Quantile-Quantile) plot. The formal methods most commonly used are the Shapiro-Wilk test and the Kolmogorov-Smirnov test.
10.1.0.1.1 Visual methods
When the data is plotted as a histogram, Normality is assessed by inspecting its shape. Typically, if the histogram is roughly bell-shaped, the data is assumed to be Normal. The shape can also show whether the distribution is skewed. The Q-Q plot used here is a scatterplot of the quantiles of the data against the theoretical quantiles of a Normal distribution. Normality is assumed if the points lie approximately along the diagonal reference line.
Try the following code to inspect the Haggis data visually for Normality (it assumes the vectors before and after are already loaded).
library(ggplot2)
ba <- data.frame(c(before,after))
ba$timeframe <- c(rep("before",length(before)),rep("after",length(after)))
colnames(ba)<-c("number","timeframe")
# Histogram
ggplot(ba, aes(x = number)) + facet_wrap(. ~timeframe,ncol=1)+
# after_stat(count) replaces the deprecated ..count.. notation (ggplot2 3.3.0 or later)
geom_histogram(binwidth=1,aes(fill = after_stat(count))) +
scale_x_continuous(name = "Percentage of limpers per household") +
scale_y_continuous(name = "Count") +
ggtitle("Percentage of limpers per Haggis household",
subtitle="before and after physiotherapy")+
theme(text = element_text(size=40))
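# Q-Q plots; qqline() adds the reference line through the first and third quartiles
qqnorm(before, main = "Normal Q-Q plot: before")
qqline(before)
qqnorm(after, main = "Normal Q-Q plot: after")
qqline(after)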
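To make the construction of the Q-Q plot described above concrete, the following is a minimal sketch that builds one by hand. It uses simulated data as a stand-in, because the Haggis vectors are not defined in this section; the names x, sample_q and theoretical_q are illustrative only.

```r
# Simulated stand-in data (hypothetical; not the Haggis data)
set.seed(1)
x <- rnorm(50, mean = 30, sd = 5)

# Sample quantiles: the sorted observations
sample_q <- sort(x)

# Theoretical quantiles of the standard Normal at the same plotting positions;
# ppoints() returns the probabilities that qqnorm() uses internally
theoretical_q <- qnorm(ppoints(length(x)))

# Scatterplot of sample quantiles against theoretical quantiles
plot(theoretical_q, sample_q,
     xlab = "Theoretical quantiles", ylab = "Sample quantiles")

# Reference line through the first and third quartiles
qqline(x)
```

The resulting points should match those drawn by qqnorm(x); building them manually only shows where the two axes come from.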
10.1.0.1.2 Formal methods
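Both of the formal methods mentioned above take Normality as the null hypothesis. A p-value greater than 0.05 therefore means that the test found no evidence against Normality, and the data is usually treated as Normally distributed; it does not prove that the data is Normal. A p-value below 0.05 indicates that the data departs from the Normal distribution.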
Try the following code to test the Haggis data for Normality.
# Shapiro-Wilk test
shapiro.test(before)
##
## Shapiro-Wilk normality test
##
## data: before
## W = 0.99444, p-value = 0.958
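# Kolmogorov-Smirnov test
# Note: 'pnorm' with no further arguments is the standard Normal (mean 0, sd 1).
# To compare against a Normal with the sample's own mean and standard deviation,
# use ks.test(after, 'pnorm', mean(after), sd(after)) instead.
ks.test(after, 'pnorm')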
##
## One-sample Kolmogorov-Smirnov test
##
## data: after
## D = 1, p-value < 2.2e-16
## alternative hypothesis: two-sided
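A result of D = 1 is what one would expect when data far from zero is compared with the standard Normal, since ks.test() uses mean 0 and standard deviation 1 unless told otherwise. The following is a minimal sketch of that effect and of the 0.05 decision rule, using simulated stand-in data rather than the Haggis data; the names x, alpha, sw, ks_std and ks_fit are illustrative only. Note that estimating the mean and standard deviation from the same sample makes the Kolmogorov-Smirnov p-value only approximate.

```r
# Simulated stand-in data (hypothetical; not the Haggis data)
set.seed(42)
x <- rnorm(100, mean = 30, sd = 5)

alpha <- 0.05  # significance level used in the text

# Shapiro-Wilk: the null hypothesis is that the data come from a Normal distribution
sw <- shapiro.test(x)

# Kolmogorov-Smirnov against the standard Normal (mean 0, sd 1):
# every observation lies far above 0, so D is essentially 1
ks_std <- ks.test(x, "pnorm")

# Kolmogorov-Smirnov against a Normal with the sample's own mean and sd
ks_fit <- ks.test(x, "pnorm", mean = mean(x), sd = sd(x))

# TRUE means no evidence against Normality at the 5% level (not proof of Normality)
sw$p.value > alpha
ks_fit$p.value > alpha

# Compare the two D statistics
ks_std$statistic
ks_fit$statistic
```

The extra arguments after "pnorm" are passed on to pnorm() itself, which is how the reference distribution gets its mean and standard deviation.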