## 5.4 Visualizing distributions

### Histogram

```r
qplot(x = hwy, data = mpg)
```
```r
ggplot(mpg, aes(x = hwy)) +
  geom_histogram()
```
```
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
```

```r
qplot(x = hwy, data = mpg, binwidth = 2, color = I("black"), fill = I("grey"))
qplot(x = hwy, y = ..density.., data = mpg, geom = "histogram", binwidth = 2)
```
```r
ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2, color = "black", fill = "grey") # left
ggplot(mpg, aes(x = hwy, y = ..density..)) +
  geom_histogram(binwidth = 2) # right (y = ..density.. <--> total area = 1)
```
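
With `y = ..density..`, the bar areas sum to 1. A minimal sketch to check this numerically, using base R's `hist()` rather than ggplot2. The `breaks` vector is an assumption chosen for illustration, so the bins need not line up exactly with those drawn by `geom_histogram()`; the unit-area property holds for any binning.

```r
library(ggplot2)  # only needed for the mpg dataset

binwidth <- 2
# Illustrative break points covering the whole range of hwy (an assumption,
# not necessarily the bins ggplot2 would pick).
breaks <- seq(min(mpg$hwy), max(mpg$hwy) + binwidth, by = binwidth)

h <- hist(mpg$hwy, breaks = breaks, plot = FALSE)

# On the density scale, bar height * bar width is the proportion of
# observations in each bin, so the bar areas add up to 1.
area <- sum(h$density * diff(h$breaks))
area
```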

Note: To choose an optimal bin width, use the `dpih` function from the `KernSmooth` package, like this: `dpih(mpg$hwy)`.
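
As a minimal sketch, the value returned by `dpih` can be passed straight to `binwidth`. This assumes the `KernSmooth` package is installed; the names `h` and `p_hist` are only illustrative.

```r
library(ggplot2)
library(KernSmooth)

# Plug-in bin width computed from the data (a single positive number).
h <- dpih(mpg$hwy)

p_hist <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = h, color = "black", fill = "grey")
p_hist
```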

### Density

```r
qplot(x = hwy, data = mpg, geom = "density")
```
```r
ggplot(mpg, aes(x = hwy)) +
  geom_density()
```

You can use the `bw` (bandwidth) parameter to control how smooth this curve is.

```r
p <- ggplot(mpg, aes(x = hwy))
p + geom_density(bw = 0.5)  # left
p + geom_density(bw = 1.15)  # right (optimal bw ---> KernSmooth::dpik(mpg$hwy))
```
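
Rather than typing the bandwidth by hand, the result of `KernSmooth::dpik` can be stored and reused. A minimal sketch, assuming `KernSmooth` is installed (`bw_opt` and `p_dens` are illustrative names):

```r
library(ggplot2)
library(KernSmooth)

# Plug-in bandwidth for a kernel density estimate of hwy.
bw_opt <- dpik(mpg$hwy)

p_dens <- ggplot(mpg, aes(x = hwy)) +
  geom_density(bw = bw_opt)
p_dens
```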

You can also overlay a histogram and a density, or several densities.

```r
qplot(x = hwy, y = ..density.., data = mpg, geom = "histogram", binwidth = 2) +
  geom_density(bw = 1.15, size = 1)
qplot(x = hwy, data = mpg, geom = "density", color = drv, fill = drv, size = I(1),
      alpha = I(0.05))
```
```r
ggplot(mpg, aes(x = hwy, y = ..density..)) +
  geom_histogram(binwidth = 2) +
  geom_density(bw = 1.15, size = 1) # left
ggplot(mpg, aes(x = hwy)) +
  geom_density(size = 1, alpha = 0.05) +
  aes(color = drv, fill = drv)  # right
```

Finally, you can plot a theoretical density, or simply any function!

```r
ggplot() + aes(x = c(-20, 20)) +
  stat_function(fun = function(x) { x^2 }, geom = "line") # left

ggplot() + aes(x = c(-20, 20)) +
  stat_function(fun = dnorm, args = list(mean = 0, sd = 6), geom = "line") # right
```
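
A common use of `stat_function()` is to compare the data with a theoretical density. The sketch below overlays a normal density, with mean and standard deviation estimated from `hwy`, on the density-scaled histogram. The choice of a normal distribution is only for illustration, and `after_stat(density)` (the newer spelling of `..density..`) assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

p_norm <- ggplot(mpg, aes(x = hwy)) +
  # Histogram on the density scale, so it is comparable with a density curve.
  geom_histogram(aes(y = after_stat(density)), binwidth = 2) +
  # Normal density with parameters estimated from the data (illustrative choice).
  stat_function(fun = dnorm,
                args = list(mean = mean(mpg$hwy), sd = sd(mpg$hwy)),
                geom = "line")
p_norm
```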

### Box plot

```r
qplot(x = "", y = hwy, data = mpg, geom = "boxplot")
qplot(x = drv, y = hwy, data = mpg, geom = "boxplot", varwidth = TRUE)
```
```r
ggplot(mpg, aes(x = "", y = hwy)) +
  geom_boxplot() # left
ggplot(mpg, aes(x = drv, y = hwy)) +
  geom_boxplot(varwidth = TRUE) # right
# varwidth = TRUE <--> box widths vary with the number of observations in each class
# (proportional to the square root of the group size).
```
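
A sketch of what `varwidth = TRUE` does with the group sizes: according to the `geom_boxplot()` documentation, box widths are proportional to the square roots of the group counts. The names `n` and `rel_width` are illustrative, and the scaling to a maximum of 1 is only for readability.

```r
library(ggplot2)  # only needed for the mpg dataset

# Number of observations per drive type.
n <- table(mpg$drv)

# Widths relative to the largest group (square-root scaling).
rel_width <- sqrt(n) / max(sqrt(n))
round(rel_width, 2)
```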

```r
qplot(x = drv, y = hwy, data = mpg, geom = "boxplot", varwidth = TRUE) +
  geom_jitter(alpha = 0.2, size = 2, width = 0.1, height = 0)
qplot(x = drv, y = hwy, data = mpg, geom = "boxplot", varwidth = TRUE) +
  facet_wrap(~class)
```
```r
p <- ggplot(mpg, aes(x = drv, y = hwy)) + geom_boxplot(varwidth = TRUE)
p + geom_jitter(alpha = 0.2, size = 2, width = 0.1, height = 0) # left
p + facet_wrap(~class) # right
```

### Bar chart

```r
qplot(x = drv, data = mpg)
```
```r
ggplot(mpg, aes(x = drv)) + geom_bar()
```

By default, the bar heights are the observed counts of each level of `x`.

```r
dt <- data.frame(xtabs(~drv, data = mpg))
dt
```
```
  drv Freq
1   4  103
2   f  106
3   r   25
```

If your data are already tabulated as frequencies (as in `dt`), the command `ggplot(dt, aes(x = drv)) + geom_bar()` is not useful: `geom_bar()` counts rows, and `dt` has one row per level, so every bar has height 1.

To get the expected result, you have two options:

```r
ggplot(dt, aes(x = drv, y = Freq)) +
  geom_bar(stat = "identity")
# or
ggplot(dt, aes(x = drv, y = Freq)) +
  geom_col()
```
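
To see the difference between counting rows and using the tabulated values, the sketch below inspects the computed bar heights with `layer_data()`. It rebuilds the one-way table of `drv`; `p_bar` and `p_col` are illustrative names.

```r
library(ggplot2)

dt <- data.frame(xtabs(~drv, data = mpg))

# geom_bar() counts rows, and dt has exactly one row per drive type,
# so every computed count is 1.
p_bar <- ggplot(dt, aes(x = drv)) + geom_bar()
layer_data(p_bar)$count

# geom_col() uses the y values as given, so the heights are the Freq column.
p_col <- ggplot(dt, aes(x = drv, y = Freq)) + geom_col()
layer_data(p_col)$y
```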

Here is another example. Let

```r
dt <- data.frame(xtabs(~cyl + drv, data = mpg))
dt
```
```
   cyl drv Freq
1    4   4   23
2    5   4    0
3    6   4   32
4    8   4   48
5    4   f   58
6    5   f    4
7    6   f   43
8    8   f    1
9    4   r    0
10   5   r    0
11   6   r    4
12   8   r   21
```
```r
qplot(x = drv, y = Freq, data = dt, fill = cyl, geom = "col")
```
```r
ggplot(dt, aes(x = drv, y = Freq)) + geom_col() +
  aes(fill = cyl) # left
ggplot(dt, aes(x = drv, y = Freq, fill = cyl)) +
  geom_col(position = "dodge") # right
```
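
When the raw data are available, the tabulation step can be skipped, since `geom_bar()` counts the rows itself. A minimal sketch (`p_raw` is an illustrative name): `cyl` is numeric in `mpg`, so it is wrapped in `factor()` here to get a discrete fill as in `dt`. Unlike the `dt` version, combinations with no observations (such as 5 cylinders with 4-wheel drive) produce no bar at all.

```r
library(ggplot2)

p_raw <- ggplot(mpg, aes(x = drv, fill = factor(cyl))) +
  geom_bar(position = "dodge")
p_raw

# The counts computed by geom_bar() add up to the number of rows in mpg.
sum(layer_data(p_raw)$count)
```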