4.4 Divergence Metrics and Test for Comparing Distributions
Similarity among distributions is assessed with divergence statistics, which are different from
Deviation statistics: the difference between the realization of a variable and some reference value (e.g., the mean). Statistics of the deviation distribution include the standard deviation, average absolute deviation, median absolute deviation, and maximum absolute deviation.
Deviance statistics: a goodness-of-fit statistic for statistical models (analogous to the sum of squared residuals in OLS, but for models estimated by maximum likelihood). Usually used in generalized linear models.
Divergence statistics: a statistical distance (not necessarily a metric)
Divergences do not require symmetry
Divergences generalize squared (rather than linear) distance and hence fail the triangle inequality
Can be used for
Detecting data drift in machine learning
Feature selection
Variational autoencoders
Detecting similarity between policies (i.e., distributions) in reinforcement learning
Assessing consistency between two measured variables of two constructs
Techniques
Packages
entropy
philentropy
4.4.1 Kullback-Leibler Divergence
Also known as relative entropy
Not a metric (does not satisfy the triangle inequality)
Can be generalized to the multivariate case
Measures the dissimilarity between two probability distributions (discrete or continuous)
\(P\) = true data distribution
\(Q\) = predicted data distribution
It quantifies the information loss when moving from \(P\) to \(Q\) (i.e., the information lost when \(P\) is approximated by \(Q\))
Discrete
\[ D_{KL}(P||Q) = \sum_i P_i \log\left(\frac{P_i}{Q_i}\right) \]
Continuous
\[ D_{KL}(P||Q) = \int P(x) \log(\frac{P(x)}{Q(x)}) dx \]
where
\(D_{KL} \in [0, \infty)\), from identical distributions (0) to increasingly divergent distributions
Non-symmetric between the two distributions: \(D_{KL}(P||Q) \neq D_{KL}(Q||P)\)
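As a sanity check, the discrete formula can be computed directly in base R (a minimal sketch; p and q below are hypothetical probability vectors, p with linearly increasing weights and q uniform):
p <- 1:10 / sum(1:10)  # hypothetical "true" distribution P
q <- rep(1 / 10, 10)   # hypothetical approximating distribution Q (uniform)
sum(p * log2(p / q))   # D_KL(P||Q) in bits, approximately 0.218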
library(philentropy)
# philentropy::dist.diversity(rbind(X = 1:10 / sum(1:10),
# Y = 1:20 / sum(1:20)),
# p = 2,
# unit = "log2")
# probability vectors (rows already sum to 1)
KL(rbind(X = 1:10 / sum(1:10), Y = 1:10 / sum(1:10)), unit = "log2")
#> kullback-leibler
#> 0
# raw counts, converted to probabilities with est.prob = "empirical"
KL(rbind(X = 1:10, Y = 1:10), est.prob = "empirical")
#> kullback-leibler
#> 0
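Both rows above are identical, so the divergence is 0. A sketch with two different distributions (the same hypothetical p and q as above, assuming philentropy treats the first row as \(P\) and the second as \(Q\)) also illustrates the asymmetry noted earlier:
KL(rbind(P = 1:10 / sum(1:10), Q = rep(1 / 10, 10)), unit = "log2")  # approximately 0.218
KL(rbind(Q = rep(1 / 10, 10), P = 1:10 / sum(1:10)), unit = "log2")  # approximately 0.28, not equal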
4.4.2 Jensen-Shannon Divergence
- Also known as information radius or total divergence to the average
\[ D_{JS}(P||Q) = \frac{1}{2}\left( D_{KL}(P||M) + D_{KL}(Q||M) \right) \]
where
\(M = \frac{1}{2} (P + Q)\) is the mixture distribution of \(P\) and \(Q\)
\(D_{JS} \in [0,1]\) for \(\log_2\) and \(D_{JS} \in [0,\ln(2)]\) for \(\log_e\)
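A minimal sketch with philentropy::JSD, reusing the same hypothetical probability vectors (with unit = "log2" so the result falls in \([0, 1]\)):
library(philentropy)
# Jensen-Shannon divergence between a linearly weighted and a uniform distribution
JSD(rbind(P = 1:10 / sum(1:10), Q = rep(1 / 10, 10)), unit = "log2")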
4.4.3 Wasserstein Distance
- Measures the distance between two empirical CDFs
\[ W_1(P, Q) = \int_{\mathbb{R}} |F_P(x) - F_Q(x)| \, dx \]
where \(F_P\) and \(F_Q\) are the empirical CDFs (this is the \(p = 1\) case used by the functions below; for general \(p\), the one-dimensional distance is defined through the quantile functions)
- It can also be used as a test statistic (see the permutation test below)
set.seed(1)
transport::wasserstein1d(rnorm(100), rnorm(100, mean = 1))
#> [1] 0.8533046
set.seed(1)
# Wasserstein metric
twosamples::wass_stat(rnorm(100), rnorm(100, mean = 1))
#> [1] 0.8533046
set.seed(1)
# permutation-based two-sample test using the Wasserstein metric
twosamples::wass_test(rnorm(100), rnorm(100, mean = 1))
#> Test Stat P-Value
#> 0.8533046 0.0002500
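For two samples of equal size and \(p = 1\), the distance reduces to the mean absolute difference between the sorted samples, which can be checked against transport::wasserstein1d (a sketch under that equal-size assumption):
set.seed(1)
x <- rnorm(100)
y <- rnorm(100, mean = 1)
# 1-Wasserstein distance as the mean absolute difference of order statistics
mean(abs(sort(x) - sort(y)))
transport::wasserstein1d(x, y)  # should reproduce the 0.8533046 value above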
4.4.4 Kolmogorov-Smirnov Test
- Can be used for continuous distributions
\(H_0\): The empirical distribution follows a specified distribution (one-sample version), or the two samples come from the same distribution (two-sample version)
\(H_1\): The empirical distribution does not follow the specified distribution (or the two samples come from different distributions)
- A non-parametric test based on the maximum distance between the two cumulative distribution functions
\[ D = \sup_x |F_P(x) - F_Q(x)| \]
- \(D \in [0,1]\), from identical distributions (0) to completely separated distributions (1)
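A two-sample sketch with the base-R stats::ks.test (the entropy-based block below computes pairwise KL divergences rather than the KS statistic):
set.seed(1)
# two-sample KS test: H0 is that both samples come from the same distribution
ks.test(rnorm(100), rnorm(100, mean = 1))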
library(entropy)
library(tidyverse)
lst = list(sample_1 = c(1:20), sample_2 = c(2:30), sample_3 = c(3:30))
# pairwise empirical KL divergence; entropy::KL.empirical treats each vector as counts
expand.grid(1:length(lst), 1:length(lst)) %>%
  rowwise() %>%
  mutate(KL = KL.empirical(lst[[Var1]], lst[[Var2]]))
#> # A tibble: 9 × 3
#> # Rowwise:
#> Var1 Var2 KL
#> <int> <int> <dbl>
#> 1 1 1 0
#> 2 2 1 0.150
#> 3 3 1 0.183
#> 4 1 2 0.704
#> 5 2 2 0
#> 6 3 2 0.0679
#> 7 1 3 0.622
#> 8 2 3 0.0870
#> 9 3 3 0
To use the KS test for discrete data, use a bootstrap version of the test, which bypasses the continuity requirement.
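A sketch of the bootstrap two-sample version, assuming the Matching package (Matching::ks.boot() computes the p-value by resampling, so ties from discrete data are not a problem):
set.seed(1)
x <- rpois(100, lambda = 2)  # discrete data with many ties
y <- rpois(100, lambda = 3)
# bootstrap KS test; the p-value does not rely on the continuity assumption
Matching::ks.boot(x, y, nboots = 500)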