# 10 Communication

## 10.1 Information Theory

### 10.1.1 Entropy

Most of the examples are taken from philentropy package in R, which can be accessed via this link

#### 10.1.1.1 Discrete Entropy

If $$\mathbf{p}_X = (p_{x1},p_{x2},...,p_{xn})$$ is the probability distribution of the discrete random variable X with $$P(X = x) = p_{xi}$$. Then, the entropy of the distribution is

$H(\mathbf{p}_X) = -\sum_{x_i} p_{xi}\log_2 p_{xi}$

where

• entropy unit is bits, and $$H(p) \in [0,1]$$ and max at $$p = 0.5$$
• or the unit is nats ($$log_e$$)
• or the unit is dits ($$log_{10}$$)

In general, if you have different bases, you can convert between units. Given bases a and b

$\log_a k = \log_a b \times log_b k$

where $$\log_a b$$ is a constant.

Generalize

$H_b(X) = log_b a \times H_a(X)$

Equivalently,

$H(X) = E(\frac{1}{\log p(x)})$

Entropy is the measure of how much information gained from learning the value of X . Entropy depends on the distribution of X.

Example:

If X has 2 values, then $$\mathbf{p}= (p,1-p)$$ and

$H(\mathbf{p}_X ) = H(p,1-p) = -p \log_2 p - (1-p) \log_2 (1-p)$

which is known as binary entropy function

Plot of p vs. H(p) where max of $$H(\mathbf{p}_X) = \log_2 n$$ when $$X \sim U$$

hb <- function(gamma) {
-gamma * log2(gamma) - (1 - gamma) * log2(1 - gamma) # output in bits
}

xs <- seq(0, 1, len = 100)
plot(xs,
hb(xs),
type = 'l',
xlab = "p",
ylab = "H(p)")
library("philentropy")
## Warning: package 'philentropy' was built under R version 4.0.5
Prob <- 1:10/sum(1:10) # probabilities P(X)
H(Prob) # Shannon's Entropy
## [1] 3.103643

#### 10.1.1.2 Mutual Information

If X and Y are two discrete random variables and $$\mathbf{p}_{X \times Y}$$ is the their joint distribution, the mutual information (i.e., learning of a value of X gives information about the value of Y) of X and Y is

$I(X,Y) = I(Y,X) = H(\mathbf{p}_X)+ H(\mathbf{p}_Y)- H(\mathbf{p}_{X \times Y})$

where $$I(X,Y) \ge 0$$ and $$I(X,Y) = 0$$ iff X and Y are independent (information about X gives no information about Y).

Equivalently,

$MI(X,Y) = \sum_{i=1}^n \sum_{j=1}^m P(x_i, y_j) \times \log_b (\frac{P(x_i,y_j)}{P(x_i) \times P(y_j)})$

Properties:

• $$I(X;X) = H(X)$$ known as self-information
• $$I(X;Y) \ge 0$$ with equality iff $$X \perp Y$$
P_x <- 1:10/sum(1:10) # distribution P(X)
P_y <- 20:29/sum(20:29) # distribution P(Y)
P_xy <- 1:10/sum(1:10) # joint distribution P(X,Y)
MI(P_x, P_y, P_xy) # Mutual Information
## [1] 3.311973

#### 10.1.1.3 Relative Entropy

The measure of the distance between distribution p and q is called relative entropy, denoted by $$D(p ||q)$$

Equivalently, it measures the inefficiency of assuming distribution q when the true distribution is p.

The relative entropy between p(x) and q(x) is

$D(p||q) = \sum_x p(x) \log \frac{p(x)}{q(X)}$

where

• $$D(p||q) \ge 0$$ and equality iff $$p = q$$

#### 10.1.1.4 Joint-Entropy

$H(X,Y) = - \sum_{i=1}^n \sum_{j = 1}^m P(x_i, y_j) \times \log_b(P(x_i,y_j))$

P_xy <- 1:10/sum(1:10) # joint distribution P(X,Y)
JE(P_xy) # Joint-Entropy
## [1] 3.103643

#### 10.1.1.5 Conditional Entropy

$H(Y|X) = - \sum_{i=1}^n \sum_{j = 1}^m P(x_i, y_j) \times \log_b(\frac{P(x_i)}{P(x_i,y_j)})$

P_x <- 1:10/sum(1:10) # distribution P(X)
P_y <- 1:10/sum(1:10) # distribution P(Y)
CE(P_x, P_y) # Joint-Entropy
## [1] 0

Relation among entropy measures

$H(X,Y) = H(X) + H(Y|X) = H(Y) + H(X|Y)$

Properties:

• Chain rule: $$H(X_1, ..., X_n) = \sum_{i=1}^n H(X_i | X_{i-1},...,X_1)$$
• $$0 \le H(Y|X) \le H(Y)$$ Information on X does not increase the entropy measure of Y
• If $$X \perp Y$$, then $$H(Y|X) = H(Y); H(X,Y) = H(X) + H(Y)$$
• $$H(X) \le log |n_X|$$ where $$n_X$$ = number of elements in X
• $$H(X) = \log (n_X)$$ iff $$X \sim U$$
• $$H(X_1,...,X_n) \le \sum_i^n H(X_i)$$
• $$H(X_1,...,X_n) = \sum_i^n H(X_i)$$ iff $$X_i$$’s are mutually independent.

#### 10.1.1.6 Continuous Entropy

If X is a continuous variable, its entropy is

$h(\mathbf{X}) = - \int_{R^d} p(\mathbf{x})\log p(\mathbf{x}) dx$

where d is the dimension of X.

If X and Y are continuous random variables with density functions $$p(\mathbf{x})$$ and $$q(\mathbf{y})$$, the relative entropy is

$H(X,Y) = \int_{R^d} p(x) \log \frac{p(x)}{q(x)} dx$

More advance analysis can be access here

### 10.1.2 Divergence

Three metrics for divergence can be found here:

• Kullback-Leibler
• Jensen-Shannon
• Generalized Jensen-Shannon

### 10.1.3 Channel Capacity

The information channel capacity of a discrete memoryless channel is

$C = \max_{p(x)} I(X;Y)$

which is the highest rate use at which information can be sent without much error.