Chapter 4 Probability Distribution

4.1 Introduction to Probability

A probability model consists of a nonempty set called the sample space S; a collection of events that are subsets of S; and a probability measure P that assigns a probability between 0 and 1 to each event, with $P(\varnothing)=0$, $P(S)=1$, and P additive.

Sample space S: the set of possible outcomes.
- example: the sample space of a single coin flip is $S=\{H,T\}$
Event E: a subset of the sample space.
- example: in a single coin flip, $\{H\}$ (the coin lands heads) is an event.
Probability: for each event E, $P(E)$ is the probability that E occurs. P has the following properties:
- $P(E)$ is always a real number between 0 and 1 inclusive: $0 \le P(E) \le 1$
- $P(\varnothing)=0$, i.e., if E is the empty set, then $P(E)=0$
- $P(S)=1$, i.e., if E is the entire sample space S, then $P(E)=1$
- P is (countably) additive, meaning that if $A_1, A_2, \ldots$ is a finite or countable sequence of disjoint events, then $P(A_1 \cup A_2 \cup \cdots) = \sum_{i} P(A_i)$
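As a small illustration of these properties, the sketch below models one roll of a fair six-sided die. The die example and the helper name `prob()` are assumptions introduced here for illustration; they are not part of the definitions above.

```r
# Sample space for one roll of a fair die (illustrative assumption).
S <- 1:6

# Hypothetical helper: probability of an event (a subset of S)
# when all outcomes are equally likely.
prob <- function(event) length(intersect(event, S)) / length(S)

A <- c(1, 2)   # event "the roll is 1 or 2"
B <- c(5, 6)   # event "the roll is 5 or 6"; disjoint from A

prob(integer(0))     # empty event: 0
prob(S)              # whole sample space: 1
prob(union(A, B))    # 4/6
prob(A) + prob(B)    # also 4/6, by additivity for disjoint events
```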

4.2 Random Variables

A random variable X is a function from a sample space S to the real numbers $\mathbb{R}$. The distribution of X is the collection of probabilities $P(X \in B)$ for all subsets B of $\mathbb{R}$.

4.2.1 Discrete Distribution

A discrete random variable X assumes values in a discrete (finite or countable) subset of $\mathbb{R}$.

The distribution of a discrete random variable X is described by its probability mass function (PMF), $p_x(x) = P(X=x)$, where the probabilities sum to 1 over all possible values x: $\sum_{x} P(X=x) = 1$.
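For a variable with finitely many values, a PMF can be stored as a pair of vectors. The values and probabilities below are made up for illustration only.

```r
# Hypothetical PMF of a discrete random variable X with values 1, 2, 3.
x <- c(1, 2, 3)
p <- c(0.2, 0.5, 0.3)   # P(X = 1), P(X = 2), P(X = 3)

all(p >= 0)       # every probability is nonnegative
sum(p)            # the probabilities sum to 1
sum(p[x >= 2])    # P(X >= 2) = 0.5 + 0.3
```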

4.2.2 Continuous Distribution

A continuous random variable X assumes values in $\mathbb{R}$.

The distribution of a continuous random variable X is described by its probability density function (PDF) $f_x$, where $P(X=x) = 0$ for every x, $\int_{-\infty}^{\infty} f_x(x)\,dx = 1$, and $P(a \le X \le b) = \int_{a}^{b} f_x(x)\,dx$.
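These integral properties can be checked numerically with base R's `integrate()`. The density used here, $f(x) = 2x$ on $[0, 1]$, is a made-up example chosen because its integrals are easy to verify by hand.

```r
# Hypothetical density: f(x) = 2x on [0, 1], and 0 elsewhere.
f <- function(x) ifelse(x >= 0 & x <= 1, 2 * x, 0)

# The total area under the density should be 1.
total <- integrate(f, lower = 0, upper = 1)$value

# P(0.25 <= X <= 0.5) is the area under f between 0.25 and 0.5,
# which by hand equals 0.5^2 - 0.25^2 = 0.1875.
p_ab <- integrate(f, lower = 0.25, upper = 0.5)$value

total
p_ab
```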

4.3 Probability Distribution

4.3.1 Bernoulli Distribution

A random variable X has the Bernoulli distribution if it takes only two possible values, 1 (success) and 0 (failure), where the probability of the value 1 is p.

$X \sim Ber(p)$ if $X \in \{0,1\}$ and $p_x(1)=1-p_x(0)=p$
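Base R has no dedicated Bernoulli functions, but a Bernoulli variable is a binomial variable with a single trial, so the binomial functions with `size = 1` can stand in. The value `p = 0.3` below is an arbitrary choice for illustration.

```r
p <- 0.3   # illustrative success probability

# A Bernoulli(p) variable is a binomial variable with one trial.
dbinom(1, size = 1, prob = p)   # P(X = 1) = p
dbinom(0, size = 1, prob = p)   # P(X = 0) = 1 - p

# Simulate 10 Bernoulli(p) draws; the values vary from run to run.
draws <- rbinom(10, size = 1, prob = p)
draws
```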

4.3.2 Binomial Distribution

The binomial distribution models the number of successes k in a fixed number n of independent trials, each with the same probability of success p.

$X \sim Bin(n,p)$ if $X \in \{0,1,2,\ldots,n\}$ and $p_x(k)= \binom{n}{k} p^k (1-p)^{(n-k)}$
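The PMF can be evaluated directly from the formula and compared with R's built-in `dbinom()`. This is a minimal sketch with illustrative values n = 5, k = 3, p = 0.5.

```r
n <- 5; k <- 3; p <- 0.5   # illustrative values

# PMF evaluated directly from the formula.
by_formula <- choose(n, k) * p^k * (1 - p)^(n - k)

# The same quantity from R's built-in function.
built_in <- dbinom(k, size = n, prob = p)

by_formula                        # 0.3125
all.equal(by_formula, built_in)   # TRUE
```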

R provides the following functions for working with the binomial distribution:

dbinom(x, size, prob) : calculates the probability mass function, the probability of getting exactly x successes in size trials with a success probability of prob

## Probability of getting exactly 3 successes in 5 trials with a success probability of 0.5 : 
dbinom(3, size = 5, prob = 0.5)

## [1] 0.3125

pbinom(q, size, prob) : calculates the cumulative distribution function, the probability of getting at most q successes in size trials with a success probability of prob

## Probability of getting up to 3 successes in 5 trials with a success probability of 0.5 : 
pbinom(3, size = 5, prob = 0.5)

## [1] 0.8125
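The cumulative probability is the sum of the PMF over 0, 1, ..., k, which can be checked directly. The sketch reuses the same illustrative parameters as the example above.

```r
# P(X <= 3) as a sum of the individual probabilities P(X = 0), ..., P(X = 3).
by_sum <- sum(dbinom(0:3, size = 5, prob = 0.5))

# The same quantity from the cumulative distribution function.
by_cdf <- pbinom(3, size = 5, prob = 0.5)

by_sum                      # 0.8125
all.equal(by_sum, by_cdf)   # TRUE
```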

qbinom(p, size, prob) : finds the smallest number of successes k such that the probability of getting k or fewer successes in size trials, each with a success probability of prob, is at least p

## Find the smallest number of successes such that the probability of getting that number or fewer successes is at least 0.8 : 
qbinom(0.8, size = 5, prob = 0.5)

## [1] 3
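Because the binomial distribution is discrete, the cumulative probability usually does not hit 0.8 exactly. The sketch below shows the quantile as the smallest k whose cumulative probability reaches the target.

```r
k <- 0:5
cdf <- pbinom(k, size = 5, prob = 0.5)   # P(X <= k) for each k

# Cumulative probabilities: P(X <= 2) = 0.5 is below 0.8,
# while P(X <= 3) = 0.8125 is the first value at or above 0.8.
cdf

smallest_k <- min(k[cdf >= 0.8])
smallest_k                                       # 3
smallest_k == qbinom(0.8, size = 5, prob = 0.5)  # TRUE
```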

rbinom(n, size, prob) : generates n random numbers following a binomial distribution with size trials and a success probability of prob

## Generate 10 random numbers following a binomial distribution with 5 trials and a success probability of 0.5 : 
rbinom(10, size = 5, prob = 0.5)

##  [1] 2 1 4 5 2 4 2 3 4 2

4.3.3 Poisson Distribution

The Poisson distribution models the probability of a certain number of events occurring within a fixed interval of time or space, given a known average rate of occurrence.

$X \sim Pois(\lambda)$ if $X \in \{0,1,2,\ldots\}$ and $p_x(k)= \frac{\lambda^k e^{-\lambda}}{k!}$
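As with the binomial case, the Poisson PMF can be evaluated from the formula and compared with R's built-in `dpois()`. The values lambda = 2 and k = 3 are illustrative.

```r
lambda <- 2; k <- 3   # illustrative values

# PMF evaluated directly from the formula.
by_formula <- lambda^k * exp(-lambda) / factorial(k)

# The same quantity from R's built-in function.
built_in <- dpois(k, lambda = lambda)

all.equal(by_formula, built_in)   # TRUE
```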

R provides the following functions for working with the Poisson distribution:

dpois(x, lambda) : calculates the probability mass function, the probability of observing exactly x events in a given interval with an average rate of occurrence lambda.

## Probability of observing exactly 3 events in an interval with an average rate of occurrence of 2 : 
dpois(3, lambda = 2)

## [1] 0.180447

ppois(q, lambda) : calculates the cumulative distribution function, the probability of observing at most q events in a given interval with an average rate of occurrence lambda.

## Probability of observing up to 3 events in an interval with an average rate of occurrence of 2 : 
ppois(3, lambda = 2)

## [1] 0.8571235

qpois(p, lambda) : finds the smallest number of events k such that the probability of observing k or fewer events in a given interval, with an average rate of occurrence lambda, is at least p.

## Find the smallest number of events such that the probability of observing that number or fewer events is at least 0.8 : 
qpois(0.8, lambda = 2)

## [1] 3

rpois(n, lambda) : generates n random numbers following a Poisson distribution with an average rate of occurrence lambda

## Generate 10 random numbers following a Poisson distribution with an average rate of occurrence of 2 : 
rpois(10, lambda = 2)

##  [1] 1 1 1 3 0 3 1 3 4 3

4.3.4 Normal Distribution

The normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric, bell-shaped, and characterized by its mean and standard deviation, with the majority of observations clustered around the mean.

$X \sim Normal(\mu, \sigma^2)$ if $-\infty<X<\infty$ and $f_x(x)= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$
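The density can be evaluated by hand and compared with R's built-in `dnorm()`. Note that `dnorm()` takes the standard deviation $\sigma$ through its `sd` argument, not the variance $\sigma^2$. The values of `mu`, `sigma`, and `x` are illustrative, and sigma is deliberately not 1 so that the distinction matters.

```r
mu <- 1; sigma <- 2; x <- 0.5   # illustrative values

# Density evaluated directly from the formula.
by_formula <- 1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))

# The same quantity from R's built-in function (sd is sigma, not sigma^2).
built_in <- dnorm(x, mean = mu, sd = sigma)

all.equal(by_formula, built_in)   # TRUE
```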

R provides the following functions for working with the normal distribution:

dnorm(x, mean, sd) : calculates the probability density at a given point x for a normal distribution with mean mean and standard deviation sd.

# Probability density at x = 0 for a normal distribution with mean 0 and standard deviation 1
dnorm(0, mean = 0, sd = 1)

## [1] 0.3989423

pnorm(x, mean, sd) : calculates the cumulative distribution function, probability of observing a value less than or equal to x in a normal distribution with mean mean and standard deviation sd.

# Cumulative probability up to x = 1 for a normal distribution with mean 0 and standard deviation 1
pnorm(1, mean = 0, sd = 1)

## [1] 0.8413447

qnorm(p, mean, sd) : finds the value such that the probability of observing a value less than or equal to that value is p in a normal distribution with mean mean and standard deviation sd.

# Find the value such that the cumulative probability is 0.8 for a normal distribution with mean 0 and standard deviation 1
qnorm(0.8, mean = 0, sd = 1)

## [1] 0.8416212
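For a continuous distribution, the quantile function is the inverse of the cumulative distribution function, so applying `pnorm()` to the result of `qnorm()` should recover the original probability. A minimal check:

```r
p <- 0.8
q <- qnorm(p, mean = 0, sd = 1)         # quantile for cumulative probability 0.8
recovered <- pnorm(q, mean = 0, sd = 1) # maps the quantile back to a probability

all.equal(recovered, p)                 # TRUE, up to floating-point error
```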

rnorm(n, mean, sd) : generates n random numbers following a normal distribution with mean mean and standard deviation sd.

# Generate 10 random numbers following a normal distribution with mean 0 and standard deviation 1
rnorm(10, mean = 0, sd = 1)

##  [1] -1.4662273  0.3501215  0.9431867  0.4434739  0.8454432  0.2389213
##  [7] -1.1977786  0.4258555  0.3626680  0.1183605

4.4 Probability Distribution Summary

Distribution	Type (D = discrete, C = continuous)	PMF/PDF	E(X)	Var(X)	R Function
Bernoulli	D	$p_x(k)= p^k (1-p)^{(1-k)}$ for $k \in \{0,1\}$	$p$	$p(1-p)$	`dbinom`, `pbinom`, `qbinom`, `rbinom` with `size = 1`
Binomial	D	$p_x(k)= \binom{n}{k} p^k (1-p)^{(n-k)}$ for $k \in \{0,1,2,\ldots,n\}$	$np$	$np(1-p)$	`dbinom`, `pbinom`, `qbinom`, `rbinom`
Poisson	D	$p_x(k)= \frac{\lambda^k e^{-\lambda}}{k!}$ for $k \in \{0,1,2,\ldots\}$	$\lambda$	$\lambda$	`dpois`, `ppois`, `qpois`, `rpois`
Uniform	C	$f_x(x)= \frac{1}{b-a}$ for $a \le x \le b$	$\frac{a+b}{2}$	$\frac{(b-a)^2}{12}$	`dunif`, `punif`, `qunif`, `runif`
Normal	C	$f_x(x)= \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$ for $-\infty<x<\infty$	$\mu$	$\sigma^2$	`dnorm`, `pnorm`, `qnorm`, `rnorm`
Exponential	C	$f_x(x)=\lambda e^{-\lambda x}$ for $x \ge 0$	$\frac{1}{\lambda}$	$\frac{1}{\lambda^2}$	`dexp`, `pexp`, `qexp`, `rexp`
Geometric	D	$p_x(k)= (1-p)^{(k-1)}p$ for $k \in \{1,2,\ldots\}$	$\frac{1}{p}$	$\frac{1-p}{p^2}$	`dgeom`, `pgeom`, `qgeom`, `rgeom` (R counts failures before the first success, so $P(X=k)$ is `dgeom(k - 1, p)`)
Beta	C	$f_x(x)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1}(1-x)^{\beta-1}$ for $0 \le x \le 1$	$\frac{\alpha}{\alpha+\beta}$	$\frac{\alpha\beta}{(\alpha+\beta)^2 (\alpha+\beta+1)}$	`dbeta`, `pbeta`, `qbeta`, `rbeta`
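The E(X) and Var(X) columns can be spot-checked numerically. The sketch below does this for one discrete and one continuous row, summing over the PMF for the binomial and integrating the PDF for the exponential. The parameter values are illustrative, and the integration results are approximate.

```r
# Binomial(n = 5, p = 0.5): moments computed from the PMF.
n <- 5; p <- 0.5
k <- 0:n
pmf <- dbinom(k, size = n, prob = p)
mean_bin <- sum(k * pmf)                  # should equal n * p = 2.5
var_bin  <- sum((k - mean_bin)^2 * pmf)   # should equal n * p * (1 - p) = 1.25

# Exponential(lambda = 2): moments by numerical integration of the PDF.
lambda <- 2
mean_exp <- integrate(function(x) x * dexp(x, rate = lambda), 0, Inf)$value
ex2_exp  <- integrate(function(x) x^2 * dexp(x, rate = lambda), 0, Inf)$value
var_exp  <- ex2_exp - mean_exp^2          # should be close to 1 / lambda^2 = 0.25

c(mean_bin, var_bin, mean_exp, var_exp)
```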