Chapter 4 Probability Distribution
4.1 Introduction to Probability
A probability model consists of a nonempty set called the sample space \(S\); a collection of events that are subsets of \(S\); and a probability measure \(P\) assigning a probability between 0 and 1 to each event, with \(P(\varnothing)=0\) and \(P(S)=1\), and with \(P\) additive.
- Sample space \(S\): the set of possible outcomes.
  - Example: the sample space of a single coin flip is \(S=\{H,T\}\).
- Event \(E\): a subset of the sample space.
  - Example: in a single coin flip, \(\{H\}\) (the coin lands heads) is an event.
- Probability: for each event \(E\), \(P(E)\) is the probability that \(E\) occurs. \(P\) has the following properties:
  - \(P(E)\) is always a real number between 0 and 1 inclusive: \(0 \leq P(E) \leq 1\).
  - \(P(\varnothing)=0\), i.e., if \(E\) is the empty set, then \(P(E)=0\).
  - \(P(S)=1\), i.e., if \(E\) is the entire sample space \(S\), then \(P(E)=1\).
  - \(P\) is (countably) additive, meaning that if \(A_1, A_2, \ldots\) is a finite or countable sequence of disjoint events, then \(P(A_1 \cup A_2 \cup \cdots) = \sum_{i} P(A_i)\).
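The properties of a probability measure can be checked on a small finite example. The sketch below assumes a fair six-sided die as the sample space; the names `S`, `prob`, `P`, `A`, and `B` are illustrative and not part of the original text.

```r
# Sample space for one roll of a die (an illustrative choice)
S <- 1:6
# Probability of each outcome; a fair die is assumed
prob <- rep(1 / 6, length(S))
names(prob) <- S

# P(E): sum the probabilities of the outcomes contained in the event E
P <- function(E) sum(prob[as.character(E)])

A <- c(1, 2)      # event "the roll is 1 or 2"
B <- c(5, 6)      # event "the roll is 5 or 6", disjoint from A

P(integer(0))     # probability of the empty event
P(S)              # probability of the whole sample space
P(union(A, B))    # probability of the union of A and B ...
P(A) + P(B)       # ... matches the sum, because A and B are disjoint
```

Because the sample space is finite, additivity reduces to adding up outcome probabilities, which is exactly what the helper `P` does.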
4.2 Random Variables
A random variable \(X\) is a function from the sample space \(S\) to the real numbers. The distribution of \(X\) is the collection of probabilities \(P(X \in B)\) for all subsets \(B\) of the real numbers.
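To make the definition concrete, the following sketch treats two fair coin flips as the sample space and takes the number of heads as the random variable. The coin-flip setup and the names `S`, `prob`, `X`, and `dist_X` are assumptions chosen for illustration.

```r
# Sample space for two coin flips; each outcome is assumed equally likely
S <- c("HH", "HT", "TH", "TT")
prob <- c(HH = 0.25, HT = 0.25, TH = 0.25, TT = 0.25)

# Random variable X: maps each outcome to its number of heads
X <- c(HH = 2, HT = 1, TH = 1, TT = 0)

# Distribution of X: P(X = k) is the total probability of the outcomes mapped to k
dist_X <- tapply(prob[S], X[S], sum)
dist_X

# P(X in B) for the subset B = {1, 2}
sum(dist_X[c("1", "2")])
```

The random variable itself carries no randomness; it is just the lookup `X`, and the probabilities come entirely from the measure on the sample space.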
4.3 Probability Distribution
4.3.1 Bernoulli Distribution
A random variable \(X\) is said to have the Bernoulli distribution if it takes only two possible values, 0 and 1, and the probability of the value 1 is \(p\).
\[X \sim Ber(p)\] if \(X \in \{0,1\}\) and \(p_x(1)=1-p_x(0)=p\).
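Base R has no dedicated Bernoulli functions, but a Bernoulli variable is a binomial variable with a single trial, so the binomial functions with `size = 1` can be used. The sketch below assumes an illustrative value `p = 0.3`.

```r
p <- 0.3  # illustrative success probability (an assumption)

# Bernoulli(p) PMF via the binomial functions with a single trial
dbinom(c(0, 1), size = 1, prob = p)   # P(X = 0), P(X = 1)

# The same PMF written out as p^k (1 - p)^(1 - k) for k in {0, 1}
k <- c(0, 1)
p^k * (1 - p)^(1 - k)

# Simulated Bernoulli draws: each value is 0 or 1
set.seed(1)
x <- rbinom(1000, size = 1, prob = p)
mean(x)  # sample proportion of 1s, expected to be near p
```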
4.3.2 Binomial Distribution
The binomial distribution models the number of successes \(k\) in a fixed number \(n\) of independent trials, each with the same probability of success \(p\).
\[X \sim Bin(n,p)\] if \(X \in \{0,1,2,\ldots,n\}\) and \(p_x(k)= \binom{n}{k} p^k (1-p)^{n-k}\).
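The binomial PMF can be evaluated directly from its formula and compared with `dbinom`, R's built-in binomial PMF. This is a minimal check, assuming illustrative values `n = 5` and `p = 0.5`.

```r
n <- 5      # number of trials (illustrative)
p <- 0.5    # success probability (illustrative)
k <- 0:n    # every possible number of successes

# PMF from the formula: choose(n, k) * p^k * (1 - p)^(n - k)
pmf_manual <- choose(n, k) * p^k * (1 - p)^(n - k)

# PMF from the built-in function
pmf_builtin <- dbinom(k, size = n, prob = p)

all.equal(pmf_manual, pmf_builtin)  # the two agree
sum(pmf_manual)                     # a PMF sums to 1 over its support
```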
R provides the following functions for working with the binomial distribution:
dbinom(x, size, prob): evaluates the probability mass function, i.e. the probability of getting exactly x successes in size trials, each with success probability prob.
## Probability of getting exactly 3 successes in 5 trials with a success probability of 0.5 :
dbinom(3, size = 5, prob = 0.5)
## [1] 0.3125
pbinom(q, size, prob): evaluates the cumulative distribution function, i.e. the probability of getting up to q successes in size trials, each with success probability prob.
## Probability of getting up to 3 successes in 5 trials with a success probability of 0.5 :
pbinom(3, size = 5, prob = 0.5)
## [1] 0.8125
qbinom(p, size, prob): finds the smallest number of successes k such that the probability of getting k or fewer successes in size trials, each with success probability prob, is at least p.
## Find the smallest number of successes such that the probability of getting that number or fewer successes is at least 0.8 :
qbinom(0.8, size = 5, prob = 0.5)
## [1] 3
rbinom(n, size, prob): generates n random numbers following a binomial distribution with size trials and success probability prob.
## Generate 10 random numbers following a binomial distribution with 5 trials and a success probability of 0.5 :
rbinom(10, size = 5, prob = 0.5)
## [1] 2 1 4 5 2 4 2 3 4 2
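The four functions are related: the CDF is the running sum of the PMF, and the quantile function inverts the CDF. The sketch below illustrates this with the same `size = 5`, `prob = 0.5` setting used above. The seed value 42 is an arbitrary choice, included only to show how random draws can be made reproducible.

```r
n <- 5
p <- 0.5

# pbinom is the cumulative sum of dbinom over k = 0, 1, ..., n
cdf_from_pmf <- cumsum(dbinom(0:n, size = n, prob = p))
all.equal(cdf_from_pmf, pbinom(0:n, size = n, prob = p))

# qbinom returns the smallest k whose cumulative probability reaches the target
target <- 0.8
k_manual <- min(which(cdf_from_pmf >= target)) - 1  # subtract 1 because k starts at 0
k_manual
qbinom(target, size = n, prob = p)

# rbinom output changes between runs unless the seed is fixed
set.seed(42)
a <- rbinom(10, size = n, prob = p)
set.seed(42)
b <- rbinom(10, size = n, prob = p)
identical(a, b)
```

Because the distribution is discrete, there is generally no k whose cumulative probability equals the target exactly, which is why the quantile is defined through "at least".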
4.3.3 Poisson Distribution
The Poisson distribution models the probability of a certain number of events occurring within a fixed interval of time or space, given a known average rate of occurrence.
\[X \sim Pois(\lambda)\] if \(X \in \{0,1,2,\ldots\}\) and \(p_x(k)= \frac{\lambda^k e^{-\lambda}}{k!}\).
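As a quick check of the Poisson PMF, the formula can be evaluated by hand and compared with `dpois`, R's built-in Poisson PMF. The rate `lambda = 2` and the range `0:10` are illustrative assumptions.

```r
lambda <- 2   # average rate of occurrence (illustrative)
k <- 0:10     # a range of event counts to evaluate

# PMF from the formula: lambda^k * exp(-lambda) / k!
pmf_manual <- lambda^k * exp(-lambda) / factorial(k)

# PMF from the built-in function
pmf_builtin <- dpois(k, lambda = lambda)

all.equal(pmf_manual, pmf_builtin)  # the two agree
```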
R provides the following functions for working with the Poisson distribution:
dpois(x, lambda): evaluates the probability mass function, i.e. the probability of observing exactly x events in a given interval with an average rate of occurrence lambda.
## Probability of observing exactly 3 events in an interval with an average rate of occurrence of 2 :
dpois(3, lambda = 2)
## [1] 0.180447
ppois(q, lambda): evaluates the cumulative distribution function, i.e. the probability of observing up to q events in a given interval with an average rate of occurrence lambda.
## Probability of observing up to 3 events in an interval with an average rate of occurrence of 2 :
ppois(3, lambda = 2)
## [1] 0.8571235
qpois(p, lambda): finds the smallest number of events such that the probability of observing that number or fewer events, in a given interval with an average rate of occurrence lambda, is at least p.
## Find the smallest number of events such that the probability of observing that number or fewer events is at least 0.8 :
qpois(0.8, lambda = 2)
## [1] 3
rpois(n, lambda): generates n random numbers following a Poisson distribution with an average rate of occurrence lambda.
## Generate 10 random numbers following a Poisson distribution with an average rate of occurrence of 2 :
rpois(10, lambda = 2)
## [1] 1 1 1 3 0 3 1 3 4 3
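The cumulative distribution function also gives upper-tail and interval probabilities. The sketch below reuses `lambda = 2` from the examples above; the event ranges are illustrative.

```r
lambda <- 2

# P(X > 3): complement of the CDF, or equivalently lower.tail = FALSE
upper_complement <- 1 - ppois(3, lambda = lambda)
upper_direct <- ppois(3, lambda = lambda, lower.tail = FALSE)
c(upper_complement, upper_direct)

# P(2 <= X <= 4): difference of two CDF values, or a sum of PMF values
interval_cdf <- ppois(4, lambda = lambda) - ppois(1, lambda = lambda)
interval_pmf <- sum(dpois(2:4, lambda = lambda))
c(interval_cdf, interval_pmf)
```

For a discrete variable the lower endpoint needs care: P(2 <= X <= 4) subtracts the CDF at 1, not at 2.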
4.3.4 Normal Distribution
The normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric, bell-shaped, and characterized by its mean and standard deviation, with the majority of observations clustered around the mean.
\[X \sim Normal(\mu, \sigma^2)\] if \(-\infty<x<\infty\) and \(f_x(x)= \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}\).
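The normal density can be written out by hand and compared with `dnorm`, R's built-in normal PDF. The sketch uses the standard form with normalizing constant \(1/(\sigma\sqrt{2\pi})\), and the values of `mu`, `sigma`, and `x` are illustrative assumptions.

```r
mu <- 0               # mean (illustrative)
sigma <- 2            # standard deviation (illustrative)
x <- c(-1, 0, 1.5)    # points at which to evaluate the density

# Density from the formula, with normalizing constant 1 / (sigma * sqrt(2 * pi))
pdf_manual <- 1 / (sigma * sqrt(2 * pi)) * exp(-(x - mu)^2 / (2 * sigma^2))

# Density from the built-in function (note that dnorm takes sd, not the variance)
pdf_builtin <- dnorm(x, mean = mu, sd = sigma)

all.equal(pdf_manual, pdf_builtin)  # the two agree

# A density integrates to 1 over the real line (numerical approximation)
total <- integrate(dnorm, lower = -Inf, upper = Inf, mean = mu, sd = sigma)$value
total
```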
R provides the following functions for working with the normal distribution:
dnorm(x, mean, sd): evaluates the probability density at a given point x for a normal distribution with mean mean and standard deviation sd.
# Probability density at x = 0 for a normal distribution with mean 0 and standard deviation 1
dnorm(0, mean = 0, sd = 1)
## [1] 0.3989423
pnorm(q, mean, sd): evaluates the cumulative distribution function, i.e. the probability of observing a value less than or equal to q in a normal distribution with mean mean and standard deviation sd.
# Cumulative probability up to x = 1 for a normal distribution with mean 0 and standard deviation 1
pnorm(1, mean = 0, sd = 1)
## [1] 0.8413447
qnorm(p, mean, sd): finds the value such that the probability of observing a value less than or equal to it is p, in a normal distribution with mean mean and standard deviation sd.
# Find the value such that the cumulative probability is 0.8 for a normal distribution with mean 0 and standard deviation 1
qnorm(0.8, mean = 0, sd = 1)
## [1] 0.8416212
rnorm(n, mean, sd): generates n random numbers following a normal distribution with mean mean and standard deviation sd.
# Generate 10 random numbers following a normal distribution with mean 0 and standard deviation 1
rnorm(10, mean = 0, sd = 1)
## [1] -1.4662273 0.3501215 0.9431867 0.4434739 0.8454432 0.2389213
## [7] -1.1977786 0.4258555 0.3626680 0.1183605
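For a continuous distribution, `qnorm` and `pnorm` are exact inverses of each other, and interval probabilities are differences of CDF values. The sketch below illustrates both for the standard normal. The seed 123 and the sample size are arbitrary choices made only for reproducibility.

```r
# qnorm inverts pnorm: the round trip recovers the original probability
p <- 0.8
q <- qnorm(p, mean = 0, sd = 1)
p_back <- pnorm(q, mean = 0, sd = 1)
p_back

# P(-1 <= X <= 1) for a standard normal, about 0.68
within_one_sd <- pnorm(1) - pnorm(-1)
within_one_sd

# The empirical proportion of simulated values below 1 approximates pnorm(1)
set.seed(123)
x <- rnorm(10000, mean = 0, sd = 1)
mean(x <= 1)
```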
4.4 Probability Distribution Summary
Distribution | Type (D = discrete, C = continuous) | PMF/PDF | E(X) | Var(X) | R Functions |
---|---|---|---|---|---|
Bernoulli | D | \(p_x(k)= p^k (1-p)^{1-k}\) for \(k \in \{0,1\}\) | \(p\) | \(p(1-p)\) | - |
Binomial | D | \(p_x(k)= \binom{n}{k} p^k (1-p)^{n-k}\) for \(k \in \{0,1,2,\ldots,n\}\) | \(np\) | \(np(1-p)\) | dbinom , pbinom , qbinom , rbinom |
Poisson | D | \(p_x(k)= \frac{\lambda^k e^{-\lambda}}{k!}\) for \(k \in \{0,1,2,\ldots\}\) | \(\lambda\) | \(\lambda\) | dpois , ppois , qpois , rpois |
Uniform | C | \(f_x(x)= \frac{1}{b-a}\) for \(a \leq x \leq b\) | \(\frac{a+b}{2}\) | \(\frac{(b-a)^2}{12}\) | dunif , punif , qunif , runif |
Normal | C | \(f_x(x)= \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\) for \(-\infty<x<\infty\) | \(\mu\) | \(\sigma^2\) | dnorm , pnorm , qnorm , rnorm |
Exponential | C | \(f_x(x)=\lambda e^{-\lambda x}\) for \(x>0\) | \(\frac{1}{\lambda}\) | \(\frac{1}{\lambda^2}\) | dexp , pexp , qexp , rexp |
Geometric | D | \(p_x(k)= (1-p)^{k-1}p\) for \(k \in \{1,2,3,\ldots\}\) | \(\frac{1}{p}\) | \(\frac{1-p}{p^2}\) | dgeom , pgeom , qgeom , rgeom (these count the failures before the first success, i.e. \(k-1\)) |
Beta | C | \(f_x(x)=\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha-1}(1-x)^{\beta-1}\) for \(0 \leq x \leq 1\) | \(\frac{\alpha}{\alpha+\beta}\) | \(\frac{\alpha\beta}{(\alpha+\beta)^2 (\alpha+\beta+1)}\) | dbeta , pbeta , qbeta , rbeta |
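The means and variances in the table can be checked numerically, by summing over the support for a discrete distribution or integrating for a continuous one. The sketch below does this for three rows, with parameter values chosen purely for illustration. It assumes R's convention that `dgeom` counts failures before the first success, so the number of trials is the failure count plus one.

```r
# Binomial(n, p): mean and variance by summing over the support
n <- 5; p <- 0.5; k <- 0:n
m_bin <- sum(k * dbinom(k, size = n, prob = p))
v_bin <- sum((k - m_bin)^2 * dbinom(k, size = n, prob = p))
c(mean = m_bin, var = v_bin)   # compare with n * p and n * p * (1 - p)

# Exponential(rate): mean and variance by numerical integration
rate <- 2
m_exp <- integrate(function(x) x * dexp(x, rate = rate), 0, Inf)$value
v_exp <- integrate(function(x) (x - m_exp)^2 * dexp(x, rate = rate), 0, Inf)$value
c(mean = m_exp, var = v_exp)   # compare with 1 / rate and 1 / rate^2

# Geometric(p): dgeom counts failures, so trials = failures + 1
p_geom <- 0.3
f <- 0:1000                    # truncated support; the remaining tail is negligible
m_geom <- sum((f + 1) * dgeom(f, prob = p_geom))
m_geom                         # compare with 1 / p_geom
```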