Description: The normal distribution is a continuous probability distribution characterized by a bell-shaped curve. It is defined by the mean (μ) and standard deviation (σ).
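The normal density is $f(x)=\frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$. Below is a minimal standard-library Python sketch that evaluates it. The function name `normal_pdf` is illustrative, not a library API.

```python
import math


def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Density of the normal distribution N(mu, sigma^2) evaluated at x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # standardized distance from the mean
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


# The curve peaks at the mean and is symmetric around it.
print(normal_pdf(0.0))                                   # 1/sqrt(2*pi), about 0.3989
print(normal_pdf(1.5, mu=1.0) == normal_pdf(0.5, mu=1.0))  # True
```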
Binomial Distribution
Formula: $P(X=k)=\binom{n}{k}p^k(1-p)^{n-k}$
Description: The binomial distribution represents the number of successes in a fixed number of independent Bernoulli trials, with a constant probability of success p in each trial. Here, n is the number of trials and k is the number of successes.
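The binomial probability can be evaluated directly from the formula. A minimal sketch using `math.comb` (Python 3.8+); `binomial_pmf` is a hypothetical helper name:

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    if not 0 <= k <= n:
        return 0.0  # impossible outcome
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Exactly 3 heads in 5 fair coin flips: C(5, 3) / 2**5 = 10 / 32
print(binomial_pmf(3, 5, 0.5))  # 0.3125
# The probabilities over k = 0..n sum to 1 (up to rounding).
print(sum(binomial_pmf(k, 10, 0.3) for k in range(11)))
```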
Poisson Distribution
Formula: $P(X=k)=\frac{\lambda^k e^{-\lambda}}{k!}$
Description: The Poisson distribution represents the probability of a given number of events occurring in a fixed interval of time or space, given the average number of times the event occurs over that interval. Here, λ is the average number of events, k is the number of occurrences, and e is Euler’s number.
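A short sketch of the Poisson probability, with a numerical check that the mean equals λ. The infinite sum is truncated at k = 59, where the remaining terms are negligible for λ = 2; `poisson_pmf` is an illustrative name.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), where lam is the mean number of events."""
    if k < 0:
        return 0.0
    return lam**k * math.exp(-lam) / math.factorial(k)


# With an average of 2 events per interval, the chance of seeing none is e^-2.
print(poisson_pmf(0, 2.0))
# The mean of the distribution equals lam (truncated sum).
print(sum(k * poisson_pmf(k, 2.0) for k in range(60)))
```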
Exponential Distribution
Formula: $f(x)=\lambda e^{-\lambda x}$ for $x\ge 0$
Description: The exponential distribution represents the time between events in a Poisson process. It is defined by the rate parameter λ.
Uniform Distribution
Formula: $f(x)=\begin{cases}\frac{1}{b-a} & a\le x\le b\\ 0 & \text{otherwise}\end{cases}$
Description: The uniform distribution assigns the same probability density to every value in the interval [a,b] and zero density outside it. It is a continuous distribution.
Bernoulli Distribution
Formula: $P(X=x)=p^x(1-p)^{1-x}$ for $x\in\{0,1\}$
Description: The Bernoulli distribution is a discrete distribution representing the outcome of a single binary experiment with success probability p.
Beta Distribution
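Formula: $f(x)=\frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}$ for $0\le x\le 1$
Description: The beta distribution is a continuous distribution defined on the interval [0, 1] and parameterized by α and β. It is widely used in Bayesian statistics as a prior for probabilities, since it is the conjugate prior of the Bernoulli and binomial likelihoods.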
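A minimal sketch of the beta density, computing the normalizing constant as $B(\alpha,\beta)=\Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta)$. As a simplifying assumption it evaluates only the open interval (0, 1), which avoids raising 0 to a negative power when α < 1 or β < 1; the function names are illustrative.

```python
import math


def beta_function(a: float, b: float) -> float:
    """B(a, b) = Gamma(a) * Gamma(b) / Gamma(a + b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)


def beta_pdf(x: float, a: float, b: float) -> float:
    """Beta(a, b) density; assumption: returns 0 outside the open interval (0, 1)."""
    if not 0.0 < x < 1.0:
        return 0.0
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_function(a, b)


print(beta_pdf(0.3, 1, 1))  # Beta(1, 1) is the uniform distribution: 1.0
print(beta_pdf(0.5, 2, 2))  # symmetric hump peaking at 0.5 (density 1.5)
```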
Gamma Distribution
Formula: $f(x)=\frac{\beta^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x}$ for $x\ge 0$
Description: The gamma distribution is a continuous distribution defined by shape parameter α and rate parameter β. It generalizes the exponential distribution, which is the special case α=1 with rate β.
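A sketch in the shape/rate parameterization used above. Some libraries use a scale parameter 1/β instead, so check conventions before comparing numbers. The check confirms that α = 1 gives the exponential density. Returning 0 at x ≤ 0 is a simplifying assumption that avoids `0 ** negative` when α < 1.

```python
import math


def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """Gamma density with shape alpha and *rate* beta (mean alpha / beta)."""
    if x <= 0:
        return 0.0  # assumption: density treated as 0 at x <= 0
    return beta**alpha / math.gamma(alpha) * x ** (alpha - 1) * math.exp(-beta * x)


def exponential_pdf(x: float, lam: float) -> float:
    """Exponential density with rate lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


# With alpha = 1 the gamma density reduces to the exponential density.
for x in (0.5, 1.0, 3.0):
    print(gamma_pdf(x, 1.0, 2.0), exponential_pdf(x, 2.0))
```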
Chi-Squared Distribution
Formula: $f(x)=\frac{1}{2^{k/2}\Gamma(k/2)}x^{k/2-1}e^{-x/2}$ for $x\ge 0$
Description: The chi-squared distribution is a special case of the gamma distribution with α=k/2 and β=1/2, often used in hypothesis testing and confidence intervals.
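The special-case claim can be checked numerically. This sketch evaluates both densities for x > 0 and compares the chi-squared density with k degrees of freedom to the gamma density with α = k/2 and rate β = 1/2. The function names are illustrative.

```python
import math


def gamma_pdf(x: float, alpha: float, beta: float) -> float:
    """Gamma density with shape alpha and rate beta (x > 0)."""
    if x <= 0:
        return 0.0
    return beta**alpha / math.gamma(alpha) * x ** (alpha - 1) * math.exp(-beta * x)


def chi2_pdf(x: float, k: int) -> float:
    """Chi-squared density with k degrees of freedom (x > 0)."""
    if x <= 0:
        return 0.0
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))


# Compare the two densities for a few degrees of freedom.
for k in (1, 2, 5):
    for x in (0.5, 2.0, 7.0):
        assert math.isclose(chi2_pdf(x, k), gamma_pdf(x, k / 2, 0.5))
print("chi-squared(k) matches Gamma(k/2, rate 1/2)")
```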
Student’s t-Distribution
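Formula: $f(t)=\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}$
Description: The t-distribution is used for inference about a population mean when the sample size is small and the population variance is unknown. It is defined by the degrees of freedom ν.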
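A sketch of the t density. It uses `math.lgamma` so that large ν does not overflow `math.gamma`. The examples show the Cauchy case (ν = 1), a lower peak than the normal for small ν, and convergence toward the standard normal as ν grows. `t_pdf` is an illustrative name.

```python
import math


def t_pdf(t: float, nu: float) -> float:
    """Student's t density with nu degrees of freedom."""
    # Ratio of gamma functions computed in log space for numerical stability.
    log_coef = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    coef = math.exp(log_coef) / math.sqrt(nu * math.pi)
    return coef * (1 + t * t / nu) ** (-(nu + 1) / 2)


print(t_pdf(0.0, 1))    # nu = 1 is the Cauchy distribution: 1/pi at 0
print(t_pdf(0.0, 5))    # heavier tails mean a lower peak than the normal
print(t_pdf(0.0, 1e6))  # approaches the standard normal peak, 1/sqrt(2*pi)
```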
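Multinomial Distribution
Formula: $P(X_1=x_1,\dots,X_k=x_k)=\frac{n!}{x_1!\cdots x_k!}p_1^{x_1}\cdots p_k^{x_k}$ for $\sum_{i=1}^{k}x_i=n$
Description: The multinomial distribution generalizes the binomial distribution to more than two outcomes. It gives the joint probability of the counts $x_1,\dots,x_k$ falling into each of k categories over n independent trials, where category i has probability $p_i$.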
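A minimal sketch of the multinomial probability. The coefficient n!/(x₁!⋯x_k!) is computed exactly with integer arithmetic before multiplying by the category probabilities. The first example shows the reduction to the binomial case. `multinomial_pmf` is a hypothetical helper.

```python
import math
from typing import Sequence


def multinomial_pmf(counts: Sequence[int], probs: Sequence[float]) -> float:
    """P(X_1 = counts[0], ..., X_k = counts[-1]) for n = sum(counts) trials."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    coef = math.factorial(n)
    for c in counts:
        coef //= math.factorial(c)  # stays an exact integer at every step
    prob = 1.0
    for c, p in zip(counts, probs):
        prob *= p**c
    return coef * prob


# With two categories it reduces to the binomial: 3 heads, 2 tails in 5 flips.
print(multinomial_pmf([3, 2], [0.5, 0.5]))  # 0.3125
# One outcome in each of three equally likely categories over 3 trials: 3!/3**3
print(multinomial_pmf([1, 1, 1], [1 / 3] * 3))
```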
Geometric Distribution
Formula: $P(X=k)=(1-p)^{k-1}p$ for $k\in\{1,2,3,\dots\}$
Description: The geometric distribution represents the number of trials needed to get the first success in a sequence of independent Bernoulli trials with success probability p.
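A short sketch of the geometric probability (k − 1 failures followed by one success). It checks numerically that the expected number of trials is 1/p. The infinite sum is truncated at k = 499, where the remaining terms are negligible; `geometric_pmf` is an illustrative name.

```python
def geometric_pmf(k: int, p: float) -> float:
    """P(first success on trial k), k = 1, 2, 3, ..."""
    if k < 1:
        return 0.0
    return (1 - p) ** (k - 1) * p  # k - 1 failures, then one success


# Rolling a fair die until the first six (p = 1/6).
p = 1 / 6
print(geometric_pmf(1, p))                                  # about 0.167
print(sum(k * geometric_pmf(k, p) for k in range(1, 500)))  # close to 1/p = 6
```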
Hypergeometric Distribution
Formula: $P(X=k)=\frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$
Description: The hypergeometric distribution describes the probability of k successes in n draws from a finite population of size N containing K successes, without replacement.
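A minimal sketch of the hypergeometric probability, using a card-deck example. The support check returns 0 for impossible outcomes; without it, `math.comb` would raise an error when n − k is negative. `hypergeom_pmf` is a hypothetical helper.

```python
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(k successes in n draws without replacement from N items, K of them successes)."""
    if k < 0 or k > n or k > K or n - k > N - K:
        return 0.0  # outside the support
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Exactly one ace in a 5-card hand from a 52-card deck with 4 aces.
print(hypergeom_pmf(1, 52, 4, 5))  # C(4,1) * C(48,4) / C(52,5), about 0.2995
```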
Log-Normal Distribution
Formula: $f(x)=\frac{1}{x\sigma\sqrt{2\pi}}\exp\left(-\frac{(\ln x-\mu)^2}{2\sigma^2}\right)$ for $x>0$
Description: The log-normal distribution describes a variable whose logarithm is normally distributed. It is useful in modeling positively skewed data.
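A sketch of the log-normal density, plus a sampling check of the defining property: the logarithms of log-normal draws behave like normal draws with mean μ. The check uses the standard library's `random.Random.lognormvariate` with an arbitrary seed. `lognormal_pdf` is an illustrative name.

```python
import math
import random


def lognormal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of X when ln(X) ~ N(mu, sigma^2); zero for x <= 0."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2 * math.pi))


# Sampling check: logs of log-normal draws should have mean close to mu.
rng = random.Random(0)
samples = [rng.lognormvariate(1.0, 0.5) for _ in range(20000)]
print(sum(math.log(s) for s in samples) / len(samples))  # close to 1.0
print(lognormal_pdf(1.0, 0.0, 1.0))  # standard normal peak, since ln(1) = 0
```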
Machine Learning Models
Linear Regression
Formula: $y=\beta_0+\beta_1x_1+\beta_2x_2+\dots+\beta_nx_n+\epsilon$
Description: Predicts a continuous target variable based on linear relationships between the target and one or more predictor variables.
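For a single predictor, the least-squares estimates have a closed form: slope = Σ(x−x̄)(y−ȳ)/Σ(x−x̄)², and intercept = ȳ − slope·x̄. A minimal sketch of that special case only. With several predictors one would solve the normal equations in matrix form instead, which is not shown. `fit_simple_ols` is a hypothetical name.

```python
from typing import List, Tuple


def fit_simple_ols(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares intercept and slope for y = b0 + b1 * x + error."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = s_xy / s_xx         # slope: co-variation of x and y over variation of x
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1


print(fit_simple_ols([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0): data lie on y = 1 + 2x
```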
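Logistic Regression
Formula: $P(Y=1\mid x)=\frac{1}{1+e^{-(\beta_0+\beta_1x_1+\beta_2x_2+\dots+\beta_nx_n)}}$
Description: Predicts a binary outcome based on linear relationships between the predictor variables and the log-odds of the outcome.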
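A sketch of the prediction step only: the linear predictor gives the log-odds, and the sigmoid maps it to a probability. Fitting the coefficients, usually by maximum likelihood, is not shown, and the coefficient values below are made up. The branch on the sign of z keeps `math.exp` from overflowing for large |z|.

```python
import math
from typing import Sequence


def predict_proba(x: Sequence[float], beta0: float, betas: Sequence[float]) -> float:
    """P(Y = 1 | x) = sigmoid(beta0 + sum_i beta_i * x_i)."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))  # the log-odds
    # Numerically stable sigmoid: exp is only called on non-positive arguments.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


p = predict_proba([1.0, 2.0], beta0=0.3, betas=[0.4, -0.2])
print(p, math.log(p / (1 - p)))  # the log-odds recover the linear predictor, 0.3
```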
Generalized Linear Model (GLM)
Formula: $g(E(Y))=\beta_0+\beta_1x_1+\beta_2x_2+\dots+\beta_nx_n$
Description: A generalized linear model is a flexible generalization of ordinary linear regression that allows the dependent variable Y to have a distribution other than the normal. The link function g relates the expected value of the response variable, E(Y), to the linear predictor. $\beta_0$ is the intercept, and $\beta_i$ are the coefficients for the predictor variables $x_i$.
Generalized Additive Model (GAM)
Formula: $g(E(Y))=\beta_0+f_1(x_1)+f_2(x_2)+\dots+f_n(x_n)$
Description: A generalized additive model extends generalized linear models by letting the linear predictor depend linearly on unknown smooth functions of the predictor variables, which allows non-linear relationships between the dependent and independent variables. Here, g is the link function, E(Y) is the expected value of the response variable Y, $\beta_0$ is the intercept, and the $f_i$ are smooth functions of the predictor variables $x_i$.
Decision Tree
Formula: Recursive binary splitting
Description: Splits the data into subsets based on the value of input features. Each internal node represents a “test” on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label or continuous value.
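Recursive binary splitting repeatedly searches for the single test that best separates the data, then recurses into each side. This sketch shows only that search for one numeric feature. It scores each candidate threshold by weighted Gini impurity, which is one common criterion (used, for example, by CART) but an assumption here. The recursion and multi-feature loop are omitted, and the helper names are hypothetical.

```python
from collections import Counter
from typing import Hashable, List, Optional, Tuple


def gini(labels: List[Hashable]) -> float:
    """Gini impurity: 1 - sum of squared class proportions (0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(xs: List[float], ys: List[Hashable]) -> Tuple[Optional[float], float]:
    """Best test 'x <= threshold' on one numeric feature, by weighted Gini impurity."""
    best: Tuple[Optional[float], float] = (None, float("inf"))
    for t in sorted(set(xs))[:-1]:  # skip the max so the right side is non-empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(xs)
        if score < best[1]:
            best = (t, score)
    return best


print(best_split([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"]))  # (3, 0.0)
```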
Random Forest
Formula: Aggregated decision trees
Description: Combines the predictions of multiple decision trees to improve accuracy and control over-fitting. Each tree is trained on a bootstrapped sample of the data and uses a random subset of features.
Support Vector Machine (SVM)
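Formula: $f(x)=\operatorname{sign}(w\cdot x+b)$
Description: Finds the hyperplane that separates the classes with the maximum margin in the feature space, using a soft-margin penalty when the classes are not perfectly separable. The formula is the resulting decision function, where w is the weight vector normal to the hyperplane and b is the bias.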
K-Nearest Neighbors (KNN)
Formula: $\hat{y}=\frac{1}{k}\sum_{i=1}^{k}y_i$
Description: Classifies a data point based on the majority class among its k nearest neighbors. For regression, it predicts the average of the k nearest neighbors’ values.
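A brute-force sketch of both uses described above: majority vote for classification and the neighbor average for regression. Euclidean distance via `math.dist` (Python 3.8+) is an assumption, since other metrics are common. Ties in the vote go to the label encountered first among the sorted neighbors. `knn_predict` is a hypothetical helper.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_X: List[Sequence[float]], train_y: List, query: Sequence[float],
                k: int, regression: bool = False):
    """Predict for `query` from its k nearest training points (Euclidean distance)."""
    # Sort by distance only, so labels are never compared with each other.
    pairs = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], query))
    neighbors = [y for _, y in pairs[:k]]
    if regression:
        return sum(neighbors) / len(neighbors)      # average of neighbor values
    return Counter(neighbors).most_common(1)[0][0]  # majority vote


X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
print(knn_predict(X, ["a", "a", "a", "b", "b", "b"], (0.2, 0.2), k=3))  # 'a'
print(knn_predict(X, [1.0, 1.0, 1.0, 10.0, 10.0, 10.0], (5.1, 5.1), k=3, regression=True))  # 10.0
```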
Naive Bayes
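Formula: $P(Y\mid X)=\frac{P(X\mid Y)P(Y)}{P(X)}$
Description: Uses Bayes' theorem to predict the probability of a class Y given the predictors X, under the "naive" assumption that the predictors are conditionally independent given the class, so that $P(X\mid Y)=\prod_j P(x_j\mid Y)$.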
Principal Component Analysis (PCA)
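Formula: $Z=XW$
Description: Reduces the dimensionality of the data by transforming the original variables into new uncorrelated variables (principal components), ordered by the amount of variance they capture. Here, X is the centered data matrix, the columns of W are the eigenvectors of the covariance matrix of X sorted by decreasing eigenvalue, and Z holds the component scores.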
K-Means Clustering
Formula: $\underset{S}{\arg\min}\sum_{i=1}^{k}\sum_{x\in S_i}\lVert x-\mu_i\rVert^2$
Description: Partitions the data into k clusters $S=\{S_1,\dots,S_k\}$ by minimizing the within-cluster sum of squared distances between the data points and their cluster centroids $\mu_i$.
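The objective is usually minimized approximately with Lloyd's algorithm. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. This reaches a local minimum, not necessarily the global one. In this sketch the initial centroids are passed in explicitly, whereas in practice they are often chosen randomly or with k-means++. An empty cluster simply keeps its old centroid, which is a simplifying assumption.

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def kmeans(points: List[Point], init: List[Point], max_iter: int = 100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [tuple(c) for c in init]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return centroids, clusters


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(pts, init=[(0, 0), (10, 10)])
print(centroids)  # roughly [(0.333, 0.333), (10.333, 10.333)]
```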
Neural Networks
Formula: $a^{(l)}=\sigma(z^{(l)}),\quad z^{(l)}=W^{(l)}a^{(l-1)}+b^{(l)}$
Description: Composed of layers of interconnected nodes (neurons). Each neuron's output is a weighted sum of its inputs passed through an activation function σ. The parameters $W^{(l)}$ and $b^{(l)}$ are the weights and biases of layer l.
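The layer equations translate directly into a forward pass. This sketch uses plain lists. tanh as the default activation and the weights in the example are arbitrary assumptions for illustration, and training (backpropagation) is not shown.

```python
import math
from typing import Callable, List


def dense_forward(a_prev: List[float], W: List[List[float]], b: List[float],
                  activation: Callable[[float], float] = math.tanh) -> List[float]:
    """One layer: z = W a_prev + b, then a = activation(z) elementwise."""
    z = [sum(w * a for w, a in zip(row, a_prev)) + b_i for row, b_i in zip(W, b)]
    return [activation(z_i) for z_i in z]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


# A 2-input -> 3-hidden -> 1-output network with made-up weights.
x = [1.0, 2.0]
h = dense_forward(x, [[0.5, -0.25], [1.0, 0.0], [0.0, 1.0]], [0.0, 0.1, -0.5])
y = dense_forward(h, [[1.0, -1.0, 0.5]], [0.2], activation=sigmoid)
print(h, y)
```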
Convolutional Neural Networks (CNN)
Formula: $(f*g)(t)=\int_{-\infty}^{\infty}f(\tau)g(t-\tau)\,d\tau$
Description: Uses convolutional layers to apply filters to the input, which helps in capturing spatial hierarchies in data, particularly useful for image and video processing.
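On discrete data the integral becomes a sum, $(f*g)[n]=\sum_m f[m]\,g[n-m]$. The sketch below computes that full discrete convolution. It also shows a "valid" sliding dot product with an unflipped kernel, which is what convolutional layers in common deep-learning frameworks typically compute (strictly a cross-correlation). The two differ only by flipping the kernel, which makes no difference when the kernel is learned. The helper names are hypothetical.

```python
from typing import List


def convolve1d(f: List[float], g: List[float]) -> List[float]:
    """Full discrete convolution: (f * g)[n] = sum over m of f[m] * g[n - m]."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, f_i in enumerate(f):
        for j, g_j in enumerate(g):
            out[i + j] += f_i * g_j  # contribution of f[i] to output index n = i + j
    return out


def cross_correlate1d_valid(x: List[float], kernel: List[float]) -> List[float]:
    """'Valid' sliding dot product with an unflipped kernel (no padding)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


print(convolve1d([1, 2, 3], [0, 1, 0.5]))                    # [0.0, 1.0, 2.5, 4.0, 1.5]
print(cross_correlate1d_valid([1, 2, 3, 4, 5], [1, 0, -1]))  # [-2, -2, -2]
```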
Recurrent Neural Networks (RNN)
Formula: $h_t=\sigma(W_h h_{t-1}+W_x x_t+b)$
Description: Designed to recognize patterns in sequences of data by maintaining a hidden state $h_t$ that captures information from previous time steps.
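A sketch of the recurrence unrolled over a sequence, with the same weights reused at every step. tanh stands in for the generic activation σ, which is an assumption (it is a common choice for vanilla RNNs). The toy example shows the hidden state carrying information forward: the second input is zero, yet $h_2$ is non-zero.

```python
import math
from typing import List


def rnn_step(h_prev: List[float], x_t: List[float], W_h: List[List[float]],
             W_x: List[List[float]], b: List[float]) -> List[float]:
    """h_t = tanh(W_h h_{t-1} + W_x x_t + b), computed row by row."""
    return [
        math.tanh(
            sum(w * h for w, h in zip(W_h[i], h_prev))
            + sum(w * x for w, x in zip(W_x[i], x_t))
            + b[i]
        )
        for i in range(len(b))
    ]


def rnn_forward(xs: List[List[float]], h0: List[float], W_h, W_x, b) -> List[List[float]]:
    """Unroll over a sequence, reusing the same weights at every time step."""
    h, states = h0, []
    for x_t in xs:
        h = rnn_step(h, x_t, W_h, W_x, b)
        states.append(h)
    return states


# 1-dimensional toy example with made-up weights.
states = rnn_forward([[0.5], [0.0]], [0.0], W_h=[[1.0]], W_x=[[1.0]], b=[0.0])
print(states)  # [[tanh(0.5)], [tanh(tanh(0.5))]]
```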
Gradient Boosting Machines (GBM)
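Formula: $F_m(x)=F_{m-1}(x)+\eta\cdot h_m(x)$
Description: Builds an additive model in a forward stage-wise manner. Each base learner $h_m$ is fit to the negative gradient of the loss with respect to the previous ensemble's predictions $F_{m-1}(x)$ (for squared-error loss, simply the residuals), and the learning rate η scales its contribution.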
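A minimal sketch of the update rule for one-dimensional inputs, under stated assumptions: squared-error loss (so the negative gradient is the residual), one-split regression stumps as base learners, and a constant initial model $F_0$ equal to the mean of y. Production libraries add many refinements not shown here. The stump fitter needs at least two distinct x values, and the helper names are hypothetical.

```python
from typing import Callable, List

Model = Callable[[float], float]


def fit_stump(xs: List[float], targets: List[float]) -> Model:
    """Regression stump (one split) minimizing squared error; needs >= 2 distinct xs."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def gradient_boost(xs: List[float], ys: List[float], n_rounds: int = 50,
                   eta: float = 0.1) -> Model:
    """Squared-error boosting: each stump is fit to the current residuals."""
    f0 = sum(ys) / len(ys)  # F_0: constant initial prediction
    preds = [f0] * len(xs)
    learners: List[Model] = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient
        h = fit_stump(xs, residuals)
        learners.append(h)
        preds = [p + eta * h(x) for p, x in zip(preds, xs)]  # F_m = F_{m-1} + eta * h_m
    return lambda x: f0 + sum(eta * h(x) for h in learners)


model = gradient_boost([1, 2, 3, 4], [1, 1, 5, 5])
print([round(model(x), 3) for x in (1, 2, 3, 4)])  # approaches [1, 1, 5, 5]
```

A smaller η needs more rounds but tends to generalize better. Here each round shrinks the residuals by a factor of (1 − η).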