Formulary

Statistical Distributions

Normal (Gaussian) Distribution

  • Formula: f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}
  • Description: The normal distribution is a continuous probability distribution characterized by a bell-shaped curve. It is defined by the mean μ and standard deviation σ. A quick numerical check follows.
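
A minimal check of the density formula against SciPy, assuming NumPy and SciPy are installed; the values of μ and σ are illustrative:

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 0.0, 1.0  # illustrative: the standard normal
x = np.linspace(-3, 3, 7)

# Density computed directly from the formula above
pdf_manual = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

# SciPy's reference implementation
pdf_scipy = norm.pdf(x, loc=mu, scale=sigma)

assert np.allclose(pdf_manual, pdf_scipy)
```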

Binomial Distribution

  • Formula: P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}
  • Description: The binomial distribution represents the number of successes in a fixed number of independent Bernoulli trials, with a constant probability of success p in each trial. Here, n is the number of trials and k is the number of successes. A spot check follows.
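
A spot check of the PMF against SciPy; n, p, and k are illustrative values:

```python
from math import comb
from scipy.stats import binom

n, p, k = 10, 0.3, 4  # illustrative values

# PMF computed directly from the formula above
pmf_manual = comb(n, k) * p ** k * (1 - p) ** (n - k)
pmf_scipy = binom.pmf(k, n, p)

assert abs(pmf_manual - pmf_scipy) < 1e-12
```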

Poisson Distribution

  • Formula: P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}
  • Description: The Poisson distribution represents the probability of a given number of events occurring in a fixed interval of time or space, given the average number of times the event occurs over that interval. Here, λ is the average number of events, k is the number of occurrences, and e is Euler’s number. The same kind of check appears below.
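
The same kind of check for the Poisson PMF, with illustrative values for λ and k:

```python
from math import exp, factorial
from scipy.stats import poisson

lam, k = 3.5, 2  # illustrative rate and count

pmf_manual = lam ** k * exp(-lam) / factorial(k)
pmf_scipy = poisson.pmf(k, lam)

assert abs(pmf_manual - pmf_scipy) < 1e-12
```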

Exponential Distribution

  • Formula: f(x) = \lambda e^{-\lambda x} \quad \text{for } x \ge 0
  • Description: The exponential distribution represents the time between events in a Poisson process. It is defined by the rate parameter λ.

Uniform Distribution

  • Formula: f(x) = \begin{cases} \frac{1}{b - a} & a \le x \le b \\ 0 & \text{otherwise} \end{cases}
  • Description: The uniform distribution describes an equal probability for all values in the interval [a, b]. It is a continuous distribution.

Bernoulli Distribution

  • Formula: P(X = x) = p^x (1 - p)^{1-x} \quad \text{for } x \in \{0, 1\}
  • Description: The Bernoulli distribution is a discrete distribution representing the outcome of a single binary experiment with success probability p.

Beta Distribution

  • Formula: f(x) = \frac{x^{\alpha-1} (1-x)^{\beta-1}}{B(\alpha, \beta)} \quad \text{for } 0 \le x \le 1
  • Description: The beta distribution is a continuous distribution defined on the interval [0, 1], parameterized by α and β, and is useful in Bayesian statistics (see the sketch below).
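
One common Bayesian use: as a conjugate prior for a Bernoulli success probability, where the posterior is again a beta distribution. A minimal sketch with illustrative prior parameters and counts:

```python
from scipy.stats import beta

# Prior Beta(2, 2); then observe 7 successes and 3 failures.
alpha0, beta0 = 2, 2
successes, failures = 7, 3

# Conjugacy: the posterior is Beta(alpha0 + successes, beta0 + failures)
posterior = beta(alpha0 + successes, beta0 + failures)
print(posterior.mean())  # posterior mean of p: 9/14 ≈ 0.643
```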

Gamma Distribution

  • Formula: f(x) = \frac{\beta^\alpha x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)} \quad \text{for } x \ge 0
  • Description: The gamma distribution is a continuous distribution defined by a shape parameter α and a rate parameter β. It generalizes the exponential distribution.

Chi-Squared Distribution

  • Formula: f(x) = \frac{1}{2^{k/2} \Gamma(k/2)} x^{k/2 - 1} e^{-x/2} \quad \text{for } x \ge 0
  • Description: The chi-squared distribution is a special case of the gamma distribution with α = k/2 and β = 1/2, often used in hypothesis testing and confidence intervals. The check below confirms the special case numerically.
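
A numerical confirmation of the gamma special case; k = 5 is illustrative, and note that SciPy's gamma takes a scale parameter, the reciprocal of the rate:

```python
import numpy as np
from scipy.stats import chi2, gamma

k = 5  # degrees of freedom (illustrative)
x = np.linspace(0.1, 15, 50)

# Gamma with shape alpha = k/2 and rate beta = 1/2, i.e. scale = 2
gamma_pdf = gamma.pdf(x, a=k / 2, scale=2)
chi2_pdf = chi2.pdf(x, df=k)

assert np.allclose(gamma_pdf, chi2_pdf)
```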

Student’s t-Distribution

  • Formula: f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi} \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}
  • Description: The t-distribution is used to estimate population parameters when the sample size is small and the population variance is unknown. It is defined by the degrees of freedom ν.

F-Distribution

  • Formula: f(x) = \frac{\left(\frac{d_1}{d_2}\right)^{d_1/2} x^{d_1/2 - 1}}{B\left(\frac{d_1}{2}, \frac{d_2}{2}\right) \left(1 + \frac{d_1}{d_2} x\right)^{(d_1 + d_2)/2}}
  • Description: The F-distribution is used to compare two variances and is defined by two degrees-of-freedom parameters, d_1 and d_2.

Multinomial Distribution

  • Formula: P(X_1 = x_1, \ldots, X_k = x_k) = \frac{n!}{x_1! \cdots x_k!} p_1^{x_1} \cdots p_k^{x_k}
  • Description: The multinomial distribution generalizes the binomial distribution to more than two outcomes. It describes the probabilities of counts among categories.

Geometric Distribution

  • Formula: P(X = k) = (1 - p)^{k-1} p \quad \text{for } k \in \{1, 2, 3, \ldots\}
  • Description: The geometric distribution represents the number of trials needed to get the first success in a sequence of independent Bernoulli trials with success probability p.

Hypergeometric Distribution

  • Formula: P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}}
  • Description: The hypergeometric distribution describes the probability of k successes in n draws, without replacement, from a finite population of size N containing K successes. A spot check appears below.
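
A spot check with illustrative values; note that SciPy's argument order differs from the N, K, n, k notation above:

```python
from math import comb
from scipy.stats import hypergeom

N, K, n, k = 50, 15, 10, 4  # population, successes in it, draws, successes drawn

pmf_manual = comb(K, k) * comb(N - K, n - k) / comb(N, n)
# SciPy's order is (k, population size, successes in population, draws)
pmf_scipy = hypergeom.pmf(k, N, K, n)

assert abs(pmf_manual - pmf_scipy) < 1e-12
```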

Log-Normal Distribution

  • Formula: f(x) = \frac{1}{x\sigma\sqrt{2\pi}} e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}} \quad \text{for } x > 0
  • Description: The log-normal distribution describes a variable whose logarithm is normally distributed. It is useful for modeling positively skewed data (see the check below).
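
A check against SciPy; the subtlety worth encoding is how μ and σ map onto SciPy's s and scale parameters:

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = 0.5, 0.8  # parameters of the underlying normal (illustrative)
x = np.linspace(0.1, 5, 20)

pdf_manual = np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (x * sigma * np.sqrt(2 * np.pi))
# SciPy parameterizes by shape s = sigma and scale = exp(mu)
pdf_scipy = lognorm.pdf(x, s=sigma, scale=np.exp(mu))

assert np.allclose(pdf_manual, pdf_scipy)
```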

Machine Learning Models

Linear Regression

  • Formula: y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n + \epsilon
  • Description: Predicts a continuous target variable based on linear relationships between the target and one or more predictor variables (see the sketch below).
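
A minimal ordinary-least-squares fit with NumPy on synthetic data; the true coefficients are chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))  # two predictors
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=100)

# Prepend a column of ones so beta_0 (the intercept) is estimated too
X1 = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

print(beta)  # ≈ [1.0, 2.0, -0.5]
```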

Logistic Regression

  • Formula: \text{logit}(P(Y=1)) = \ln\left(\frac{P(Y=1)}{1 - P(Y=1)}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n
  • Description: Predicts a binary outcome based on linear relationships between the predictor variables and the log-odds of the outcome (see the sketch below).
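
A small scikit-learn sketch on synthetic data; the fitted coefficients live on the log-odds scale of the formula above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

print(model.intercept_, model.coef_)  # beta_0 and beta_i (log-odds scale)
print(model.predict_proba(X[:3]))     # P(Y=0), P(Y=1) for the first rows
```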

Generalized Linear Model (GLM)

  • Formula: g(E(Y)) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n
  • Description: A generalized linear model is a flexible generalization of ordinary linear regression that allows the dependent variable Y to have a distribution other than normal. The link function g relates the expected value E(Y) of the response variable to the linear predictor; β_0 is the intercept, and the β_i are the coefficients of the predictor variables x_i (see the sketch below).
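
A sketch of one concrete instance, a Poisson GLM with a log link, assuming the statsmodels package is available; data and coefficients are synthetic:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
# Poisson response with log link: E(Y) = exp(0.5 + 1.2 x)
y = rng.poisson(np.exp(0.5 + 1.2 * X[:, 0]))

X1 = sm.add_constant(X)  # intercept column
model = sm.GLM(y, X1, family=sm.families.Poisson()).fit()
print(model.params)      # ≈ [0.5, 1.2]
```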

Generalized Additive Model (GAM)

  • Formula: g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_n(x_n)
  • Description: A generalized additive model extends the generalized linear model by letting the linear predictor depend on unknown smooth functions of the predictor variables, allowing non-linear relationships between the dependent and independent variables. Here, g is the link function, E(Y) is the expected value of the response variable Y, β_0 is the intercept, and the f_i are smooth functions of the predictors x_i.

Decision Tree

  • Formula: Recursive binary splitting
  • Description: Splits the data into subsets based on the value of input features. Each internal node represents a “test” on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label or continuous value (see the example below).
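
A minimal scikit-learn example; export_text prints the learned splits, one threshold test per internal node:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

print(export_text(tree))  # the recursive binary splits, as text
```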

Random Forest

  • Formula: Aggregated decision trees
  • Description: Combines the predictions of multiple decision trees to improve accuracy and control over-fitting. Each tree is trained on a bootstrapped sample of the data and uses a random subset of features (see the example below).
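
A minimal scikit-learn example; bootstrap sampling and random feature subsets are the library defaults:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print(cross_val_score(forest, X, y, cv=5).mean())  # 5-fold accuracy
```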

Support Vector Machine (SVM)

  • Formula: f(x) = \text{sign}(\mathbf{w} \cdot \mathbf{x} + b)
  • Description: Finds the hyperplane that best separates the classes in the feature space. The formula represents the decision boundary, where w is the weight vector and b is the bias (see the sketch below).
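
A linear-kernel sketch with scikit-learn on separable synthetic data, recovering w and b and checking the sign rule against the library's predictions:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

svm = SVC(kernel="linear").fit(X, y)

# Decision rule: sign(w·x + b), matching the formula above
w, b = svm.coef_[0], svm.intercept_[0]
print(np.array_equal((X @ w + b > 0).astype(int), svm.predict(X)))  # True
```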

K-Nearest Neighbors (KNN)

  • Formula: \hat{y} = \frac{1}{k} \sum_{i=1}^{k} y_i
  • Description: Classifies a data point based on the majority class among its k nearest neighbors. For regression, it predicts the average of the k nearest neighbors’ values, as the formula shows (see the sketch below).
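
A self-contained NumPy sketch of the classification (majority-vote) variant on toy data:

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - x, axis=1)  # Euclidean distances
    nearest = y_train[np.argsort(dists)[:k]]     # labels of the k closest
    return np.bincount(nearest).argmax()         # majority vote

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.5, 0.5])))  # 0
print(knn_predict(X_train, y_train, np.array([5.5, 5.5])))  # 1
```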

Naive Bayes

  • Formula: P(Y|X) = \frac{P(X|Y)P(Y)}{P(X)}
  • Description: Assumes independence between predictors. It uses Bayes’ theorem to predict the probability of a class given the predictors (see the example below).
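
A minimal example using the Gaussian variant, one common way to model P(X|Y) under the independence assumption:

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
# Each feature is modeled as normal within each class, independently
nb = GaussianNB().fit(X, y)
print(nb.score(X, y))  # training accuracy
```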

Principal Component Analysis (PCA)

  • Formula: Z = XW
  • Description: Reduces the dimensionality of the data by transforming the original variables into new uncorrelated variables (principal components), ordered by the amount of variance they capture (see the sketch below).
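
A NumPy sketch computing W as the eigenvectors of the covariance matrix, ordered by explained variance; the anisotropic synthetic data makes the ordering visible:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) * np.array([2.0, 1.0, 0.1])  # unequal variances

# Center, then take eigenvectors of the covariance matrix as columns of W
Xc = X - X.mean(axis=0)
eigvals, W = np.linalg.eigh(np.cov(Xc, rowvar=False))
W = W[:, np.argsort(eigvals)[::-1]]  # sort by descending variance

Z = Xc @ W  # principal-component scores
print(Z.var(axis=0))  # decreasing: ≈ [4, 1, 0.01]
```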

K-Means Clustering

  • Formula: \arg \min_S \sum_{i=1}^{k} \sum_{x \in S_i} \| x - \mu_i \|^2
  • Description: Partitions the data into k clusters by minimizing the sum of squared distances between the data points and the cluster centroids μ_i (see the sketch below).
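
A compact NumPy version of Lloyd's algorithm, the standard iteration for this objective; toy data, and no handling of empty clusters:

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest centroid
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids, axis=2), axis=1)
        # Move each centroid to the mean of its assigned points
        centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])
    return labels, centroids

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
labels, centroids = kmeans(X, k=2)
print(centroids)  # near (0, 0) and (4, 4)
```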

Neural Networks

  • Formula: z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}, \quad a^{(l)} = \sigma(z^{(l)})
  • Description: Composed of layers of interconnected nodes (neurons). Each neuron’s output is a weighted sum of its inputs passed through an activation function σ. The parameters W^{(l)} and b^{(l)} are the weights and biases of layer l (see the forward pass below).
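
A forward pass through one hidden layer in NumPy, mirroring the two equations above; weights are random and the sigmoid activation is illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# 3 inputs -> 4 hidden units -> 1 output
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

x = rng.normal(size=3)      # a(0): the input
a1 = sigmoid(W1 @ x + b1)   # z(1) = W(1) a(0) + b(1); a(1) = sigma(z(1))
a2 = sigmoid(W2 @ a1 + b2)  # the output layer repeats the same pattern
print(a2)
```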

Convolutional Neural Networks (CNN)

  • Formula: (f * g)(t) = \int_{-\infty}^{\infty} f(\tau)g(t - \tau) \, d\tau
  • Description: Uses convolutional layers to apply filters to the input, which helps in capturing spatial hierarchies in data, particularly useful for image and video processing (see the example below).
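
In practice the layers compute the discrete analogue of this integral; a one-dimensional NumPy example with an illustrative signal and filter:

```python
import numpy as np

signal = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])
kernel = np.array([1.0, 0.0, -1.0])  # a simple edge-detecting filter

# Discrete convolution: sum over f(tau) g(t - tau)
out = np.convolve(signal, kernel, mode="valid")
print(out)  # [2, 2, 0, -2, -2]: responds where the signal rises or falls
```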

Recurrent Neural Networks (RNN)

  • Formula: h_t = \sigma(W_h h_{t-1} + W_x x_t + b)
  • Description: Designed to recognize patterns in sequences of data by maintaining a hidden state h_t that captures information from previous time steps (see the sketch below).
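
A bare NumPy recurrence over a short random sequence, with tanh standing in for σ:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, inputs = 4, 3
W_h = rng.normal(size=(hidden, hidden))
W_x = rng.normal(size=(hidden, inputs))
b = np.zeros(hidden)

h = np.zeros(hidden)                      # initial hidden state h_0
for x_t in rng.normal(size=(5, inputs)):  # a sequence of 5 input vectors
    h = np.tanh(W_h @ h + W_x @ x_t + b)  # the recurrence above
print(h)                                  # final hidden state h_5
```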

Gradient Boosting Machines (GBM)

  • Formula: F_m(x) = F_{m-1}(x) + \eta \cdot h_m(x)
  • Description: Builds an additive model in a forward stage-wise manner. Each base learner h_m is trained to reduce the residual error of the ensemble’s previous predictions (see the sketch below).
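
A squared-loss sketch of the update, using shallow scikit-learn trees as the base learners h_m; for squared error, fitting the residuals is exactly the gradient step (synthetic data, η = 0.1):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

eta = 0.1
F = np.full(200, y.mean())  # F_0: a constant initial model
for _ in range(100):
    residuals = y - F       # what the ensemble still gets wrong
    h = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    F = F + eta * h.predict(X)  # F_m = F_{m-1} + eta * h_m

print(np.mean((y - F) ** 2))    # training MSE after 100 stages
```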

Long Short-Term Memory Networks (LSTM)

  • Formula: \begin{aligned} f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\ i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\ \tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\ C_t &= f_t * C_{t-1} + i_t * \tilde{C}_t \\ o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\ h_t &= o_t * \tanh(C_t) \end{aligned}
  • Description: A type of RNN that can learn long-term dependencies by using gates to control the flow of information (see the walk-through below).
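
A single-layer NumPy walk through the six equations, one time step per loop iteration; weights are random and the dimensions are toy-sized:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
hidden, inputs = 4, 3
# One weight matrix per gate, each acting on the concatenation [h_{t-1}, x_t]
W_f, W_i, W_C, W_o = (rng.normal(size=(hidden, hidden + inputs)) for _ in range(4))
b_f, b_i, b_C, b_o = (np.zeros(hidden) for _ in range(4))

h, C = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(5, inputs)):
    z = np.concatenate([h, x_t])      # [h_{t-1}, x_t]
    f = sigmoid(W_f @ z + b_f)        # forget gate
    i = sigmoid(W_i @ z + b_i)        # input gate
    C_tilde = np.tanh(W_C @ z + b_C)  # candidate cell state
    C = f * C + i * C_tilde           # new cell state
    o = sigmoid(W_o @ z + b_o)        # output gate
    h = o * np.tanh(C)                # new hidden state
print(h)
```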