4.2 Key Concepts and Definitions
4.2.1 Random Sample
A random sample of size \(n\) consists of \(n\) independent observations \(X_1, X_2, \dots, X_n\), each drawn from the same underlying population distribution. Independence ensures that no observation influences another, and identical distribution guarantees that all observations are governed by the same probability rules.
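The definition above can be illustrated with a minimal sketch. It assumes a hypothetical normal population of heights with mean 172 cm and standard deviation 6 cm; these values and the helper name `draw_random_sample` are illustrative assumptions, not part of the text. Each draw uses the same distribution and does not depend on the earlier draws.

```python
import random

# Hypothetical population parameters (illustrative assumptions only).
POP_MEAN_CM = 172.0
POP_SD_CM = 6.0

def draw_random_sample(n, seed=0):
    """Return n independent draws from the same normal population."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [rng.gauss(POP_MEAN_CM, POP_SD_CM) for _ in range(n)]

sample = draw_random_sample(5)
print(sample)
```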
4.2.2 Sample Statistics
4.2.2.1 Sample Mean
The sample mean is a measure of central tendency:
\[ \bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \]
- Example: Suppose we measure the heights of 5 individuals (in cm): \(170, 165, 180, 175, 172\). The sample mean is:
\[ \bar{X} = \frac{170 + 165 + 180 + 175 + 172}{5} = 172.4 \, \text{cm}. \]
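The same calculation can be checked in a few lines of Python. This is only a sketch using the five heights from the example; the variable names are illustrative.

```python
from statistics import fmean

# Heights (cm) from the example above.
heights_cm = [170, 165, 180, 175, 172]

# Sample mean: sum of the observations divided by the sample size.
sample_mean = sum(heights_cm) / len(heights_cm)

# The standard library gives the same value.
library_mean = fmean(heights_cm)

print(f"{sample_mean:.1f} cm")
```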
4.2.2.2 Sample Median
The sample median is the middle value of ordered data:
\[ \tilde{x} = \begin{cases} \text{Middle observation,} & \text{if } n \text{ is odd}, \\ \text{Average of two middle observations,} & \text{if } n \text{ is even}. \end{cases} \]
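The two cases of the definition can be sketched as follows. The odd-sized sample reuses the five heights from Section 4.2.2.1; the sixth value (178) in the even-sized sample is a hypothetical addition made only to exercise the second case.

```python
def sample_median(values):
    """Median of a sample: middle value, or mean of the two middle values."""
    ordered = sorted(values)          # the definition applies to ordered data
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # n odd: single middle observation
        return ordered[mid]
    # n even: average of the two middle observations
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_sample = [170, 165, 180, 175, 172]
even_sample = [170, 165, 180, 175, 172, 178]  # 178 is a hypothetical extra value

median_odd = sample_median(odd_sample)
median_even = sample_median(even_sample)
print(median_odd, median_even)
```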
4.2.2.3 Sample Variance
The sample variance measures data spread:
\[ S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1} \]
4.2.2.4 Sample Standard Deviation
The sample standard deviation is the square root of the variance:
\[ S = \sqrt{S^2} \]
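As a worked illustration of \(S^2\) and \(S\), the sketch below applies both formulas to the five heights from Section 4.2.2.1. The sum of squared deviations is 125.2, so \(S^2 = 125.2 / 4 = 31.3\) cm\(^2\) and \(S \approx 5.59\) cm. Variable names are illustrative.

```python
import math

heights_cm = [170, 165, 180, 175, 172]

n = len(heights_cm)
x_bar = sum(heights_cm) / n

# Sample variance: squared deviations from the mean, divided by n - 1.
s2 = sum((x - x_bar) ** 2 for x in heights_cm) / (n - 1)

# Sample standard deviation: square root of the sample variance.
s = math.sqrt(s2)

print(f"S^2 = {s2:.1f} cm^2, S = {s:.2f} cm")
```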
4.2.2.5 Sample Proportions
The sample proportion summarizes categorical (success/failure) data:
\[ \hat{p} = \frac{X}{n} = \frac{\text{Number of successes}}{\text{Sample size}} \]
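A minimal sketch of the computation, assuming a hypothetical survey of eight yes/no responses in which "yes" counts as a success. The data are invented purely for illustration.

```python
# Hypothetical categorical data: "yes" is treated as a success.
responses = ["yes", "no", "no", "yes", "no", "no", "yes", "no"]

successes = sum(1 for r in responses if r == "yes")  # X
n = len(responses)                                   # sample size
p_hat = successes / n                                # sample proportion

print(successes, n, p_hat)
```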
4.2.2.6 Estimators
- Point Estimator: A statistic (\(\hat{\theta}\)) used to estimate a population parameter (\(\theta\)).
- Point Estimate: The numerical value assumed by \(\hat{\theta}\) when evaluated for a given sample.
- Unbiased Estimator: A point estimator \(\hat{\theta}\) is unbiased if \(E(\hat{\theta}) = \theta\).
Examples of unbiased estimators:
- \(\bar{X}\) for \(\mu\) (population mean).
- \(S^2\) for \(\sigma^2\) (population variance).
- \(\hat{p}\) for \(p\) (population proportion).
- \(\hat{p}_1 - \hat{p}_2\) for \(p_1 - p_2\) (difference in population proportions).
- \(\bar{X}_1 - \bar{X}_2\) for \(\mu_1 - \mu_2\) (difference in population means).
Note: While \(S^2\) is unbiased for \(\sigma^2\), \(S\) is a biased estimator of \(\sigma\).
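The note above can be checked empirically. The sketch below is a small simulation, not a proof: it assumes a normal population with \(\sigma = 2\) (so \(\sigma^2 = 4\)), samples of size 5, and 20,000 replications, all of which are arbitrary illustrative choices. Averaged over many samples, \(S^2\) should land close to 4, while the average of \(S\) should fall noticeably below 2.

```python
import math
import random

def sample_variance(xs):
    """Sample variance with the n - 1 divisor."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

def simulate(mu=0.0, sigma=2.0, n=5, reps=20000, seed=1):
    """Average S^2 and S over many simulated samples (illustrative settings)."""
    rng = random.Random(seed)
    total_s2 = 0.0
    total_s = 0.0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        s2 = sample_variance(xs)
        total_s2 += s2
        total_s += math.sqrt(s2)
    return total_s2 / reps, total_s / reps

avg_s2, avg_s = simulate()
print(f"average S^2 = {avg_s2:.3f} (sigma^2 = 4)")
print(f"average S   = {avg_s:.3f} (sigma = 2)")
```

The gap between the average of \(S\) and \(\sigma\) narrows as \(n\) grows, which can be seen by rerunning `simulate` with a larger `n`.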
4.2.3 Distribution of the Sample Mean
The sampling distribution of the mean \(\bar{X}\) depends on:
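- Population Distribution: If \(X \sim N(\mu, \sigma^2)\), then \(\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)\) exactly, for every sample size \(n\).
- Central Limit Theorem: For large \(n\) (a common rule of thumb is \(n \geq 30\)), \(\bar{X}\) is approximately \(N\left(\mu, \frac{\sigma^2}{n}\right)\) regardless of the population's shape, provided the population variance \(\sigma^2\) is finite.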
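The Central Limit Theorem can be illustrated with a short simulation. The sketch assumes a strongly right-skewed population, an exponential distribution with rate 1 (so \(\mu = 1\) and \(\sigma = 1\)), with \(n = 50\) and 5,000 replications; these settings are illustrative choices, not values from the text. If the normal approximation holds, the sample means should centre on \(\mu\), have standard deviation near \(\sigma / \sqrt{n}\), and fall within \(\mu \pm 1.96\,\sigma/\sqrt{n}\) roughly 95% of the time.

```python
import math
import random
import statistics

def simulate_sample_means(n=50, reps=5000, seed=2):
    """Sample means of n exponential(1) draws, repeated reps times."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(reps)
    ]

# Population values for the exponential(1) distribution.
mu, sigma, n = 1.0, 1.0, 50
se = sigma / math.sqrt(n)

means = simulate_sample_means(n=n)
mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)

# Share of sample means inside mu +/- 1.96 standard errors.
within = sum(abs(m - mu) <= 1.96 * se for m in means) / len(means)

print(f"mean of sample means: {mean_of_means:.3f} (mu = {mu})")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {se:.3f})")
print(f"share within 1.96 SE: {within:.3f}")
```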
4.2.3.1 Standard Error of the Mean
The standard error quantifies variability in \(\bar{X}\):
\[ \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \]
- Example: Suppose \(\sigma = 10\) and \(n = 25\). Then:
\[ \sigma_{\bar{X}} = \frac{10}{\sqrt{25}} = 2. \]
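The smaller the standard error, the more precise our estimate of the population mean. Because \(\sigma_{\bar{X}}\) shrinks in proportion to \(1/\sqrt{n}\), quadrupling the sample size halves the standard error.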
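The standard error formula is easy to evaluate directly. The sketch below reproduces the case \(\sigma = 10\), \(n = 25\) and then tries two larger sample sizes (100 and 400, chosen here only for illustration) to show how the standard error changes with \(n\). The function name `standard_error` is an illustrative choice.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 10
for n in (25, 100, 400):
    print(f"n = {n:3d}: standard error = {standard_error(sigma, n)}")
```

With \(\sigma\) fixed at 10, the three sample sizes give standard errors of 2, 1, and 0.5.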