Chapter 4 Basic Statistical Inference


Hypothesis testing allows us to:

  • Make inferences (interpretations) about the true parameter value \(\beta\) based on our estimator/estimate
  • Test whether our underlying assumptions (about the true population parameters, random variables, or model specification) hold true.

Testing does not

  • Confirm with 100% certainty that a hypothesis is true
  • Confirm with 100% certainty that a hypothesis is false
  • Tell you how to interpret the estimated value (Economic vs. Practical vs. Statistical Significance)

Hypothesis: Translates a research objective into a statement specifying a value (or set of values) in which the population parameter should or should not lie.

  • Null hypothesis (\(H_0\)): A statement about the population parameter that we take to be true unless the data provide substantial evidence against it.
    • Can be either a single value (ex: \(H_0: \beta=0\)) or a set of values (ex: \(H_0: \beta_1 \ge 0\))
    • Is generally the value you would prefer the population parameter not to take (a subjective choice)
      • \(H_0: \beta_1=0\) means you would like to see a non-zero coefficient
      • \(H_0: \beta_1 \ge 0\) means you would like to see a negative effect
    • “Test of Significance” refers to the two-sided test: \(H_0: \beta_j=0\)
  • Alternative hypothesis (\(H_a\) or \(H_1\)) (Research Hypothesis): All other possible values that the population parameter may take if the null hypothesis does not hold.

Type I Error

Error made when \(H_0\) is rejected when, in fact, \(H_0\) is true.
The probability of committing a Type I error is \(\alpha\), known as the level of significance of the test.

Legal analogy: In U.S. law, a defendant is presumed to be “innocent until proven guilty.”
If the null hypothesis is that a person is innocent, the Type I error is the probability that you conclude the person is guilty when he is innocent.
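The level of significance can be checked by simulation: if we repeatedly test a true null hypothesis at \(\alpha = 0.05\), we should falsely reject about 5% of the time. Below is a minimal sketch using only the standard library; the sample size, number of trials, and seed are illustrative assumptions.

```python
import math
import random

random.seed(0)

mu0, sigma, n = 0.0, 1.0, 30   # H0: mu = 0 is TRUE in this simulation
z_crit = 1.96                   # two-sided critical value for alpha = 0.05
trials = 20_000

rejections = 0
for _ in range(trials):
    sample = [random.gauss(mu0, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    if abs(z) > z_crit:
        rejections += 1          # Type I error: rejecting a true H0

rate = rejections / trials
print(f"empirical Type I error rate: {rate:.3f}")  # should be near alpha = 0.05
```

The empirical rejection rate lands near 0.05, matching the chosen \(\alpha\).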


Type II Error

Type II error (\(\beta\)): probability that you fail to reject the null hypothesis when it is false.

In the legal analogy, this is the probability that you fail to find the person guilty when he or she is guilty.

Error made when \(H_0\) is not rejected when, in fact, \(H_1\) is true.
The probability of committing a Type II error is \(\beta\); the quantity \(1 - \beta\) is known as the power of the test.
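Power (and hence \(\beta\)) can also be estimated by simulation: generate data under a specific false null and count how often the test correctly rejects. This is a sketch under assumed values (true mean 0.5, \(\sigma = 1\), \(n = 30\), seed for reproducibility).

```python
import math
import random

random.seed(1)

mu0, mu_true, sigma, n = 0.0, 0.5, 1.0, 30  # H0: mu = 0 is FALSE here
z_crit = 1.96                                # two-sided 5% test
trials = 20_000

rejections = 0
for _ in range(trials):
    sample = [random.gauss(mu_true, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    if abs(z) > z_crit:
        rejections += 1          # correct rejection of a false H0

power = rejections / trials      # P(reject H0 | H0 false) = 1 - beta
beta = 1 - power                 # empirical Type II error rate
print(f"power ≈ {power:.3f}, beta ≈ {beta:.3f}")
```

Note that power depends on the true effect size: moving `mu_true` closer to `mu0` raises \(\beta\) and lowers power.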

Random sample of size n: A collection of n independent random variables, each with the same distribution as X.

Sample mean

\[ \bar{X}= \frac{1}{n}\sum_{i=1}^{n}X_i \]

Sample Median

\(\tilde{x}\) = the middle observation in a sample of observations ordered from smallest to largest (or vice versa).

If n is odd, \(\tilde{x}\) is the middle observation,
If n is even, \(\tilde{x}\) is the average of the two middle observations.

Sample variance \[ S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}= \frac{n\sum_{i=1}^{n}X_i^2 -(\sum_{i=1}^{n}X_i)^2}{n(n-1)} \]

Sample standard deviation \[ S = \sqrt{S^2} \]
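These sample statistics are easy to verify by hand against the standard library. A minimal sketch with a small hypothetical sample (the data values are illustrative only):

```python
import statistics

data = [4, 8, 6, 5, 3, 7, 9]           # hypothetical sample, n = 7 (odd)

n = len(data)
xbar = sum(data) / n                    # sample mean
med = statistics.median(data)           # middle value of the sorted sample
s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)  # n-1 denominator
s = s2 ** 0.5                           # sample standard deviation

# The stdlib (which also uses the n-1 denominator) agrees:
assert xbar == statistics.mean(data)
assert abs(s2 - statistics.variance(data)) < 1e-12
print(xbar, med, round(s2, 4), round(s, 4))  # 6.0 6 4.6667 2.1602
```

Since n is odd here, the median is the single middle observation; with an even n, `statistics.median` averages the two middle observations, matching the definition above.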

Sample proportions \[ \hat{p} = \frac{X}{n} = \frac{\text{number in the sample with trait}}{\text{sample size}} \]

\[ \widehat{p_1-p_2} = \hat{p_1}-\hat{p_2} = \frac{X_1}{n_1} - \frac{X_2}{n_2} = \frac{n_2X_1 - n_1X_2}{n_1n_2} \]
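A quick numerical check of the two equivalent forms of the difference in sample proportions, with hypothetical counts:

```python
x1, n1 = 45, 100   # hypothetical: 45 of 100 in group 1 have the trait
x2, n2 = 30, 120   # hypothetical: 30 of 120 in group 2 have the trait

p1_hat = x1 / n1                        # 0.45
p2_hat = x2 / n2                        # 0.25
diff = p1_hat - p2_hat                  # difference of proportions

# Equivalent single-fraction form from the formula above:
diff_alt = (n2 * x1 - n1 * x2) / (n1 * n2)
assert abs(diff - diff_alt) < 1e-12
print(p1_hat, p2_hat, round(diff, 4))   # 0.45 0.25 0.2
```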

Estimators
Point Estimator
\(\hat{\theta}\) is a statistic used to approximate a population parameter \(\theta\)


Point estimate
The numerical value assumed by \(\hat{\theta}\) when evaluated for a given sample


Unbiased estimator
If \(E(\hat{\theta}) = \theta\), then \(\hat{\theta}\) is an unbiased estimator for \(\theta\)

  1. \(\bar{X}\) is an unbiased estimator for \(\mu\)
  2. \(S^2\) is an unbiased estimator for \(\sigma^2\)
  3. \(\hat{p}\) is an unbiased estimator for p
  4. \(\widehat{p_1-p_2}\) is an unbiased estimator for \(p_1- p_2\)
  5. \(\bar{X_1} - \bar{X_2}\) is an unbiased estimator for \(\mu_1 - \mu_2\)

Note: \(S\) is a biased estimator for \(\sigma\)
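Both facts, that \(S^2\) is unbiased for \(\sigma^2\) while \(S\) is biased for \(\sigma\), show up clearly in simulation. A sketch with assumed values (\(\sigma = 1\), samples of size 5, fixed seed): the average of \(S^2\) over many samples sits near 1, while the average of \(S\) sits noticeably below 1.

```python
import random

random.seed(2)

n, trials = 5, 50_000              # small samples from N(0, 1), so sigma = 1

sum_s2, sum_s = 0.0, 0.0
for _ in range(trials):
    sample = [random.gauss(0.0, 1.0) for _ in range(n)]
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    sum_s2 += s2
    sum_s += s2 ** 0.5

mean_s2 = sum_s2 / trials   # ≈ 1.0: E(S^2) = sigma^2, so S^2 is unbiased
mean_s = sum_s / trials     # < 1:  E(S) < sigma, so S is biased low
print(round(mean_s2, 3), round(mean_s, 3))
```

The bias in \(S\) comes from Jensen's inequality: the square root is concave, so \(E(\sqrt{S^2}) < \sqrt{E(S^2)} = \sigma\). The bias shrinks as n grows.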

Distribution of the sample mean

If \(\bar{X}\) is the sample mean based on a random sample of size n drawn from a normal distribution X with mean \(\mu\) and standard deviation \(\sigma\), then \(\bar{X}\) is normally distributed, with mean \(\mu_{\bar{X}} = \mu\) and variance \(\sigma_{\bar{X}}^2 = Var(\bar{X}) = \frac{\sigma^2}{n}\). Then the standard error of the mean is: \(\sigma_{\bar{X}}= \frac{\sigma}{\sqrt{n}}\)
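The \(\sigma/\sqrt{n}\) result can be checked by simulating many sample means and measuring their spread. A sketch under assumed values (\(\mu = 10\), \(\sigma = 2\), \(n = 25\), so the standard error should be \(2/\sqrt{25} = 0.4\)):

```python
import math
import random

random.seed(3)

mu, sigma, n, trials = 10.0, 2.0, 25, 20_000

# Draw many samples of size n and record each sample mean
means = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(sample) / n)

grand_mean = sum(means) / trials
sd_of_means = (sum((m - grand_mean) ** 2 for m in means) / trials) ** 0.5

print(round(grand_mean, 3), round(sd_of_means, 3))
print("theoretical standard error:", sigma / math.sqrt(n))  # 0.4
```

The empirical standard deviation of the sample means closely matches \(\sigma/\sqrt{n}\), and the average of the sample means matches \(\mu\), consistent with \(\bar{X}\) being unbiased.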