4 Basic Statistical Inference
 One Sample Inference
 Two Sample Inference
 Categorical Data Analysis
 Divergence Metrics and Test for Comparing Distributions
 Make inferences (an interpretation) about the true parameter value \(\beta\) based on our estimator/estimate
 Test whether our underlying assumptions (about the true population parameters, random variables, or model specification) hold true.
Testing does not
 Confirm with 100% a hypothesis is true
 Confirm with 100% a hypothesis is false
 Tell you how to interpret the estimate value (Economic vs. Practical vs. Statistical Significance)
Hypothesis: Translates an objective into a statement specifying a value (or set of values) in which the population parameter should or should not lie.

Null hypothesis (\(H_0\)): A statement about the population parameter that we take to be true, and against which the data would need to provide substantial evidence.
 Can be either a single value (ex: \(H_0: \beta=0\)) or a set of values (ex: \(H_0: \beta_1 \ge 0\))
 Will generally be the value you would not like the population parameter to be (subjective)
 \(H_0: \beta_1=0\) means you would like to see a nonzero coefficient
 \(H_0: \beta_1 \ge 0\) means you would like to see a negative effect
 “Test of Significance” refers to the two-sided test: \(H_0: \beta_j=0\)
 Alternative hypothesis (\(H_a\) or \(H_1\)) (Research Hypothesis): All other possible values that the population parameter may be if the null hypothesis does not hold.
Type I Error
Error made when \(H_0\) is rejected when, in fact, \(H_0\) is true.
The probability of committing a Type I error is \(\alpha\), known as the level of significance of the test.
Legal analogy: In U.S. law, a defendant is presumed to be “innocent until proven guilty”.
If the null hypothesis is that a person is innocent, the Type I error is the probability that you conclude the person is guilty when he is innocent.
Type II Error
Type II error level (\(\beta\)): probability that you fail to reject the null hypothesis when it is false.
In the legal analogy, this is the probability that you fail to find the person guilty when he or she is guilty.
Error made when \(H_0\) is not rejected when, in fact, \(H_a\) is true.
The probability of committing a Type II error is \(\beta\); the power of the test is \(1-\beta\).
Random sample of size n: A collection of n independent random variables taken from the distribution X, each with the same distribution as X.
Sample mean
\[ \bar{X}= \frac{1}{n}\sum_{i=1}^{n}X_i \]
Sample Median
\(\tilde{x}\) = the middle observation in a sample of observations ordered from smallest to largest (or vice versa).
If n is odd, \(\tilde{x}\) is the middle observation,
If n is even, \(\tilde{x}\) is the average of the two middle observations.
Sample variance \[ S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}= \frac{n\sum_{i=1}^{n}X_i^2 - (\sum_{i=1}^{n}X_i)^2}{n(n-1)} \]
Sample standard deviation \[ S = \sqrt{S^2} \]
Sample proportions \[ \hat{p} = \frac{X}{n} = \frac{\text{number in the sample with trait}}{\text{sample size}} \]
\[ \widehat{p_1 - p_2} = \hat{p}_1-\hat{p}_2 = \frac{X_1}{n_1} - \frac{X_2}{n_2} = \frac{n_2X_1 - n_1X_2}{n_1n_2} \]
Estimators
Point Estimator
\(\hat{\theta}\) is a statistic used to approximate a population parameter \(\theta\)
Point estimate
The numerical value assumed by \(\hat{\theta}\) when evaluated for a given sample
Unbiased estimator
If \(E(\hat{\theta}) = \theta\), then \(\hat{\theta}\) is an unbiased estimator for \(\theta\)
 \(\bar{X}\) is an unbiased estimator for \(\mu\)
 \(S^2\) is an unbiased estimator for \(\sigma^2\)
 \(\hat{p}\) is an unbiased estimator for p
 \(\widehat{p_1 - p_2}\) is an unbiased estimator for \(p_1 - p_2\)
 \(\bar{X}_1 - \bar{X}_2\) is an unbiased estimator for \(\mu_1 - \mu_2\)
Note: \(S\) is a biased estimator for \(\sigma\)
Distribution of the sample mean
If \(\bar{X}\) is the sample mean based on a random sample of size \(n\) drawn from a normal distribution \(X\) with mean \(\mu\) and standard deviation \(\sigma\), then \(\bar{X}\) is normally distributed, with mean \(\mu_{\bar{X}} = \mu\) and variance \(\sigma_{\bar{X}}^2 = Var(\bar{X}) = \frac{\sigma^2}{n}\). The standard error of the mean is: \(\sigma_{\bar{X}}= \frac{\sigma}{\sqrt{n}}\)
4.1 One Sample Inference
\(Y_i \sim i.i.d. N(\mu, \sigma^2)\)
i.i.d. stands for “independent and identically distributed”
Hence, we have the following model:
\(Y_i=\mu +\epsilon_i\) where

\(\epsilon_i \sim^{iid} N(0,\sigma^2)\)

\(E(Y_i)=\mu\)

\(Var(Y_i)=\sigma^2\)
 \(\bar{y} \sim N(\mu,\sigma^2/n)\)
4.1.1 The Mean
When \(\sigma^2\) is estimated by \(s^2\), then
\[ \frac{\bar{y}-\mu}{s/\sqrt{n}} \sim t_{n-1} \]
Then, a \(100(1-\alpha) \%\) confidence interval for \(\mu\) is obtained from:
\[ 1 - \alpha = P(-t_{\alpha/2;n-1} \le \frac{\bar{y}-\mu}{s/\sqrt{n}} \le t_{\alpha/2;n-1}) \\ = P(\bar{y} - (t_{\alpha/2;n-1})s/\sqrt{n} \le \mu \le \bar{y} + (t_{\alpha/2;n-1})s/\sqrt{n}) \]
And the interval is
\[ \bar{y} \pm (t_{\alpha/2;n-1})s/\sqrt{n} \]
and \(s/\sqrt{n}\) is the standard error of \(\bar{y}\)
If the experiment were repeated many times, \(100(1-\alpha) \%\) of these intervals would contain \(\mu\)
| | Confidence Interval \(100(1-\alpha)\%\) | Sample Size (Confidence \(\alpha\), Error \(d\)) | Hypothesis Test Statistic |
|---|---|---|---|
| When \(\sigma^2\) is known, X is normal (or \(n \ge 25\)) | \(\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\) | \(n \approx \frac{z_{\alpha/2}^2 \sigma^2}{d^2}\) | \(z = \frac{\bar{X}-\mu_0}{\sigma/\sqrt{n}}\) |
| When \(\sigma^2\) is unknown, X is normal (or \(n \ge 25\)) | \(\bar{X} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}\) | \(n \approx \frac{z_{\alpha/2}^2 s^2}{d^2}\) | \(t = \frac{\bar{X}-\mu_0}{s/\sqrt{n}}\) |
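The unknown-variance row of the table can be sketched numerically; the summary statistics below are hypothetical, and `scipy` supplies the t quantiles:

```python
import math
from scipy import stats

# Hypothetical summary statistics (not from the text): n = 16, xbar = 10.5, s = 2.0
xbar, s, n = 10.5, 2.0, 16
mu0 = 10.0        # null value for H0: mu = mu0
alpha = 0.05

se = s / math.sqrt(n)                      # standard error of the mean
tcrit = stats.t.ppf(1 - alpha / 2, n - 1)  # t_{alpha/2; n-1}
ci = (xbar - tcrit * se, xbar + tcrit * se)

t_stat = (xbar - mu0) / se                 # test statistic for H0: mu = mu0
p_value = 2 * stats.t.sf(abs(t_stat), n - 1)
```

With these numbers \(t = 0.5/0.5 = 1\), so the null is not rejected at \(\alpha = 0.05\).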
4.1.1.1 For Difference of Means (\(\mu_1-\mu_2\)), Independent Samples
| | \(100(1-\alpha)\%\) Confidence Interval | Hypothesis Test Statistic | Notes |
|---|---|---|---|
| When \(\sigma^2\) is known | \(\bar{X}_1 - \bar{X}_2 \pm z_{\alpha/2}\sqrt{\frac{\sigma^2_1}{n_1}+\frac{\sigma^2_2}{n_2}}\) | \(z= \frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)_0}{\sqrt{\frac{\sigma^2_1}{n_1}+\frac{\sigma^2_2}{n_2}}}\) | |
| When \(\sigma^2\) is unknown, Variances Assumed EQUAL | \(\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}\sqrt{s^2_p(\frac{1}{n_1}+\frac{1}{n_2})}\) | \(t = \frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)_0}{\sqrt{s^2_p(\frac{1}{n_1}+\frac{1}{n_2})}}\) | Pooled Variance: \(s_p^2 = \frac{(n_1-1)s^2_1 + (n_2-1)s^2_2}{n_1 + n_2 - 2}\); Degrees of Freedom: \(\gamma = n_1 + n_2 - 2\) |
| When \(\sigma^2\) is unknown, Variances Assumed UNEQUAL | \(\bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2}\sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}\) | \(t = \frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)_0}{\sqrt{\frac{s^2_1}{n_1}+\frac{s^2_2}{n_2}}}\) | Degrees of Freedom: \(\gamma = \frac{(\frac{s_1^2}{n_1}+\frac{s^2_2}{n_2})^2}{\frac{(s_1^2/n_1)^2}{n_1-1}+\frac{(s_2^2/n_2)^2}{n_2-1}}\) |
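The pooled variance and the unequal-variance degrees of freedom from the table can be sketched as follows (the summary statistics are hypothetical):

```python
# Sketch of the pooled variance and the Welch-Satterthwaite degrees of
# freedom from the table above, using hypothetical summary statistics.
n1, s1 = 12, 1.1
n2, s2 = 15, 1.4

# Pooled variance (equal-variance row); df = n1 + n2 - 2
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)

# Welch-Satterthwaite df (unequal-variance row)
w1, w2 = s1**2 / n1, s2**2 / n2
df_welch = (w1 + w2)**2 / (w1**2 / (n1 - 1) + w2**2 / (n2 - 1))
```

The pooled variance always lies between the two sample variances, and the approximate df never exceeds \(n_1 + n_2 - 2\).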
4.1.1.2 For Difference of Means (\(\mu_1 - \mu_2\)), Paired Samples (\(D = X - Y\))
\(100(1-\alpha)\%\) Confidence Interval
\[
\bar{D} \pm t_{\alpha/2}\frac{s_d}{\sqrt{n}}
\]
Hypothesis Testing Test Statistic
\[ t = \frac{\bar{D}-D_0}{s_d / \sqrt{n}} \]
4.1.1.3 Difference of Two Proportions
Mean
\[ \hat{p}_1-\hat{p}_2 \]
Variance \[ \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} \]
\(100(1-\alpha)\%\) Confidence Interval
\[ \hat{p}_1-\hat{p}_2 \pm z_{\alpha/2}\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}} \]
Sample Sizes, Confidence \(\alpha\), Error d
(Prior Estimates for \(\hat{p}_1,\hat{p}_2\))
\[ n \approx \frac{z_{\alpha/2}^2[p_1(1-p_1)+p_2(1-p_2)]}{d^2} \]
(No Prior Estimates for \(\hat{p}\))
\[ n \approx \frac{z_{\alpha/2}^2}{2d^2} \]
Hypothesis Testing  Test Statistics
Null Value \((p_1 - p_2)_0 \neq 0\)
\[ z = \frac{(\hat{p}_1 - \hat{p}_2)-(p_1 - p_2)_0}{\sqrt{\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}}} \]
Null Value \((p_1 - p_2)_0 = 0\)
\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(\frac{1}{n_1}+\frac{1}{n_2})}} \]
where
\[ \hat{p}= \frac{x_1 + x_2}{n_1 + n_2} = \frac{n_1 \hat{p_1} + n_2 \hat{p_2}}{n_1 + n_2} \]
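The pooled two-proportion z-test above can be sketched numerically with hypothetical counts:

```python
import math
from scipy import stats

# Hypothetical counts (not from the text): x successes out of n trials per group
x1, n1 = 40, 100
x2, n2 = 25, 100
p1_hat, p2_hat = x1 / n1, x2 / n2

# Pooled proportion under H0: p1 = p2, as in the formula above
p_pool = (x1 + x2) / (n1 + n2)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se
p_value = 2 * stats.norm.sf(abs(z))       # two-sided p-value
```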
4.1.2 Single Variance
\[ 1 - \alpha = P(\chi_{1-\alpha/2;n-1}^2 \le (n-1)s^2/\sigma^2 \le \chi_{\alpha/2;n-1}^2) \\ = P(\frac{(n-1)s^2}{\chi_{\alpha/2}^2} \le \sigma^2 \le \frac{(n-1)s^2}{\chi_{1-\alpha/2}^2}) \]
and a \(100(1-\alpha) \%\) confidence interval for \(\sigma^2\) is:
\[ (\frac{(n-1)s^2}{\chi_{\alpha/2;n-1}^2},\frac{(n-1)s^2}{\chi_{1-\alpha/2;n-1}^2}) \] Confidence limits for \(\sigma\) are obtained by computing the positive square roots of these limits
Equivalently,
\(100(1-\alpha)\%\) Confidence Interval
\[ L_1 = \frac{(n-1)s^2}{\chi^2_{\alpha/2}} \\ L_2 = \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} \] Hypothesis Testing Test Statistic
\[ \chi^2 = \frac{(n1)s^2}{\sigma^2_0} \]
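The chi-squared interval and test statistic above can be sketched with hypothetical summary data:

```python
from scipy import stats

# Hypothetical sample variance (not from the text): s^2 = 4.0 from n = 20 observations
n, s2 = 20, 4.0
alpha = 0.05

chi_hi = stats.chi2.ppf(1 - alpha / 2, n - 1)  # chi^2_{alpha/2; n-1}
chi_lo = stats.chi2.ppf(alpha / 2, n - 1)      # chi^2_{1-alpha/2; n-1}
ci_var = ((n - 1) * s2 / chi_hi, (n - 1) * s2 / chi_lo)

# Test statistic for H0: sigma^2 = sigma0^2 with a hypothetical sigma0^2 = 3.0
chi_stat = (n - 1) * s2 / 3.0
```

Note the interval is not symmetric around \(s^2\), since the chi-squared distribution is skewed.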
4.1.3 Single Proportion (p)
| Confidence Interval \(100(1-\alpha)\%\) | Sample Size (prior estimate for \(\hat{p}\)) | Sample Size (no prior estimate for \(\hat{p}\)) | Hypothesis Test Statistic |
|---|---|---|---|
| \(\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\) | \(n \approx \frac{z_{\alpha/2}^2 \hat{p}(1-\hat{p})}{d^2}\) | \(n \approx \frac{z_{\alpha/2}^2}{4d^2}\) | \(z = \frac{\hat{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}\) |
4.1.4 Power
Formally, power (for the test of the mean) is given by:
\[ \pi(\mu) = 1 - \beta = P(\text{test rejects } H_0 \mid \mu) \] To evaluate the power, one needs to know the distribution of the test statistic when the null hypothesis is false.
For the 1-sided z-test where \(H_0: \mu \le \mu_0 \\ H_A: \mu > \mu_0\)
The power is:
\[ \begin{aligned} \pi(\mu) &= P(\bar{y} > \mu_0 + z_{\alpha} \sigma/\sqrt{n} \mid \mu) \\ &= P(Z = \frac{\bar{y} - \mu}{\sigma / \sqrt{n}} > z_{\alpha} + \frac{\mu_0 - \mu}{\sigma/ \sqrt{n}} \mid \mu) \\ &= 1 - \Phi(z_{\alpha} + \frac{(\mu_0 - \mu)\sqrt{n}}{\sigma}) \\ &= \Phi(-z_{\alpha}+\frac{(\mu - \mu_0)\sqrt{n}}{\sigma}) \end{aligned} \]
where \(1-\Phi(x) = \Phi(-x)\) since the normal pdf is symmetric
Power depends on the difference \(\mu - \mu_0\), the sample size \(n\), the variance \(\sigma^2\), and the \(\alpha\)-level of the test (through \(z_{\alpha}\))
Equivalently, power can be increased by making \(\alpha\) larger, \(\sigma^2\) smaller, or \(n\) larger.
For the 2-sided z-test, the power is:
\[ \pi(\mu) = \Phi(-z_{\alpha/2} + \frac{(\mu_0 - \mu)\sqrt{n}}{\sigma}) + \Phi(-z_{\alpha/2}+\frac{(\mu - \mu_0)\sqrt{n}}{\sigma}) \]
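The 1-sided and 2-sided power functions can be sketched directly from the formulas above:

```python
from scipy.stats import norm

def power_one_sided(mu, mu0, sigma, n, alpha=0.05):
    # Power of the 1-sided z-test of H0: mu <= mu0, per the derivation above
    z_a = norm.ppf(1 - alpha)
    return norm.cdf(-z_a + (mu - mu0) * n**0.5 / sigma)

def power_two_sided(mu, mu0, sigma, n, alpha=0.05):
    # Power of the 2-sided z-test of H0: mu = mu0
    z_a2 = norm.ppf(1 - alpha / 2)
    shift = (mu - mu0) * n**0.5 / sigma
    return norm.cdf(-z_a2 - shift) + norm.cdf(-z_a2 + shift)
```

A quick sanity check: at \(\mu = \mu_0\), both functions return exactly \(\alpha\) (power at the null equals the significance level), and power grows with \(n\).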
4.1.5 Sample Size
4.1.5.1 1-sided Z-test
Example: to show that the mean response \(\mu\) under the treatment is higher than the mean response \(\mu_0\) without treatment (show that the treatment effect \(\delta = \mu - \mu_0\) is large)
Because power is an increasing function of \(\mu - \mu_0\), it is only necessary to find the \(n\) that makes the power equal to \(1-\beta\) at \(\mu = \mu_0 + \delta\)
Hence, we have
\[ \pi(\mu_0 + \delta) = \Phi(-z_{\alpha} + \frac{\delta \sqrt{n}}{\sigma}) = 1 - \beta \]
Since \(\Phi (z_{\beta})= 1-\beta\), we have
\[ -z_{\alpha} + \frac{\delta \sqrt{n}}{\sigma} = z_{\beta} \]
Then n is
\[ n = (\frac{(z_{\alpha}+z_{\beta})\sigma}{\delta})^2 \]
Then, we need larger samples, when
 the sample variability is large (\(\sigma\) is large)
 \(\alpha\) is small (\(z_{\alpha}\) is large)
 power \(1-\beta\) is large (\(z_{\beta}\) is large)
 The magnitude of the effect is smaller (\(\delta\) is small)
Since we don’t know \(\delta\) and \(\sigma\), we can base \(\sigma\) on previous or pilot studies. Alternatively, obtain an estimate of \(\sigma\) by anticipating the range of the observations (without outliers), dividing this range by 4, and using the result as an approximate estimate of \(\sigma\). For normally distributed data, this is reasonable.
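The 1-sided sample-size formula above can be sketched as:

```python
import math
from scipy.stats import norm

def n_one_sided(delta, sigma, alpha=0.05, beta=0.20):
    # n = ((z_alpha + z_beta) * sigma / delta)^2, rounded up to an integer
    z_a, z_b = norm.ppf(1 - alpha), norm.ppf(1 - beta)
    return math.ceil(((z_a + z_b) * sigma / delta) ** 2)
```

For example, detecting \(\delta = 0.5\sigma\) with \(\alpha = 0.05\) and 80% power requires \(n = 25\); halving \(\delta\) roughly quadruples the required \(n\).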
4.1.5.2 2-sided Z-test
We want the minimum \(n\) required to guarantee \(1-\beta\) power when the magnitude of the treatment effect \(\delta = \mu - \mu_0\) is greater than 0. Since the power function for the 2-sided test is increasing and symmetric in \(\mu - \mu_0\), we only need to find the \(n\) that makes the power equal to \(1-\beta\) when \(\mu = \mu_0 + \delta\)
\[ n = (\frac{(z_{\alpha/2} + z_{\beta}) \sigma}{\delta})^2 \]
We could also use the confidence interval approach. If we require that an \(\alpha\)-level two-sided CI for \(\mu\) be
\[ \bar{y} \pm D \] where \(D = z_{\alpha/2}\sigma/\sqrt{n}\) gives
\[ n = (\frac{z_{\alpha/2}\sigma}{D})^2 \] (round up to the nearest integer)
data = rnorm(100)
t.test(data, conf.level=0.95)
#>
#> One Sample t-test
#>
#> data: data
#> t = 0.42865, df = 99, p-value = 0.6691
#> alternative hypothesis: true mean is not equal to 0
#> 95 percent confidence interval:
#> -0.1728394 0.2680940
#> sample estimates:
#> mean of x
#> 0.04762729
\[ H_0: \mu \ge 30 \\ H_a: \mu < 30 \]
t.test(data, mu=30,alternative="less")
#>
#> One Sample t-test
#>
#> data: data
#> t = -269.57, df = 99, p-value < 2.2e-16
#> alternative hypothesis: true mean is less than 30
#> 95 percent confidence interval:
#> -Inf 0.2321136
#> sample estimates:
#> mean of x
#> 0.04762729
4.1.6 Note
For t-tests, the sample size and power calculations are not as simple as for the z-test.
\[ \pi(\mu) = P(\frac{\bar{y}-\mu_0}{s/\sqrt{n}}> t_{n-1;\alpha} \mid \mu) \]
when \(\mu > \mu_0\) (i.e., \(\mu - \mu_0 = \delta\)), the random variable \((\bar{y} - \mu_0)/(s/\sqrt{n})\) does not have a Student’s t-distribution, but rather is distributed as a noncentral t-distribution with noncentrality parameter \(\delta \sqrt{n}/\sigma\) and d.f. of \(n-1\)
 The power is an increasing function of this noncentrality parameter (note, when \(\delta = 0\) the distribution is usual Student’s tdistribution).
 To evaluate power, one must use a numerical procedure or special charts
Approximate Sample Size Adjustment for the t-test: we use an adjustment to the z-test sample size determination.
Let \(v = n-1\), where \(n\) is the sample size derived from the z-test power. Then the 2-sided t-test sample size is given approximately by:
\[ n^* = \frac{(t_{v;\alpha/2}+t_{v;\beta})^2 \sigma^2}{\delta^2} \]
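This two-step adjustment (z-based \(n\), then re-evaluate with t quantiles at \(v = n-1\) df) can be sketched as:

```python
import math
from scipy.stats import norm, t as t_dist

def n_ttest_2sided(delta, sigma, alpha=0.05, beta=0.20):
    # Step 1: z-test sample size, as in the previous section
    z_n = ((norm.ppf(1 - alpha / 2) + norm.ppf(1 - beta)) * sigma / delta) ** 2
    v = math.ceil(z_n) - 1                 # v = n - 1 from the z-based n
    # Step 2: re-evaluate the formula with t quantiles at v df
    t_a = t_dist.ppf(1 - alpha / 2, v)
    t_b = t_dist.ppf(1 - beta, v)
    return math.ceil(((t_a + t_b) * sigma / delta) ** 2)
```

Since t quantiles exceed the corresponding normal quantiles, the adjusted \(n^*\) is always at least as large as the z-based \(n\).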
4.1.7 Onesample Nonparametric Methods
lecture.data = c(0.76, 0.82, 0.80, 0.79, 1.06, 0.83, -0.43, -0.34, 3.34, 2.33)
4.1.7.1 Sign Test
If we want to test \(H_0: \mu_{(0.5)} = 0; H_a: \mu_{(0.5)} >0\) where \(\mu_{(0.5)}\) is the population median. We can
 Count the number of observations (\(y_i\)’s) that exceed 0. Denote this number by \(s_+\), called the number of plus signs. Let \(s_- = n - s_+\), which is the number of minus signs.
 Reject \(H_0\) if \(s_+\) is large or, equivalently, if \(s_-\) is small.
To determine how large \(s_+\) must be to reject \(H_0\) at a given significance level, we need to know the distribution of the corresponding random variable \(S_+\) under the null hypothesis, which is binomial with p = 1/2 when the null is true.
To work out the null distribution using the binomial formula: the \(\alpha\)-level test rejects \(H_0\) if \(s_+ \ge b_{n,\alpha}\), where \(b_{n,\alpha}\) is the upper \(\alpha\) critical point of the \(Bin(n,1/2)\) distribution. Both \(S_+\) and \(S_-\) have this same distribution (call it \(S\)).
\[ \text{p-value} = P(S \ge s_+) = \sum_{i = s_+}^{n} {{n}\choose{i}} (\frac{1}{2})^n \] equivalently,
\[ P(S \le s_-) = \sum_{i=0}^{s_-}{{n}\choose{i}} (\frac{1}{2})^n \] For large sample sizes, we can use the normal approximation to the binomial, in which case we reject \(H_0\) if
\[ s_+ \ge n/2 + 1/2 + z_{\alpha}\sqrt{n/4} \]
For the 2-sided test, we use the test statistic \(s_{max} = max(s_+,s_-)\) or \(s_{min} = min(s_+, s_-)\). An \(\alpha\)-level test rejects \(H_0\) if the p-value is \(\le \alpha\), where the p-value is computed from:
\[ \text{p-value} = 2 \sum_{i=s_{max}}^{n} {{n}\choose{i}} (\frac{1}{2})^n = 2 \sum_{i=0}^{s_{min}} {{n}\choose{i}} (\frac{1}{2})^n \] Equivalently, reject \(H_0\) if \(s_{max} \ge b_{n,\alpha/2}\)
A large sample normal approximation can be used, where
\[ z = \frac{s_{max} - n/2 - 1/2}{\sqrt{n/4}} \] and reject \(H_0\) at \(\alpha\) if \(z \ge z_{\alpha/2}\)
However, treatment of 0 is problematic for this test.
 Solution 1: randomly assign each 0 to the positive or negative group (2 researchers might get different results).
 Solution 2: count each 0 as contributing 1/2 toward \(s_+\) and \(s_-\) (but then we could not apply the binomial distribution)
 Solution 3: ignore 0 (reduces the power of test due to decreased sample size).
binom.test(sum(lecture.data > 0), length(lecture.data))
#>
#> Exact binomial test
#>
#> data: sum(lecture.data > 0) and length(lecture.data)
#> number of successes = 8, number of trials = 10, p-value = 0.1094
#> alternative hypothesis: true probability of success is not equal to 0.5
#> 95 percent confidence interval:
#> 0.4439045 0.9747893
#> sample estimates:
#> probability of success
#> 0.8
# alternative = "greater" or alternative = "less"
4.1.7.2 Wilcoxon Signed Rank Test
Since the Sign Test does not consider the magnitude of each observation relative to 0, the Wilcoxon Signed Rank Test improves on it by taking into account the ordered magnitudes of the observations, at the cost of imposing a symmetry requirement (which the Sign Test does not)
\[ H_0: \mu_{0.5} = 0 \\ H_a: \mu_{0.5} > 0 \] (assume no ties or same observations)
The signed rank test procedure:
 Rank order the observations \(y_i\) in terms of their absolute values. Let \(r_i\) be the rank of \(y_i\) in this ordering. Since we assume no ties, the ranks \(r_i\) are uniquely determined and are a permutation of the integers \(1,2,…,n\).
 Calculate \(w_+\), which is the sum of the ranks of the positive values, and \(w_-\), which is the sum of the ranks of the negative values. Note that \(w_+ + w_- = r_1 + r_2 + ... = 1 + 2 + ... + n = n(n+1)/2\)
 Reject \(H_0\) if \(w_+\) is large (or if \(w_-\) is small)
To know what is large or small with regard to \(w_+\) and \(w_-\), we need the distribution of \(W_+\) and \(W_-\) when the null is true.
Since these null distributions are identical and symmetric, the p-value is \(P(W \ge w_+) = P(W \le w_-)\)
An \(\alpha\)level test rejects the null if the pvalue is \(\le \alpha\), or if \(w_+ \ge w_{n,\alpha}\), where \(w_{n,\alpha}\) is the upper \(\alpha\) critical point of the null distribution of W.
This distribution of W has a special table. For large n, the distribution of W is approximately normal.
\[ z = \frac{w_+ - n(n+1) /4 - 1/2}{\sqrt{n(n+1)(2n+1)/24}} \]
The test rejects \(H_0\) at level \(\alpha\) if
\[ w_+ \ge n(n+1)/4 +1/2 + z_{\alpha}\sqrt{n(n+1)(2n+1)/24} \approx w_{n,\alpha} \]
For the 2-sided test, we use \(w_{max}=max(w_+,w_-)\) or \(w_{min}=min(w_+,w_-)\), with p-value given by:
\[ \text{p-value} = 2P(W \ge w_{max}) = 2P(W \le w_{min}) \] As with the Sign Test, we ignore 0s. In cases where some of the \(y_i\)’s are tied for the same rank, we simply assign each of the tied observations the average rank (or “midrank”).
For example, if \(y_1 = 1\), \(y_2 = 3\), \(y_3 = 3\), and \(y_4 =5\), then \(r_1 = 1\), \(r_2 = r_3=(2+3)/2 = 2.5\), \(r_4 = 4\)
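The signed-rank statistic \(w_+\) can be computed by hand for the lecture.data sample; the sketch below assumes the two negative values in that sample are -0.43 and -0.34, and midranks would handle any ties:

```python
from scipy.stats import rankdata

# Computing w+ by hand for the lecture.data sample (assumed values shown)
data = [0.76, 0.82, 0.80, 0.79, 1.06, 0.83, -0.43, -0.34, 3.34, 2.33]

ranks = rankdata([abs(y) for y in data])   # midranks of the absolute values
w_plus = sum(r for y, r in zip(data, ranks) if y > 0)
w_minus = sum(r for y, r in zip(data, ranks) if y < 0)
n = len(data)
# Check: w_plus + w_minus = n(n+1)/2
```

Here \(w_+ = 52\), which matches the `V` statistic reported by R's `wilcox.test` for this sample.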
wilcox.test(lecture.data)
#>
#> Wilcoxon signed rank exact test
#>
#> data: lecture.data
#> V = 52, p-value = 0.009766
#> alternative hypothesis: true location is not equal to 0
# does not use normal approximation
# (using the underlying W distribution)
wilcox.test(lecture.data,exact=F)
#>
#> Wilcoxon signed rank test with continuity correction
#>
#> data: lecture.data
#> V = 52, p-value = 0.01443
#> alternative hypothesis: true location is not equal to 0
# uses normal approximation
4.2 Two Sample Inference
4.2.1 Means
Suppose we have 2 sets of observations,

\(y_1,..., y_{n_y}\)
 \(x_1,...,x_{n_x}\)
that are random samples from two independent populations with means \(\mu_y\), \(\mu_x\) and variances \(\sigma^2_y\), \(\sigma^2_x\). Our goal is to compare \(\mu_y\) and \(\mu_x\), or to test whether \(\sigma^2_y = \sigma^2_x\)
4.2.1.1 Large Sample Tests
Assume that \(n_y\) and \(n_x\) are large (\(\ge 30\)). Then,
\[ E(\bar{y}  \bar{x}) = \mu_y  \mu_x \\ Var(\bar{y}  \bar{x}) = \sigma^2_y /n_y + \sigma^2_x/n_x \]
Then,
\[ Z = \frac{\bar{y}-\bar{x} - (\mu_y - \mu_x)}{\sqrt{\sigma^2_y /n_y + \sigma^2_x/n_x}} \sim N(0,1) \] (by the Central Limit Theorem). For large samples, we can replace the variances by their unbiased estimators (\(s^2_y,s^2_x\)) and get the same large-sample distribution.
An approximate \(100(1-\alpha) \%\) CI for \(\mu_y - \mu_x\) is given by:
\[ \bar{y}  \bar{x} \pm z_{\alpha/2}\sqrt{s^2_y/n_y + s^2_x/n_x} \]
\[ H_0: \mu_y - \mu_x = \delta_0 \\ H_A: \mu_y - \mu_x \neq \delta_0 \]
at the \(\alpha\)level with the statistic:
\[ z = \frac{\bar{y}-\bar{x} - \delta_0}{\sqrt{s^2_y /n_y + s^2_x/n_x}} \]
and reject \(H_0\) if \(|z| > z_{\alpha/2}\)
If \(\delta_0 = 0\), it means that we are testing whether the two means are equal.
4.2.1.2 Small Sample Tests
If the two samples are from normal distributions, iid \(N(\mu_y,\sigma^2_y)\) and iid \(N(\mu_x,\sigma^2_x)\), and the two samples are independent, we can do inference based on the t-distribution
Then we have 2 cases
4.2.1.2.1 Equal variance
Assumptions
 iid: so that \(var(\bar{y}) = \sigma^2_y / n_y ; var(\bar{x}) = \sigma^2_x / n_x\)
 Independence between samples: No observation from one sample can influence any observation from the other sample, to have
\[ \begin{aligned} var(\bar{y} - \bar{x}) &= var(\bar{y}) + var(\bar{x}) - 2cov(\bar{y},\bar{x}) \\ &= var(\bar{y}) + var(\bar{x}) \\ &= \sigma^2_y / n_y + \sigma^2_x / n_x \end{aligned} \]
 Normality: Justifies the use of the tdistribution
Let \(\sigma^2 = \sigma^2_y = \sigma^2_x\). Then, \(s^2_y\) and \(s^2_x\) are both unbiased estimators of \(\sigma^2\). We then can pool them.
Then the pooled variance estimate \[ s^2 = \frac{(n_y - 1)s^2_y + (n_x - 1)s^2_x}{(n_y-1)+(n_x-1)} \] has \(n_y + n_x - 2\) df.
Then the test statistic
\[ T = \frac{\bar{y} -\bar{x} - (\mu_y - \mu_x)}{s\sqrt{1/n_y + 1/n_x}} \sim t_{n_y + n_x - 2} \]
\(100(1 - \alpha) \%\) CI for \(\mu_y - \mu_x\) is
\[ \bar{y} - \bar{x} \pm (t_{n_y + n_x - 2;\alpha/2})s\sqrt{1/n_y + 1/n_x} \]
Hypothesis testing:
\[
H_0: \mu_y - \mu_x = \delta_0 \\
H_1: \mu_y - \mu_x \neq \delta_0
\]
we reject \(H_0\) if \(|t| > t_{n_y + n_x - 2;\alpha/2}\)
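This pooled-variance test can be sketched from summary statistics; the numbers below are hypothetical, and `scipy.stats.ttest_ind_from_stats` with `equal_var=True` implements exactly the pooled formula above:

```python
from scipy import stats

# Pooled-variance two-sample t-test from hypothetical summary statistics
t_stat, p_value = stats.ttest_ind_from_stats(
    mean1=5.8, std1=1.1, nobs1=12,
    mean2=5.1, std2=1.4, nobs2=15,
    equal_var=True)   # equal_var=True selects the pooled-variance test
```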
4.2.1.2.2 Unequal Variance
Assumptions
 Two samples are independent
   Scatter plots
   Correlation coefficient (if normal)
 Independence of observations in each sample
   Test for serial correlation
 For each sample, homogeneity of variance
   Scatter plots
   Formal tests
 Normality
 Equality of variances (homogeneity of variance between samples)
   F-test
   Bartlett test
   Modified Levene Test (Brown-Forsythe Test)
To compare 2 normal populations with \(\sigma^2_y \neq \sigma^2_x\), we use the test statistic:
\[ T = \frac{\bar{y} -\bar{x} - (\mu_y - \mu_x)}{\sqrt{s^2_y/n_y + s^2_x/n_x}} \] In this case, T does not follow the t-distribution (its distribution depends on the ratio of the unknown variances \(\sigma^2_y,\sigma^2_x\)). For small sample sizes, we can approximate tests by using the Welch-Satterthwaite method (Satterthwaite 1946). We assume T can be approximated by a t-distribution and adjust the degrees of freedom.
Let \(w_y = s^2_y /n_y\) and \(w_x = s^2_x /n_x\) (the w’s are the square of the respective standard errors)
Then, the degrees of freedom are
\[ v = \frac{(w_y + w_x)^2}{w^2_y / (n_y-1) + w^2_x / (n_x-1)} \]
Since \(v\) is usually fractional, we truncate it down to the nearest integer.
\(100 (1-\alpha) \%\) CI for \(\mu_y - \mu_x\) is
\[ \bar{y}  \bar{x} \pm t_{v,\alpha/2} \sqrt{s^2_y/n_y + s^2_x /n_x} \]
Reject \(H_0\) if \(|t| > t_{v,\alpha/2}\), where
\[ t = \frac{\bar{y} - \bar{x}-\delta_0}{\sqrt{s^2_y/n_y + s^2_x /n_x}} \]
4.2.2 Variances
\[ F_{ndf,ddf}= \frac{s^2_1}{s^2_2} \]
where \(s^2_1>s^2_2\), \(ndf = n_1-1\), \(ddf = n_2-1\)
4.2.2.1 Ftest
Test
\[ H_0: \sigma^2_y = \sigma^2_x \\ H_a: \sigma^2_y \neq \sigma^2_x \]
Consider the test statistic,
\[ F= \frac{s^2_y}{s^2_x} \]
Reject \(H_0\) if

 \(F>f_{n_y - 1,n_x - 1,\alpha/2}\) or
 \(F<f_{n_y - 1,n_x - 1,1-\alpha/2}\)
where \(f_{n_y - 1,n_x - 1,\alpha/2}\) and \(f_{n_y - 1,n_x - 1,1-\alpha/2}\) are the upper and lower \(\alpha/2\) critical points of an F-distribution with \(n_y-1\) and \(n_x-1\) degrees of freedom.
Note
 This test depends heavily on the assumption of normality.
 In particular, it can give too many significant results when observations come from long-tailed distributions (i.e., positive kurtosis).
 If we cannot find support for normality, we can use nonparametric tests such as the Modified Levene Test (Brown-Forsythe Test)
data(iris)
irisVe=iris$Petal.Width[iris$Species=="versicolor"]
irisVi=iris$Petal.Width[iris$Species=="virginica"]
var.test(irisVe,irisVi)
#>
#> F test to compare two variances
#>
#> data: irisVe and irisVi
#> F = 0.51842, num df = 49, denom df = 49, p-value = 0.02335
#> alternative hypothesis: true ratio of variances is not equal to 1
#> 95 percent confidence interval:
#> 0.2941935 0.9135614
#> sample estimates:
#> ratio of variances
#> 0.5184243
4.2.2.2 Modified Levene Test (Brown-Forsythe Test)
 Considers averages of absolute deviations rather than squared deviations; hence, it is less sensitive to long-tailed distributions.
 This test is still good for normal data
For each sample, we consider the absolute deviation of each observation from the median:
\[ d_{y,i} = |y_i - y_{.5}| \\ d_{x,i} = |x_i - x_{.5}| \] Then,
\[ t_L^* = \frac{\bar{d}_y-\bar{d}_x}{s \sqrt{1/n_y + 1/n_x}} \]
The pooled variance \(s^2\) is given by:
\[ s^2 = \frac{\sum_i^{n_y}(d_{y,i}-\bar{d}_y)^2 + \sum_j^{n_x}(d_{x,j}-\bar{d}_x)^2}{n_y + n_x - 2} \]
 If the error terms have constant variance and \(n_y\) and \(n_x\) are not extremely small, then \(t_L^* \sim t_{n_x + n_y - 2}\)
 We reject the null hypothesis when \(|t_L^*| > t_{n_y + n_x - 2;\alpha/2}\)
 This is just the twosample ttest applied to the absolute deviations.
dVe=abs(irisVe-median(irisVe))
dVi=abs(irisVi-median(irisVi))
t.test(dVe,dVi,var.equal=T)
#>
#> Two Sample t-test
#>
#> data: dVe and dVi
#> t = -2.5584, df = 98, p-value = 0.01205
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -0.12784786 -0.01615214
#> sample estimates:
#> mean of x mean of y
#> 0.154 0.226
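For reference, the same idea is available in scipy: `levene` with `center='median'` computes the Brown-Forsythe variant, and for two groups its F statistic equals the square of the two-sample t on the absolute deviations. The samples below are hypothetical:

```python
from scipy import stats

# Hypothetical samples with similar centers but clearly different spreads
g1 = [4.8, 5.1, 5.3, 4.9, 5.0, 5.2, 4.7, 5.4]
g2 = [3.0, 7.1, 5.5, 2.4, 8.0, 4.9, 6.6, 3.5]

# center='median' selects the Brown-Forsythe (modified Levene) test
f_stat, p_value = stats.levene(g1, g2, center='median')
```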
# small samples t-test
t.test(irisVe,irisVi,var.equal=F)
#>
#> Welch Two Sample t-test
#>
#> data: irisVe and irisVi
#> t = -14.625, df = 89.043, p-value < 2.2e-16
#> alternative hypothesis: true difference in means is not equal to 0
#> 95 percent confidence interval:
#> -0.7951002 -0.6048998
#> sample estimates:
#> mean of x mean of y
#> 1.326 2.026
4.2.3 Power
Consider \(\sigma^2_y = \sigma^2_x = \sigma^2\)
Under the assumption of equal variances, we take equal-size samples from both groups (\(n_y = n_x = n\))
For 1-sided testing,
\[ H_0: \mu_y - \mu_x \le 0 \\ H_a: \mu_y - \mu_x > 0 \]
The \(\alpha\)-level z-test rejects \(H_0\) if
\[ z = \frac{\bar{y} - \bar{x}}{\sigma \sqrt{2/n}} > z_{\alpha} \]
\[ \pi(\mu_y - \mu_x) = \Phi(-z_{\alpha} + \frac{\mu_y -\mu_x}{\sigma}\sqrt{n/2}) \]
We need the sample size \(n\) that gives at least \(1-\beta\) power when \(\mu_y - \mu_x = \delta\), where \(\delta\) is the smallest difference that we want to detect.
Power is given by:
\[ \Phi(-z_{\alpha} + \frac{\delta}{\sigma}\sqrt{n/2}) = 1 - \beta \]
4.2.4 Sample Size
Then, the sample size is
\[ n = 2(\frac{(z_{\alpha} + z_{\beta})\sigma}{\delta})^2 \]
For the 2-sided test, replace \(z_{\alpha}\) with \(z_{\alpha/2}\).
As with the one-sample case, to perform an exact 2-sample t-test sample size calculation, we must use a noncentral t-distribution.
A correction that gives the approximate t-test sample size can be obtained by plugging the z-test \(n\) value into the formula:
\[
n^* = 2(\frac{\sigma (t_{2n-2;\alpha} + t_{2n-2;\beta})}{\delta})^2
\]
where we use \(\alpha/2\) for the two-sided test
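The per-group sample-size formula for the two-sample z-test can be sketched as:

```python
import math
from scipy.stats import norm

def n_two_sample(delta, sigma, alpha=0.05, beta=0.20, two_sided=True):
    # Per-group n: n = 2 * ((z + z_beta) * sigma / delta)^2, rounded up
    z_a = norm.ppf(1 - (alpha / 2 if two_sided else alpha))
    z_b = norm.ppf(1 - beta)
    return math.ceil(2 * ((z_a + z_b) * sigma / delta) ** 2)
```

For example, detecting a one-standard-deviation difference (\(\delta = \sigma\)) with a two-sided \(\alpha = 0.05\) test at 80% power requires 16 subjects per group; the one-sided test needs fewer.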
4.2.5 Matched Pair Designs
We have two treatments
| Subject | Treatment A | Treatment B | Difference |
|---|---|---|---|
| 1 | \(y_1\) | \(x_1\) | \(d_1 = y_1 - x_1\) |
| 2 | \(y_2\) | \(x_2\) | \(d_2 = y_2 - x_2\) |
| . | . | . | . |
| n | \(y_n\) | \(x_n\) | \(d_n = y_n - x_n\) |
we assume \(y_i \sim^{iid} N(\mu_y, \sigma^2_y)\) and \(x_i \sim^{iid} N(\mu_x,\sigma^2_x)\), but since \(y_i\) and \(x_i\) are measured on the same subject, they are correlated.
Let
\[ \mu_D = E(y_i - x_i) = \mu_y -\mu_x \\ \sigma^2_D = var(y_i - x_i) = Var(y_i) + Var(x_i) - 2cov(y_i,x_i) \]
If the matching induces positive correlation, then the variance of the difference of the measurements is reduced compared to the independent case. This is the point of Matched Pair Designs. Although the covariance can be negative, giving a larger variance of the difference than in the independent-samples case, usually the covariance is positive. This means both \(y_i\) and \(x_i\) are large for many of the same subjects, and for others, both measurements are small. (We still assume that subjects respond independently of each other, which is necessary for the iid assumption within groups.)
Let \(d_i = y_i - x_i\), then
 \(\bar{d} = \bar{y}-\bar{x}\) is the sample mean of the \(d_i\)
 \(s_d^2=\frac{1}{n-1}\sum_{i=1}^n (d_i - \bar{d})^2\) is the sample variance of the differences
Once the data are converted to differences, we are back to One Sample Inference and can use its tests and CIs.
4.2.6 Nonparametric Tests for Two Samples
For Matched Pair Designs, we can use the Onesample Nonparametric Methods.
Assume that Y and X are random variables with CDFs \(F_Y\) and \(F_X\). Then, Y is stochastically larger than X if, for all real numbers u, \(P(Y > u) \ge P(X > u)\).
Equivalently, \(P(Y \le u) \le P(X \le u)\), i.e., \(F_Y(u) \le F_X(u)\), written \(F_Y < F_X\)
If two distributions are identical except that one is shifted relative to the other, then each distribution can be indexed by a location parameter, say \(\theta_y\) and \(\theta_x\). In this case, \(Y>X\) if \(\theta_y > \theta_x\)
Consider the hypotheses,
\[ H_0: F_Y = F_X \\ H_a: F_Y < F_X \] where the alternative is an upper one-sided alternative.
 We can also consider the lower one-sided alternative \(H_a: F_Y > F_X\), or the two-sided alternative \(H_a: F_Y < F_X \text{ or } F_Y > F_X\)
 In the two-sided case, we don’t use \(H_a: F_Y \neq F_X\), as that allows arbitrary differences between the distributions without requiring one to be stochastically larger than the other.
If the distributions only differ in terms of their location parameters, we can focus hypothesis tests on the parameters (e.g., \(H_0: \theta_y = \theta_x\) vs. \(H_a: \theta_y > \theta_x\))
We have 2 equivalent nonparametric tests that consider the hypothesis mentioned above
4.2.6.1 Wilcoxon rank sum test
 Combine all \(n= n_y + n_x\) observations and rank them in ascending order.
 Sum the ranks of the \(y\)’s and \(x\)’s separately. Let \(w_y\) and \(w_x\) be these sums. (\(w_y + w_x = 1 + 2 + ... + n = n(n+1)/2\))
 Reject \(H_0\) if \(w_y\) is large (equivalently, \(w_x\) is small)
Under \(H_0\), any arrangement of the \(y\)’s and \(x\)’s is equally likely to occur, and there are \((n_y + n_x)!/(n_y! n_x!)\) possible arrangements.
 Technically, for each arrangement we can compute the values of \(w_y\) and \(w_x\), and thus generate the distribution of the statistic under the null hypothesis.
 This can be computationally intensive.
wilcox.test(
irisVe,
irisVi,
alternative = "two.sided",
conf.level = 0.95,
exact = F,
correct = T
)
#>
#> Wilcoxon rank sum test with continuity correction
#>
#> data: irisVe and irisVi
#> W = 49, p-value < 2.2e-16
#> alternative hypothesis: true location shift is not equal to 0
4.2.6.2 Mann-Whitney U test
The Mann-Whitney test is computed as follows:
 Compare each \(y_i\) with each \(x_j\).
Let \(u_y\) be the number of pairs in which \(y_i > x_j\). Let \(u_x\) be the number of pairs in which \(y_i < x_j\) (assume there are no ties). There are \(n_y n_x\) such comparisons, and \(u_y + u_x = n_y n_x\).
 Reject \(H_0\) if \(u_y\) is large (or \(u_x\) is small)
The Mann-Whitney U test and the Wilcoxon rank sum test are related:
\[
u_y = w_y  n_y(n_y+1) /2 \\
u_x = w_x  n_x(n_x +1)/2
\]
An \(\alpha\)level test rejects \(H_0\) if \(u_y \ge u_{n_y,n_x,\alpha}\), where \(u_{n_y,n_x,\alpha}\) is the upper \(\alpha\) critical point of the null distribution of the random variable, U.
The p-value is defined to be \(P(U \ge u_y) = P(U \le u_x)\). One advantage of the Mann-Whitney U test is that we can use either \(u_y\) or \(u_x\) to carry out the test.
For large \(n_y\) and \(n_x\), the null distribution of U can be well approximated by a normal distribution with mean \(E(U) = n_y n_x /2\) and variance \(var(U) = n_y n_x (n+1)/12\). A large sample ztest can be based on the statistic:
\[ z = \frac{u_y - n_y n_x /2 - 1/2}{\sqrt{n_y n_x (n+1)/12}} \]
The test rejects \(H_0\) at level \(\alpha\) if \(z \ge z_{\alpha}\) or if \(u_y \ge u_{n_y,n_x,\alpha}\) where
\[ u_{n_y, n_x, \alpha} \approx n_y n_x /2 + 1/2 + z_{\alpha}\sqrt{n_y n_x (n+1)/12} \]
For the 2-sided test, we use the test statistics \(u_{max} = max(u_y,u_x)\) and \(u_{min} = min(u_y, u_x)\), and the p-value is given by
\[ \text{p-value} = 2P(U \ge u_{max}) = 2P(U \le u_{min}) \] In the case of ties (when \(y_i = x_j\)), we count 1/2 towards both \(u_y\) and \(u_x\). The sampling distribution is then not exactly the same, but the large-sample approximation is still reasonable.
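The identity linking the two statistics can be checked numerically; a minimal sketch with made-up samples (no ties):

```r
# Verify u_y = w_y - n_y(n_y + 1)/2 on hypothetical data (no ties)
y <- c(5.1, 6.3, 7.2, 8.0)
x <- c(4.0, 5.5, 6.1)
n_y <- length(y)

# Wilcoxon rank sum for the y-sample within the combined sample
w_y <- sum(rank(c(y, x))[1:n_y])

# Mann-Whitney count: number of (y_i, x_j) pairs with y_i > x_j
u_y <- sum(outer(y, x, ">"))

stopifnot(u_y == w_y - n_y * (n_y + 1) / 2)
```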
4.3 Categorical Data Analysis
Categorical data analysis is used when we have categorical outcomes:
 Nominal variables: no logical ordering (e.g., sex)
 Ordinal variables: logical order, but relative distances between values are not clear (e.g., small, medium, large)
The distribution of one variable changes when the level (or values) of the other variable changes: the row percentages differ across columns.
4.3.1 Inferences for Small Samples
The approximate tests based on the asymptotic normality of \(\hat{p}_1 - \hat{p}_2\) do not apply for small samples.
Using Fisher’s Exact Test to evaluate \(H_0: p_1 = p_2\)
 Assume \(X_1\) and \(X_2\) are independent Binomial
 Let \(x_1\) and \(x_2\) be the corresponding observed values.
 Let \(n= n_1 + n_2\) be the total sample size

Let \(m = x_1 + x_2\) be the observed number of successes.
 By assuming that m (total successes) is fixed, and conditioning on this value, one can show that the conditional distribution of the number of successes from sample 1 is Hypergeometric
 If we want to test \(H_0: p_1 = p_2\) vs. \(H_a: p_1 \neq p_2\), we can use
\[ Z^2 = \left(\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}}\right)^2 \sim \chi_{1}^2 \]
and reject \(H_0\) at level \(\alpha\) if \(Z^2 \ge \chi_{1,\alpha}^2\), where \(\chi_{1,\alpha}^2\) is the upper \(\alpha\) percentage point of the central Chi-squared distribution with one d.f.
This extends to the contingency table setting: whether the observed frequencies are equal to those expected under a null hypothesis of no association.
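In R, the conditional hypergeometric test described above is implemented by `fisher.test()`. A sketch on a hypothetical 2 × 2 table (the counts are made up):

```r
# Fisher's exact test of H0: p1 = p2 on a hypothetical 2x2 table:
# sample 1 has 3 successes out of 10; sample 2 has 9 out of 10.
tab <- matrix(c(3, 7, 9, 1), nrow = 2,
              dimnames = list(Outcome = c("Success", "Failure"),
                              Sample  = c("1", "2")))

res <- fisher.test(tab, alternative = "two.sided")
res$p.value  # exact two-sided p-value (about 0.02 for these counts)
```

The p-value sums hypergeometric probabilities over all tables with the same margins that are at least as extreme as the one observed.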
4.3.2 Test of Association
The Pearson Chi-square test statistic is
\[ \chi^2 = \sum_{\text{all categories}} \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \]
Comparison of proportions for several independent surveys or experiments
|                     | Experiment 1  | Experiment 2  | … | Experiment k  |
|---------------------|---------------|---------------|---|---------------|
| Number of successes | \(x_1\)       | \(x_2\)       | … | \(x_k\)       |
| Number of failures  | \(n_1 - x_1\) | \(n_2 - x_2\) | … | \(n_k - x_k\) |
| Total               | \(n_1\)       | \(n_2\)       | … | \(n_k\)       |
\(H_0: p_1 = p_2 = \dots = p_k\) vs. the alternative that the null is not true (at least one pair is not equal).
We estimate the common value of the probability of success on a single trial assuming \(H_0\) is true:
\[ \hat{p} = \frac{x_1 + x_2 + ... + x_k}{n_1 + n_2 + ...+ n_k} \]
We use the table of expected counts when \(H_0\) is true:

| Outcome | Experiment 1       | Experiment 2       | … | Experiment k       |
|---------|--------------------|--------------------|---|--------------------|
| Success | \(n_1 \hat{p}\)    | \(n_2 \hat{p}\)    | … | \(n_k \hat{p}\)    |
| Failure | \(n_1(1-\hat{p})\) | \(n_2(1-\hat{p})\) | … | \(n_k(1-\hat{p})\) |
| Total   | \(n_1\)            | \(n_2\)            | … | \(n_k\)            |
\[ \chi^2 = \sum_{\text{all cells in table}} \frac{(observed  expected)^2}{expected} \]
with \(k-1\) degrees of freedom
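A sketch of this k-sample test with hypothetical counts; `chisq.test()` on the 2 × k table of successes and failures reproduces the pooled-\(\hat{p}\) construction above:

```r
# H0: p1 = p2 = p3 across three hypothetical experiments
x <- c(48, 56, 38)    # successes
n <- c(100, 100, 80)  # sample sizes

counts <- rbind(success = x, failure = n - x)

res <- chisq.test(counts, correct = FALSE)
res$parameter  # k - 1 = 2 degrees of freedom
res$expected   # rows are n_i * p-hat and n_i * (1 - p-hat)
```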
4.3.2.1 Twoway Count Data
|              | 1          | 2          | … | j          | … | c          | Row Total  |
|--------------|------------|------------|---|------------|---|------------|------------|
| 1            | \(n_{11}\) | \(n_{12}\) | … | \(n_{1j}\) | … | \(n_{1c}\) | \(n_{1.}\) |
| 2            | \(n_{21}\) | \(n_{22}\) | … | \(n_{2j}\) | … | \(n_{2c}\) | \(n_{2.}\) |
| ⋮            | ⋮          | ⋮          |   | ⋮          |   | ⋮          | ⋮          |
| r            | \(n_{r1}\) | \(n_{r2}\) | … | \(n_{rj}\) | … | \(n_{rc}\) | \(n_{r.}\) |
| Column Total | \(n_{.1}\) | \(n_{.2}\) | … | \(n_{.j}\) | … | \(n_{.c}\) | \(n_{..}\) |
Design 1
The total sample size \(n\) is fixed (e.g., a survey on job satisfaction and income); both row and column totals are random variables.
Design 2
Fix the sample size in each group (in each row) (e.g., Drug treatments success or failure); fixed number of participants for each treatment; independent random samples from the two row populations.
These different sampling designs imply two different probability models.
4.3.2.2 Total Sample Size Fixed
Design 1
A random sample of size \(n\) is drawn from a single population, and sample units are cross-classified into \(r\) row categories and \(c\) column categories.
This results in an \(r \times c\) table of observed counts \(n_{ij}\), \(i = 1,\dots,r\); \(j = 1,\dots,c\).
Let \(p_{ij}\) be the probability of classification into cell \((i,j)\) and \(\sum_{i=1}^r \sum_{j=1}^c p_{ij} = 1\). Let \(N_{ij}\) be the random variable corresponding to \(n_{ij}\)
The joint distribution of the \(N_{ij}\) is multinomial with unknown parameters \(p_{ij}\)
Denote the row variable by \(X\) and column variable by Y, then \(p_{ij} = P(X=i,Y = j)\) and \(p_{i.} = P(X = i)\) and \(p_{.j} = P(Y = j)\) are the marginal probabilities.
The null hypothesis that X and Y are statistically independent (i.e., no association) is just:
\[ H_0: p_{ij} = P(X = i, Y = j) = P(X = i) P(Y = j) = p_{i.}p_{.j} \text{ for all } i,j \\ H_a: p_{ij} \neq p_{i.}p_{.j} \text{ for some } i,j \]
4.3.2.3 Row Total Fixed
Design 2
Random samples of sizes \(n_1,...,n_r\) are drawn independently from \(r \ge 2\) row populations. In this case, the 2way table row totals are \(n_{i.} = n_i\) for \(i = 1,...,r\).
The counts from each row are modeled by independent multinomial distributions.
\(X\) is fixed, \(Y\) is observed.
Then, \(p_{ij}\) represents the conditional probability \(p_{ij} = P(Y = j | X = i)\)
The null hypothesis is that the probability of response \(j\) is the same, regardless of the row population (i.e., no association):
\[ \begin{cases} H_0: p_{ij} = P(Y = j | X = i) = p_j & \text{for all } i, \; j = 1,2,\dots,c \\ \text{or } H_0: (p_{i1},p_{i2},\dots,p_{ic}) = (p_1,p_2,\dots,p_c) & \text{for all } i \\ H_a: (p_{i1},p_{i2},\dots,p_{ic}) & \text{are not the same for all } i \end{cases} \]
Although the hypotheses to be tested differ between the two sampling designs, the \(\chi^2\) test is identical.
We have estimated expected frequencies:
\[ \hat{e}_{ij} = \frac{n_{i.}n_{.j}}{n} \]
The Chisquare statistic is
\[ \chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(n_{ij} - \hat{e}_{ij})^2}{\hat{e}_{ij}} \sim \chi^2_{(r-1)(c-1)} \]
An \(\alpha\)-level test rejects \(H_0\) if \(\chi^2 > \chi^2_{(r-1)(c-1),\alpha}\)
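A short sketch of this test on a hypothetical 2 × 3 table, checking the expected counts against \(\hat{e}_{ij} = n_{i.} n_{.j} / n\):

```r
# Chi-square test of association on a hypothetical 2x3 table
tab <- matrix(c(20, 30, 25, 35, 15, 25), nrow = 2)

res <- chisq.test(tab, correct = FALSE)
res$parameter  # (r - 1)(c - 1) = 2 degrees of freedom

# Expected counts match e_ij = n_i. * n_.j / n
e <- outer(rowSums(tab), colSums(tab)) / sum(tab)
isTRUE(all.equal(unname(res$expected), e, check.attributes = FALSE))
```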
4.3.2.4 Pearson Chi-square Test
 Determine whether an association exists
 Sometimes, \(H_0\) represents the model whose validity is to be tested. Contrast this with the conventional formulation of \(H_0\) as the hypothesis that is to be disproved. The goal in this case is not to disprove the model, but to see whether data are consistent with the model and if deviation can be attributed to chance.
 These tests do not measure the strength of an association.
 These tests depend on and reflect the sample size: doubling the sample size (by copying each observation) doubles the \(\chi^2\) statistic even though the strength of the association does not change.
 The Pearson Chi-square test is not appropriate when more than about 20% of the cells have an expected cell frequency of less than 5 (large-sample p-values are not appropriate).
 When the sample size is small, the exact p-values can be calculated (this is prohibitive for large samples); calculation of the exact p-values assumes that the column totals and row totals are fixed.
july.x = 480
july.n = 1000
sept.x = 704
sept.n = 1600
\[ H_0: p_J = 0.5 \\ H_a: p_J < 0.5 \]
prop.test(
  x = july.x,
  n = july.n,
  p = 0.5,
  alternative = "less",
  correct = FALSE
)
#>
#>  1-sample proportions test without continuity correction
#>
#> data: july.x out of july.n, null probability 0.5
#> X-squared = 1.6, df = 1, p-value = 0.103
#> alternative hypothesis: true p is less than 0.5
#> 95 percent confidence interval:
#> 0.0000000 0.5060055
#> sample estimates:
#> p
#> 0.48
\[ H_0: p_J = p_S \\ H_a: p_J \neq p_S \]
prop.test(
  x = c(july.x, sept.x),
  n = c(july.n, sept.n),
  correct = FALSE
)
#>
#>  2-sample test for equality of proportions without continuity correction
#>
#> data: c(july.x, sept.x) out of c(july.n, sept.n)
#> X-squared = 3.9701, df = 1, p-value = 0.04632
#> alternative hypothesis: two.sided
#> 95 percent confidence interval:
#> 0.0006247187 0.0793752813
#> sample estimates:
#> prop 1 prop 2
#> 0.48 0.44
4.3.3 Ordinal Association
 An ordinal association implies that as one variable increases, the other tends to increase or decrease (depending on the nature of the association).
 For tests for variables with two or more levels, the levels must be in a logical ordering.
4.3.3.1 Mantel-Haenszel Chi-square Test
The Mantel-Haenszel Chi-square test is more powerful for testing ordinal associations, but does not test for the strength of the association.
This test is presented in the case where one has a series of \(2 \times 2\) tables that examine the same effects under different conditions (if there are \(K\) such tables, we have a \(2 \times 2 \times K\) table).
In stratum \(k\), given the marginal totals \((n_{.1k},n_{.2k},n_{1.k},n_{2.k})\), the sampling model for cell counts is the Hypergeometric (knowing \(n_{11k}\) determines \((n_{12k},n_{21k},n_{22k})\), given the marginal totals)
Assuming conditional independence, the Hypergeometric mean and variance of \(n_{11k}\) are
\[ m_{11k} = E(n_{11k}) = \frac{n_{1.k} n_{.1k}}{n_{..k}} \\ var(n_{11k}) = \frac{n_{1.k} n_{2.k} n_{.1k} n_{.2k}}{n_{..k}^2(n_{..k}-1)} \]
To test conditional independence, Mantel and Haenszel proposed
\[ M^2 = \frac{(|\sum_{k} n_{11k} - \sum_k m_{11k}| - 0.5)^2}{\sum_{k}var(n_{11k})} \sim \chi^2_{1} \] This method can be extended to general \(I \times J \times K\) tables.
\((2 \times 2 \times 3)\) table
Bron = array(
  c(20, 9, 382, 214, 10, 7, 172, 120, 12, 6, 327, 183),
  dim = c(2, 2, 3),
  dimnames = list(
    Particulate = c("High", "Low"),
    Bronchitis = c("Yes", "No"),
    Age = c("15-24", "25-39", "40+")
  )
)
margin.table(Bron, c(1, 2))
#> Bronchitis
#> Particulate Yes No
#> High 42 881
#> Low 22 517
# assess whether the relationship between
# Bronchitis by Particulate Level varies by Age
library(samplesizeCMH)
marginal_table = margin.table(Bron, c(1, 2))
odds.ratio(marginal_table)
#> [1] 1.120318
# whether these odds vary by age.
# The conditional odds can be calculated using the original table.
apply(Bron, 3, odds.ratio)
#>     15-24     25-39       40+
#> 1.2449098 0.9966777 1.1192661
# Mantel-Haenszel test
mantelhaen.test(Bron, correct = TRUE)
#>
#>  Mantel-Haenszel chi-squared test with continuity correction
#>
#> data: Bron
#> Mantel-Haenszel X-squared = 0.11442, df = 1, p-value = 0.7352
#> alternative hypothesis: true common odds ratio is not equal to 1
#> 95 percent confidence interval:
#> 0.6693022 1.9265813
#> sample estimates:
#> common odds ratio
#> 1.135546
4.3.3.1.1 McNemar’s Test
McNemar's test is a special case of the Mantel-Haenszel Chi-square test, used for paired binary data.
vote = cbind(c(682, 22), c(86, 810))
mcnemar.test(vote, correct = TRUE)
#>
#>  McNemar's Chi-squared test with continuity correction
#>
#> data: vote
#> McNemar's chi-squared = 36.75, df = 1, p-value = 1.343e-09
4.3.3.2 Spearman Rank Correlation
To test for the strength of association between two ordinally scaled variables, we can use the Spearman Rank Correlation statistic.
Let \(X\) and \(Y\) be two random variables measured on an ordinal scale. Consider \(n\) pairs of observations (\(x_i,y_i\)), \(i = 1,\dots,n\)
The Spearman Rank Correlation coefficient (denoted by \(r_S\)) is calculated using the Pearson correlation formula, but based on the ranks of \(x_i\) and \(y_i\).
The Spearman Rank Correlation is calculated as follows:
 Assign ranks to \(x_i\)’s and \(y_i\)’s separately. Let \(u_i = rank(x_i)\) and \(v_i = rank(y_i)\)
 Calculate \(r_S\) using the formula for the Pearson correlation coefficient, but applied to the ranks:
\[ r_S = \frac{\sum_{i=1}^{n}(u_i  \bar{u})(v_i  \bar{v})}{\sqrt{(\sum_{i = 1}^{n}(u_i  \bar{u})^2)(\sum_{i=1}^{n}(v_i  \bar{v})^2)}} \]
\(r_S\) ranges between \(-1\) and \(+1\), with
 \(r_S = -1\) if there is a perfect negative monotone association
 \(r_S = +1\) if there is a perfect positive monotone association between \(X\) and \(Y\).
To test
\(H_0:\) \(X\) and \(Y\) independent
\(H_a\): \(X\) and \(Y\) positively associated
For large \(n\) (e.g., \(n \ge 10\)),
\[ r_S \sim N(0, 1/(n-1)) \]
Then,
\[ Z = r_S \sqrt{n-1} \sim N(0,1) \]
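The computation above can be sketched with hypothetical ordinal data, checking the rank formula against R's built-in Spearman option:

```r
# Spearman rank correlation on hypothetical ordinal data
x <- c(1, 2, 2, 3, 4, 5, 5, 6, 7, 8)
y <- c(2, 1, 3, 3, 5, 4, 6, 7, 8, 8)

u <- rank(x)  # ties get averaged ranks
v <- rank(y)

r_s <- cor(u, v)  # Pearson formula applied to the ranks
stopifnot(isTRUE(all.equal(r_s, cor(x, y, method = "spearman"))))

# Large-sample test of H0: X and Y independent (positive association)
n <- length(x)
z <- r_s * sqrt(n - 1)
p_value <- 1 - pnorm(z)
```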
4.4 Divergence Metrics and Test for Comparing Distributions
We measure similarity among distributions using divergence statistics, which are different from:
Deviation statistics: the difference between the realization of a variable and some reference value (e.g., the mean). Statistics of deviation distributions include the standard deviation, average absolute deviation, median absolute deviation, and maximum absolute deviation.
Deviance statistics: goodness-of-fit statistics for statistical models (comparable to the sum of squared residuals in OLS, for models that use ML estimation). Usually used in generalized linear models.
Divergence statistics are statistical distances (different from metrics):
Divergences do not require symmetry
Divergences generalize squared distance (instead of linear distance), and hence fail the triangle inequality
Can be used for
Detecting data drift in machine learning
Feature selections
Variational Auto Encoder
Detect similarity between policies (i.e., distributions) in reinforcement learning
Assessing consistency between two measured variables of two constructs.
Techniques
Packages
`entropy`
`philentropy`
4.4.1 Kullback-Leibler Divergence
Also known as relative entropy
Not a metric (does not satisfy the triangle inequality)
Can be generalized to the multivariate case

Measure the similarity between two discrete probability distributions
\(P\) = true data distribution
\(Q\) = predicted data distribution
It quantifies the information loss when moving from \(P\) to \(Q\) (i.e., the information lost when \(P\) is approximated by \(Q\))
Discrete
\[ D_{KL}(P \| Q) = \sum_i P_i \log\left(\frac{P_i}{Q_i}\right) \]
Continuous
\[ D_{KL}(P \| Q) = \int P(x) \log\left(\frac{P(x)}{Q(x)}\right) dx \]
where
\(D_{KL} \in [0, \infty)\), ranging from similar (0) to increasingly divergent distributions
Non-symmetric between two distributions: \(D_{KL}(P \| Q) \neq D_{KL}(Q \| P)\)
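The discrete formula and its asymmetry are easy to verify directly in base R; a minimal sketch with hypothetical probability vectors:

```r
# Kullback-Leibler divergence from the discrete formula (base-2 log)
P <- c(0.4, 0.4, 0.2)
Q <- c(0.3, 0.3, 0.4)

kl <- function(p, q) sum(p * log2(p / q))

kl(P, Q)  # D_KL(P || Q)
kl(Q, P)  # not equal to kl(P, Q): the divergence is asymmetric

stopifnot(kl(P, P) == 0)        # zero for identical distributions
stopifnot(kl(P, Q) != kl(Q, P)) # asymmetry
```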
library(philentropy)
# philentropy::dist.diversity(rbind(X = 1:10 / sum(1:10),
# Y = 1:20 / sum(1:20)),
# p = 2,
# unit = "log2")
# probability vectors (identical here, so the divergence is 0)
KL(rbind(X = 1:10 / sum(1:10), Y = 1:10 / sum(1:10)), unit = "log2")
#> kullback-leibler
#> 0
# raw counts, with probabilities estimated empirically
KL(rbind(X = 1:10, Y = 1:10), est.prob = "empirical")
#> kullback-leibler
#> 0
4.4.2 Jensen-Shannon Divergence
 Also known as information radius or total divergence to the average
\[ D_{JS}(P \| Q) = \frac{1}{2}\left( D_{KL}(P \| M) + D_{KL}(Q \| M)\right) \]
where
\(M = \frac{1}{2} (P + Q)\) is a mixed distribution
\(D_{JS} \in [0,1]\) for \(\log_2\) and \(D_{JS} \in [0,\ln(2)]\) for \(\log_e\)
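A minimal base-R sketch of this definition (hypothetical probability vectors; log base 2, so the value lies in \([0, 1]\)):

```r
# Jensen-Shannon divergence via the mixture M = (P + Q)/2
P <- c(0.4, 0.4, 0.2)
Q <- c(0.3, 0.3, 0.4)
M <- (P + Q) / 2

kl <- function(p, q) sum(p * log2(p / q))
d_js <- 0.5 * kl(P, M) + 0.5 * kl(Q, M)

# Unlike KL, JS is symmetric in P and Q, and bounded in [0, 1] for log2
stopifnot(d_js >= 0, d_js <= 1)
d_js
```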
4.4.3 Wasserstein Distance
 measures the distance between two empirical CDFs
\[ W = \int_{x \in \mathbb{R}} |E(x) - F(x)|^p \, dx \]
where \(E\) and \(F\) are the two empirical CDFs
 This can also be used as a test statistic
set.seed(1)
transport::wasserstein1d(rnorm(100), rnorm(100, mean = 1))
#> [1] 0.8533046
set.seed(1)
# Wasserstein metric
twosamples::wass_stat(rnorm(100), rnorm(100, mean = 1))
#> [1] 0.8533046
set.seed(1)
# permutation-based two-sample test using the Wasserstein metric
twosamples::wass_test(rnorm(100), rnorm(100, mean = 1))
#> Test Stat PValue
#> 0.8533046 0.0002500
4.4.4 Kolmogorov-Smirnov Test
 Can be used for continuous distributions
\(H_0\): Empirical distribution follows a specified distribution
\(H_1\): Empirical distribution does not follow a specified distribution
 The test is nonparametric, based on the statistic
\[ D = \max_x |P(x) - Q(x)| \]
where \(P\) and \(Q\) are the two empirical CDFs
 \(D \in [0,1]\), ranging from identical distributions (0) to maximally different distributions (1)
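In base R, the two-sample version is available as `ks.test()`; a short sketch on the same kind of simulated samples used in the Wasserstein example above:

```r
# Two-sample Kolmogorov-Smirnov test on simulated data
set.seed(1)
x <- rnorm(100)
y <- rnorm(100, mean = 1)

res <- ks.test(x, y)
res$statistic  # D = max |P(x) - Q(x)| over the empirical CDFs
res$p.value    # small here, since the distributions differ in location
```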
library(entropy)
library(tidyverse)
lst = list(sample_1 = c(1:20), sample_2 = c(2:30), sample_3 = c(3:30))
expand.grid(1:length(lst), 1:length(lst)) %>%
  rowwise() %>%
  mutate(KL = KL.empirical(lst[[Var1]], lst[[Var2]]))
#> # A tibble: 9 × 3
#> # Rowwise:
#> Var1 Var2 KL
#> <int> <int> <dbl>
#> 1 1 1 0
#> 2 2 1 0.150
#> 3 3 1 0.183
#> 4 1 2 0.704
#> 5 2 2 0
#> 6 3 2 0.0679
#> 7 1 3 0.622
#> 8 2 3 0.0870
#> 9 3 3 0
To use the test for discrete data, use a bootstrap version of the KS test (to bypass the continuity requirement).