Introduction to Probability and Statistics
Preface
I Data and Models
1 Data Basics
1.1 R and RStudio essentials
1.1.1 The basics
1.1.2 Calling functions
1.1.3 The pipe operator
1.1.4 Working with R Script Files
1.1.5 Working with R Markdown
1.1.6 Importing a dataset into R
1.1.7 R packages
1.1.8 Errors, warnings, and messages
1.2 How are data organized?
1.3 Variables
1.4 Models
1.5 Types of data-generating studies
2 Summarizing data
2.1 Data Wrangling
2.1.1 filter()
2.1.2 mutate()
2.1.3 group_by() and summarize()
2.2 One quantitative variable
2.2.1 Measures of location
2.2.2 Measures of dispersion
2.2.3 Histograms
2.2.4 Density plots
2.2.5 Boxplots
2.2.6 Shapes of distributions
2.3 One categorical variable
2.3.1 Barplots
2.4 Two quantitative variables
2.4.1 Scatterplots
2.4.2 Correlations
2.5 Two categorical variables
2.5.1 Contingency tables
2.5.2 Stacked barplots
2.5.3 Mosaic plots
2.6 One quantitative and one categorical variable
2.6.1 Side-by-side boxplots
2.6.2 Side-by-side density plots
2.7 A word on statistical inference
2.7.1 Malaria vaccine example
2.7.2 Simulating the study
3 Simple linear regression
3.1 Least squares model
3.2 Categorical predictor
3.2.1 Categorical predictor with two levels
3.2.2 Categorical predictor with three or more levels
3.3 Correlations
3.3.1 Pearson’s correlation coefficient
3.3.2 Spearman’s correlation coefficient
4 Multiple linear regression
4.1 Multiple LS model
4.2 Categorical predictors
II Probability and Random Variables
5 Probability
5.1 Definitions, laws, and examples
5.2 Independence
5.3 Conditional probability
5.3.1 Definitions
5.3.2 General multiplication rule
5.3.3 Bayes' Theorem
6 Distributions of random variables
6.1 PMFs, PDFs, and CDFs
6.2 Binomial distribution
6.3 Normal distribution
7 Properties of random variables
7.1 Expected value
7.2 Variance
7.3 Covariance and correlation
III Statistical Inference
8 Sampling distributions
8.1 Population and sample
8.2 Estimators
8.3 Central Limit Theorem
8.4 t distribution
8.5 Simulation techniques
8.5.1 Bootstrap
9 Inference for proportions
9.1 Confidence interval for \(p\)
9.2 Hypothesis testing for \(p\)
9.3 Chi-square goodness-of-fit test
9.3.1 An example
9.3.2 Chi-square distribution
9.3.3 Conducting a goodness-of-fit test
9.4 Confidence interval for \(p_1 - p_2\)
9.5 Hypothesis test for \(p_1 - p_2\)
9.6 Chi-square test of independence
10 Inference for means
10.1 Confidence interval for \(\mu\)
10.2 Hypothesis test for \(\mu\)
10.3 Confidence interval for \(\mu_1 - \mu_2\)
10.4 Hypothesis testing for \(\mu_1 - \mu_2\)
10.5 A few remarks about hypothesis tests
10.5.1 Hypothesis tests for other parameters
10.5.2 Statistical significance versus practical significance
11 Inference for regression
11.1 CLT for regression coefficients
11.1.1 Prelude
11.1.2 Sampling distribution of regression coefficients
11.2 Regression diagnostics
11.3 Confidence interval for \(b\)
11.4 Hypothesis testing for \(b\)
11.5 Analysis of variance (ANOVA)
11.5.1 ANOVA for one predictor
11.5.2 Coefficient of determination
11.5.3 ANOVA for more than one predictor
11.5.4 ANOVA for categorical predictors
11.5.5 Adjusted R-squared
11.6 Model selection
11.6.1 Backward elimination
11.6.2 Forward selection
References
MA217 - Introduction to Probability and Statistics
References
Krzanowski, W. J., and F. H. C. Marriott. 1994. Multivariate Analysis Part 1: Distributions, Ordination, and Inference. Hodder Education Publishers.