Statistical modelling - Lecture Notes for From Data to Evidence 3
Preface
1
Statistical modelling
1.1
Statistical models
1.2
Linear models
1.3
Definitions
1.4
Categorical independent variables
1.5
More than two classes of categorical variables
1.6
Interpreting parameters
1.7
More complex models
1.8
What makes a linear model “linear”?
1.9
General linear models
1.10
Assumptions of General Linear Models
1.10.1
Linearity
1.10.2
Normality
1.10.3
Constant variance
1.10.4
Independence
1.11
What you should know
2
Fitting models to data
2.1
Recap
2.2
Fitting models to data
2.3
Discrete probability distributions
2.4
Probability density functions
2.5
Joint probability
2.6
Maximum Likelihood Estimation
2.6.1
Ordinary Least Squares and Maximum Likelihood give the same answers when errors are normally distributed
2.6.2
Take-home messages about the relationship between OLS and MLE
2.7
Optimization (how do we find the best parameters)
2.7.1
Non-linear optimization
2.8
Other important metrics of goodness of fit
2.8.1
Residual standard error
2.8.2
R-squared
2.9
Introduction to transformations
2.9.1
Converting back to linear scale
2.9.2
Summary of transformations
2.10
What you should know
3
Uncertainty
3.1
Recap
3.2
The sampling distribution and the central limit theorem
3.3
How does uncertainty in parameter estimates change with sample size?
3.4
Calculating standard errors from a sample
3.4.1
Review of probability density functions and probabilities
3.5
Confidence intervals
3.6
The 2 SE method for estimating 95% CI
3.7
The t-distribution
3.8
Simulating data
3.9
What you need to know
4
Hypothesis testing
4.1
Recap
4.2
The motivation behind p-values
4.3
Conventional use of NHST
4.4
P-values and NHST: an example
4.5
One-tailed and two-tailed tests
4.6
The relationship between confidence intervals and t-tests
4.7
False positives and negatives
4.7.1
False positive rate when the null hypothesis is true
4.7.2
False negatives and power analyses
4.8
Multiple tests
4.9
Common pitfalls with the use of NHST
4.9.1
Statistical significance does not mean biological or practical significance
4.9.2
The null hypothesis is almost always false
4.9.3
Not statistically significant does not mean “no effect”
4.9.4
p-values do not tell you the probability your hypothesis is true
4.9.5
Arbitrary cutoffs
4.10
My advice
4.11
What you should know
5
More complex models
5.1
More than one independent variable
5.2
Interactions
5.3
Controlling for a variable
5.4
Multicollinearity
5.5
What you should know
6
Comparing models
6.1
Model complexity and overfitting
6.2
Cross-validation
6.3
AIC
6.4
Advice for model selection
6.4.1
Analysis of experiments
6.4.2
Exploratory data analysis
6.4.3
Inference: Testing hypotheses with observational datasets
6.4.4
Robustness checks
6.4.5
Null hypothesis tests with large observational datasets are meaningless
6.4.6
Prediction
References