2 Day 2 (June 4)
2.1 Announcements
- Assignment 1 is due on Friday.
- Upload to Canvas
- Assignment should only take 15-30 min if everything works
- Do not spend more than 1 hour
- After 1 hour of trying please visit me during office hours
- Do not wait until thursday to do the assignment
- Recommended reading
- Chapters 1 and 2 (pgs 1 - 28) in Linear Models with R
- Chapter 2 in Applied Regression and ANOVA Using SAS
- Today’s lecture in the news
- Please realize this is a satirical article (link)
2.2 Intro to statistical modelling
- A difficult question
- How much money will I have for retirement?
- What is data?
- Something in the real world that you can, in some way, observe and measure with or without error
- What is a statistic?
- A function of the data
- What is a model?
- Mathematical models
- Statistical models
- A difficult question
- How much money will I have for retirement?
- Point prediction vs. distributional prediction
- What data/information do I have?
- What data do I need?
- How can I answer this question using a statistical model?
- How much money will I have for retirement?
- Example: my retirement
- Personal information
- Obviously this isn’t my actual information, but it isn’t too far off!
- Since I am a millennial I don’t think social security will be around when I retire (i.e., assume social security contributes $0 to my retirement)
- As of 1/1/24 I have $350,000 in a 401k style retirement account
- All of money is invested into an S&P 500 index fund (VOO to be exact)
- I am 37 as of 1/1/24
- I want to know how much pre-tax money I will have at a given retirement age (e.g., 65, 70, etc)
- Example using a mathematical model
- Whiteboard demonstration
- What are the model assumptions?
- In program R
- Personal information
# The value of my 401k retirement account as of 1/1/24
y_2024 <- 350000
# How much money will I add to my 401k each year
q <- 46000
# Rate of return for S&P 500 index fund
r <- 0.12
# How much $ will I have in 2025
y_2025 <- y_2024*(1+r)+q
y_2025
## [1] 438000
## [1] 536560
## [1] 646947.2
# Using a for loop to calculate how much $ will I have
year <- seq(2024,2025+30,by=1)
y <- matrix(,length(year),1)
rownames(y) <- year
y[1,1] <- 350000
for(t in 1:30){
y[t+1,1] <- y[t,1]*(1+r)+q
}
plot(year,y/10^6,typ="b",pch=20,col="deepskyblue",xlab="Year",ylab="Pretax retirement amount ($ millions)")
# How much $ will I have when I am 67?
# Note that units are millions of $
retirement.year <- 2024+30
y[which(year==retirement.year)]/10^6
## [1] 21.58728
- Example using a Bayesian statistical model
- S&P 500 return since inception in 1957
# Download S&P 500 returns
url <- "https://www.dropbox.com/s/81ccahyuaas1zpd/s%26p500.csv?dl=1"
df.sp500 <- read.csv(url)
head(df.sp500)
## year return
## 1 2022 -0.1811
## 2 2021 0.2871
## 3 2020 0.1840
## 4 2019 0.3149
## 5 2018 -0.0438
## 6 2017 0.2183
## [1] 0.1152742
- Example using the prior predictive distribution
# Download S&P 500 returns
url <- "https://www.dropbox.com/s/81ccahyuaas1zpd/s%26p500.csv?dl=1"
df.sp500 <- read.csv(url)
# The value of my 401k retirement account as of 1/1/24
y_2024 <- 350000
# How much money will I add to my 401k each year
q <- 46000
# Using a for loop to calculate how much $ will I have
year <- seq(2024,2024+30,by=1)
Y <- matrix(,length(year),1000)
rownames(Y) <- year
Y[1,] <- y_2024
set.seed(3410)
for(m in 1:1000){
for(t in 1:30){
r <- sample(df.sp500$return,1)
Y[t+1,m] <- Y[t,m]*(1+r)+q
}
}
# Prior predictive distribution for a given year
retirement.year <- 2024+30
hist(Y[which(year==retirement.year),]/10^6,col="grey",freq=FALSE,xlab="Pretax retirement amount ($ millions)",main="")
## [1] 18.5674
## [1] 143.2108
## [1] 1.082085
## 2.5% 97.5%
## 3.20066 60.92273
# Plot some financial trajectories (i.e., random draws from the prior predictive distribution)
plot(year,Y[,1]/10^6,typ="l",lwd=1,ylim=c(0,max(Y/10^6)),col=rgb(0.1,0.1,0.1,.2),xlab="Year",ylab="Pretax retirement amount ($ millions)")
for(i in 1:30){
points(year,Y[,i]/10^6,typ="l",lwd=1,col=rgb(0.1,0.1,0.1,.2))
}
# Plot of prior predictive distribution for all years
E.Y <- apply(Y/10^6,1,mean)
u.CI <- apply(Y/10^6,1,quantile,prob=0.975)
l.CI <- apply(Y/10^6,1,quantile,prob=0.025)
max.Y <- apply(Y/10^6,1,max)
min.Y <- apply(Y/10^6,1,min)
plot(year,E.Y,typ="l",lwd=3,ylim=c(0,max(max.Y)),xlab="Year",ylab="Pretax retirement amount ($ millions)")
points(year,max.Y,typ="l",lwd=3,col="red")
points(year,min.Y,typ="l",lwd=3,col="red")
polygon(c(year,rev(year)),c(u.CI,rev(l.CI)),
col=rgb(0.5,0.5,0.5,0.5),border=rgb(0.5,0.5,0.5,0.5))
- Further reading/learning