1.4 Scripts and datasets
The code snippets in the notes are collected in the following scripts. To download them, simply save the link as a file in your browser.
- Chapter 1: `01-intro.R`.
- Chapter 2: `02-lm-i.R`.
- Chapter 3: `03-lm-ii.R`.
- Chapter 4: `04-lm-iii.R`.
- Chapter 5: `05-glm.R`. Generation of Figures 5.12–5.23: `hypothesisGlm.R`.
- Chapter 6: `06-npreg.R`.
- Appendices A and B: `07-appendix.R`.
The following is a handy list of all the relevant datasets used in the course together with brief descriptions. The list is sorted according to the order of appearance of the datasets in the notes. To download them, simply save the link as a file in your browser.
- `wine.csv`. The dataset is formed by the auction `Price` of \(27\) red Bordeaux vintages, five vintage descriptors (`WinterRain`, `AGST`, `HarvestRain`, `Age`, `Year`), and the population of France in the year of the vintage (`FrancePop`).
- `least-squares.RData`. Contains a single `data.frame`, named `leastSquares`, with \(50\) observations of the variables `x`, `yLin`, `yQua`, and `yExp`. These are generated as \(X\sim\mathcal{N}(0,1),\) \(Y_\mathrm{lin}=-0.5+1.5X+\varepsilon,\) \(Y_\mathrm{qua}=-0.5+1.5X^2+\varepsilon,\) and \(Y_\mathrm{exp}=-0.5+1.5\cdot2^X+\varepsilon,\) with \(\varepsilon\sim\mathcal{N}(0,0.5^2).\) The purpose of the dataset is to illustrate least squares fitting.
- `least-squares-3D.RData`. Contains a single `data.frame`, named `leastSquares3D`, with \(50\) observations of the variables `x1`, `x2`, `x3`, `yLin`, `yQua`, and `yExp`. These are generated as \(X_1,X_2\sim\mathcal{N}(0,1),\) \(X_3=X_1+\mathcal{N}(0,0.05^2),\) \(Y_\mathrm{lin}=-0.5 + 0.5 X_1 + 0.5 X_2 +\varepsilon,\) \(Y_\mathrm{qua}=-0.5 + X_1^2 + 0.5 X_2+\varepsilon,\) and \(Y_\mathrm{exp}=-0.5 + 0.5 e^{X_2} + X_3+\varepsilon,\) with \(\varepsilon\sim\mathcal{N}(0,1).\) The purpose of the dataset is to illustrate least squares fitting with several predictors.
- `assumptions.RData`. Contains the data frame `assumptions` with \(200\) observations of the variables `x1`, …, `x9` and `y1`, …, `y9`. The purpose of the dataset is to identify which of the regressions `y1 ~ x1`, …, `y9 ~ x9` fulfill the assumptions of the linear model. The `moreAssumptions.RData` dataset has the same structure.
- `assumptions3D.RData`. Contains the data frame `assumptions3D` with \(200\) observations of the variables `x1.1`, …, `x1.8`, `x2.1`, …, `x2.8` and `y.1`, …, `y.8`. The purpose of the dataset is to identify which of the regressions `y.1 ~ x1.1 + x2.1`, …, `y.8 ~ x1.8 + x2.8` fulfill the assumptions of the linear model.
- `Boston.xlsx`. The dataset contains \(14\) variables describing \(506\) suburbs in Boston. Among those variables, `medv` is the median house value, `rm` is the average number of rooms per house, and `crim` is the per capita crime rate. The full description is available in `?MASS::Boston`.
- `cpus.txt` and `gpus.txt`. The datasets contain \(102\) and \(35\) rows, respectively, of commercial CPUs and GPUs released from the first models up to the present. The variables in the datasets are `Processor`, `Transistor count`, `Date of introduction`, `Manufacturer`, `Process`, and `Area`.
- `la-liga-2015-2016.xlsx`. Contains \(19\) performance metrics for the \(20\) football teams in La Liga 2015/2016.
- `challenger.txt`. Contains data for \(23\) space-shuttle launches. There are \(8\) variables. Among them: `temp` (the temperature in degrees Celsius at the time of launch), and `fail.field` and `fail.nozzle` (indicators of whether there were incidents in the O-rings of the field joints and nozzles of the solid rocket boosters).
- `species.txt`. Contains data for \(90\) country parcels in which the `Biomass`, `pH` of the terrain (categorical variable), and number of `Species` were measured.
- `heart.txt`. Contains data for \(226\) patients suspected of having a future heart attack. The variables are `CK` (level of creatine kinase), and `ha` and `ok` (numbers of patients who did and did not suffer a heart attack, respectively).
- `Chile.txt`. Contains data for \(2700\) respondents to a survey on voting intentions in the 1988 Chilean national plebiscite. There are \(8\) variables: `region`, `population`, `sex`, `age`, `education`, `income`, `statusquo` (scale of support for the status quo), and `vote`. `vote` is a factor with levels `A` (abstention), `N` (against Pinochet), `U` (undecided), and `Y` (for Pinochet). Retrieved from `data(Chile, package = "carData")`.
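
The generating model given above for `leastSquares` can be sketched in a few lines of R. This is only an illustration of the stated formulas, not the script that produced `least-squares.RData`: the seed, the name `leastSquaresSim`, and the choice to share a single noise vector \(\varepsilon\) across the three responses are assumptions, since the description does not specify them.

```r
# Simulate a data frame with the same structure as leastSquares.
# Assumptions: arbitrary seed; one common noise vector for all responses.
set.seed(1)

n <- 50                      # Number of observations, as in the description
x <- rnorm(n)                # X ~ N(0, 1)
eps <- rnorm(n, sd = 0.5)    # epsilon ~ N(0, 0.5^2)

leastSquaresSim <- data.frame(
  x = x,
  yLin = -0.5 + 1.5 * x + eps,    # Linear relation
  yQua = -0.5 + 1.5 * x^2 + eps,  # Quadratic relation
  yExp = -0.5 + 1.5 * 2^x + eps   # Exponential relation
)

# Least squares fit of the linear response; the estimated coefficients
# should be close to the true values (-0.5, 1.5)
fit <- lm(yLin ~ x, data = leastSquaresSim)
coef(fit)
```

If `least-squares.RData` has been saved to the working directory, `load("least-squares.RData")` makes the actual `leastSquares` data frame available for comparison with the simulated one.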