## 1.4 Scripts and datasets

The code snippets in the notes are collected in the following scripts. To download them, simply **save the link as a file** in your browser.

- Chapter 1: `01-intro.R`.
- Chapter 2: `02-lm-i.R`.
- Chapter 3: `03-lm-ii.R`.
- Chapter 4: `04-lm-iii.R`.
- Chapter 5: `05-glm.R`. Generation of Figures 5.12–5.23: `hypothesisGlm.R`.
- Chapter 6: `06-npreg.R`.
- Appendices A and B: `07-appendix.R`.

The following is a handy list of all the relevant datasets used in the course together with brief descriptions. The list is sorted according to the order of appearance of the datasets in the notes. To download them, simply save the link as a file in your browser.

- `wine.csv`. The dataset is formed by the auction `Price` of \(27\) red Bordeaux vintages, five vintage descriptors (`WinterRain`, `AGST`, `HarvestRain`, `Age`, `Year`), and the population of France in the year of the vintage (`FrancePop`).
- `least-squares.RData`. Contains a single `data.frame`, named `leastSquares`, with \(50\) observations of the variables `x`, `yLin`, `yQua`, and `yExp`. These are generated as \(X\sim\mathcal{N}(0,1),\) \(Y_\mathrm{lin}=-0.5+1.5X+\varepsilon,\) \(Y_\mathrm{qua}=-0.5+1.5X^2+\varepsilon,\) and \(Y_\mathrm{exp}=-0.5+1.5\cdot2^X+\varepsilon,\) with \(\varepsilon\sim\mathcal{N}(0,0.5^2).\) The purpose of the dataset is to illustrate least squares fitting.
- `least-squares-3D.RData`. Contains a single `data.frame`, named `leastSquares3D`, with \(50\) observations of the variables `x1`, `x2`, `x3`, `yLin`, `yQua`, and `yExp`. These are generated as \(X_1,X_2\sim\mathcal{N}(0,1),\) \(X_3=X_1+\mathcal{N}(0,0.05^2),\) \(Y_\mathrm{lin}=-0.5 + 0.5 X_1 + 0.5 X_2 +\varepsilon,\) \(Y_\mathrm{qua}=-0.5 + X_1^2 + 0.5 X_2+\varepsilon,\) and \(Y_\mathrm{exp}=-0.5 + 0.5 e^{X_2} + X_3+\varepsilon,\) with \(\varepsilon\sim\mathcal{N}(0,1).\) The purpose of the dataset is to illustrate least squares fitting with several predictors.
- `assumptions.RData`. Contains the data frame `assumptions` with \(200\) observations of the variables `x1`, …, `x9` and `y1`, …, `y9`. The purpose of the dataset is to identify which of the regressions `y1 ~ x1`, …, `y9 ~ x9` fulfill the assumptions of the linear model. The `moreAssumptions.RData` dataset has the same structure.
- `assumptions3D.RData`. Contains the data frame `assumptions3D` with \(200\) observations of the variables `x1.1`, …, `x1.8`, `x2.1`, …, `x2.8` and `y.1`, …, `y.8`. The purpose of the dataset is to identify which of the regressions `y.1 ~ x1.1 + x2.1`, …, `y.8 ~ x1.8 + x2.8` fulfill the assumptions of the linear model.
- `Boston.xlsx`. The dataset contains \(14\) variables describing \(506\) suburbs in Boston. Among those variables, `medv` is the median house value, `rm` is the average number of rooms per house, and `crim` is the per capita crime rate. The full description is available in `?MASS::Boston`.
- `cpus.txt` and `gpus.txt`. The datasets contain \(102\) and \(35\) rows, respectively, of commercial CPUs and GPUs released from the first models to the present day. The variables in the datasets are `Processor`, `Transistor count`, `Date of introduction`, `Manufacturer`, `Process`, and `Area`.
- `la-liga-2015-2016.xlsx`. Contains \(19\) performance metrics for the \(20\) football teams in La Liga 2015/2016.
- `challenger.txt`. Contains data for \(23\) space-shuttle launches. There are \(8\) variables. Among them: `temp` (the temperature in degrees Celsius at the time of launch), and `fail.field` and `fail.nozzle` (indicators of whether there were incidents in the O-rings of the field joints and nozzles of the solid rocket boosters).
- `species.txt`. Contains data for \(90\) country parcels in which the `Biomass`, `pH` of the terrain (categorical variable), and number of `Species` were measured.
- `heart.txt`. Contains data for \(226\) patients suspected of having a future heart attack. The variables are `CK` (level of creatine kinase), and `ha` and `ok` (number of patients that suffered a heart attack and did not suffer it, respectively).
- `Chile.txt`. Contains data for \(2700\) respondents on a survey for the voting intentions in the 1988 Chilean national plebiscite. There are \(8\) variables: `region`, `population`, `sex`, `age`, `education`, `income`, `statusquo` (scale of support for the status quo), and `vote`. `vote` is a factor with levels `A` (abstention), `N` (against Pinochet), `U` (undecided), and `Y` (for Pinochet). Retrieved from `data(Chile, package = "carData")`.
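
The generating mechanisms stated for `leastSquares` and `leastSquares3D` can be sketched in R as follows. This is only an illustrative simulation, not the code that produced the distributed files: the seed, the object names ending in `Sim`, and the choice of drawing an independent error for each response are assumptions, so the simulated values will differ from those in the `.RData` files.

```r
# Illustrative simulation of the stated models (NOT the original generating code).
# Assumptions: the seed, the "Sim" object names, and independent errors per response.
set.seed(42)
n <- 50

# Analogue of `leastSquares`: X ~ N(0, 1), errors ~ N(0, 0.5^2)
x <- rnorm(n)
leastSquaresSim <- data.frame(
  x = x,
  yLin = -0.5 + 1.5 * x + rnorm(n, sd = 0.5),
  yQua = -0.5 + 1.5 * x^2 + rnorm(n, sd = 0.5),
  yExp = -0.5 + 1.5 * 2^x + rnorm(n, sd = 0.5)
)

# Analogue of `leastSquares3D`: X1, X2 ~ N(0, 1), X3 = X1 + N(0, 0.05^2),
# errors ~ N(0, 1)
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + rnorm(n, sd = 0.05)
leastSquares3DSim <- data.frame(
  x1 = x1, x2 = x2, x3 = x3,
  yLin = -0.5 + 0.5 * x1 + 0.5 * x2 + rnorm(n),
  yQua = -0.5 + x1^2 + 0.5 * x2 + rnorm(n),
  yExp = -0.5 + 0.5 * exp(x2) + x3 + rnorm(n)
)

# Least squares fits of the linear responses
fit1D <- lm(yLin ~ x, data = leastSquaresSim)
fit3D <- lm(yLin ~ x1 + x2, data = leastSquares3DSim)
coef(fit1D)
coef(fit3D)
```

Since `x3` is `x1` plus a small perturbation, the two predictors are almost perfectly correlated in the simulated data, which is worth keeping in mind when both enter the same regression.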