1.4 Scripts and datasets
The code snippets in the notes are collected in the following scripts. To download a script, save its link as a file from your browser.
- Chapter 1: 01-intro.R.
- Chapter 2: 02-lm-i.R.
- Chapter 3: 03-lm-ii.R.
- Chapter 4: 04-lm-iii.R.
- Chapter 5: 05-glm.R. Generation of Figures 5.12–5.23: hypothesisGlm.R.
- Chapter 6: 06-npreg.R.
- Appendices A and B: 07-appendix.R.
The following is a handy list of all the relevant datasets used in the course together with brief descriptions. The list is sorted according to the order of appearance of the datasets in the notes. To download them, simply save the link as a file in your browser.
- wine.csv. The dataset is formed by the auction Price of \(27\) red Bordeaux vintages, five vintage descriptors (WinterRain, AGST, HarvestRain, Age, Year), and the population of France in the year of the vintage (FrancePop).
- least-squares.RData. Contains a single data.frame, named leastSquares, with \(50\) observations of the variables x, yLin, yQua, and yExp. These are generated as \(X\sim\mathcal{N}(0,1),\) \(Y_\mathrm{lin}=-0.5+1.5X+\varepsilon,\) \(Y_\mathrm{qua}=-0.5+1.5X^2+\varepsilon,\) and \(Y_\mathrm{exp}=-0.5+1.5\cdot2^X+\varepsilon,\) with \(\varepsilon\sim\mathcal{N}(0,0.5^2).\) The purpose of the dataset is to illustrate least squares fitting.
- least-squares-3D.RData. Contains a single data.frame, named leastSquares3D, with \(50\) observations of the variables x1, x2, x3, yLin, yQua, and yExp. These are generated as \(X_1,X_2\sim\mathcal{N}(0,1),\) \(X_3=X_1+\mathcal{N}(0,0.05^2),\) \(Y_\mathrm{lin}=-0.5 + 0.5 X_1 + 0.5 X_2 +\varepsilon,\) \(Y_\mathrm{qua}=-0.5 + X_1^2 + 0.5 X_2+\varepsilon,\) and \(Y_\mathrm{exp}=-0.5 + 0.5 e^{X_2} + X_3+\varepsilon,\) with \(\varepsilon\sim\mathcal{N}(0,1).\) The purpose of the dataset is to illustrate least squares fitting with several predictors.
- assumptions.RData. Contains the data frame assumptions with \(200\) observations of the variables x1, …, x9 and y1, …, y9. The purpose of the dataset is to identify which of the regressions y1 ~ x1, …, y9 ~ x9 fulfill the assumptions of the linear model. The moreAssumptions.RData dataset has the same structure.
- assumptions3D.RData. Contains the data frame assumptions3D with \(200\) observations of the variables x1.1, …, x1.8, x2.1, …, x2.8 and y.1, …, y.8. The purpose of the dataset is to identify which of the regressions y.1 ~ x1.1 + x2.1, …, y.8 ~ x1.8 + x2.8 fulfill the assumptions of the linear model.
- Boston.xlsx. The dataset contains \(14\) variables describing \(506\) suburbs in Boston. Among those variables, medv is the median house value, rm is the average number of rooms per house, and crim is the per capita crime rate. The full description is available in ?MASS::Boston.
- cpus.txt and gpus.txt. The datasets contain \(102\) and \(35\) rows, respectively, of commercial CPUs and GPUs, from the first models to the present day. The variables in the datasets are Processor, Transistor count, Date of introduction, Manufacturer, Process, and Area.
- la-liga-2015-2016.xlsx. Contains \(19\) performance metrics for the \(20\) football teams in La Liga 2015/2016.
- challenger.txt. Contains data for \(23\) space-shuttle launches. There are \(8\) variables. Among them: temp (the temperature in degrees Celsius at the time of launch), and fail.field and fail.nozzle (indicators of whether there were incidents in the O-rings of the field joints and nozzles of the solid rocket boosters).
- species.txt. Contains data for \(90\) country parcels in which the Biomass, pH of the terrain (categorical variable), and number of Species were measured.
- heart.txt. Contains data for \(226\) patients suspected of having a future heart attack. The variables are CK (level of creatine kinase), and ha and ok (numbers of patients that did and did not suffer a heart attack, respectively).
- Chile.txt. Contains data for \(2700\) respondents to a survey on voting intentions in the 1988 Chilean national plebiscite. There are \(8\) variables: region, population, sex, age, education, income, statusquo (scale of support for the status quo), and vote. The variable vote is a factor with levels A (abstention), N (against Pinochet), U (undecided), and Y (for Pinochet). Retrieved from data(Chile, package = "carData").
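The generating mechanism described for least-squares.RData can be mimicked with a short simulation. The following is a minimal sketch, not the code used to create the file. The seed, the object name leastSquaresSim, and the use of a single error draw shared by the three responses are assumptions, so the simulated values will differ from those in the dataset. The last lines illustrate how load() restores an object saved in an .RData file under its original name.

```r
# Simulate data following the description of least-squares.RData
# (assumption: the seed is arbitrary, the original one is not given)
set.seed(42)
n <- 50
x <- rnorm(n)              # X ~ N(0, 1)
eps <- rnorm(n, sd = 0.5)  # epsilon ~ N(0, 0.5^2), shared by all responses (assumption)

# Hypothetical name, chosen to avoid clashing with the real leastSquares object
leastSquaresSim <- data.frame(
  x = x,
  yLin = -0.5 + 1.5 * x + eps,
  yQua = -0.5 + 1.5 * x^2 + eps,
  yExp = -0.5 + 1.5 * 2^x + eps
)

# Least squares fit of the linear response; the estimates should be
# close to the true intercept (-0.5) and slope (1.5)
fit <- lm(yLin ~ x, data = leastSquaresSim)
print(coef(fit))

# Round trip through an .RData file: load() recreates the object
# in the workspace under the name it had when saved
tmp <- tempfile(fileext = ".RData")
save(leastSquaresSim, file = tmp)
rm(leastSquaresSim)
load(tmp)
str(leastSquaresSim)
```

For the downloaded files, the same idea applies: load("least-squares.RData") would make leastSquares available, provided the file is in the working directory. The .csv and .txt files can typically be read with read.csv() and read.table(), although the arguments needed (for example, header = TRUE) depend on each file's layout and are not stated here. The .xlsx files require an additional package for reading Excel files.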