1.5 Datasets for the course
This list gives a short description and a download link for each of the datasets used in the course. To download a dataset, save the linked file from your browser.
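Once a file is saved, it can be read into R with base functions. The following is a minimal sketch, not the course's own loading code: it writes a made-up toy data frame to temporary files and reads it back, so it runs without any download. The object names (`toy`, `csvFile`, `rdaFile`) and the toy values are assumptions for illustration only. For the `.txt` files, `read.table(file, header = TRUE)` is a reasonable first attempt, assuming whitespace-separated columns with a header row; the `.xlsx` files need an add-on package, for example `readxl::read_excel()`.

```r
# Made-up toy data, used only to demonstrate the read/write round trip
toy <- data.frame(id = c("a", "b", "c"), value = c(1.5, 2.5, 3.5))

# CSV files: read.csv() returns a data.frame
csvFile <- tempfile(fileext = ".csv")
write.csv(toy, file = csvFile, row.names = FALSE)
toyCsv <- read.csv(csvFile)

# RData files: load() restores the saved objects into the workspace
# and (invisibly) returns their names
rdaFile <- tempfile(fileext = ".RData")
save(toy, file = rdaFile)
loadedNames <- load(rdaFile)

str(toyCsv)
print(loadedNames)
```

With a downloaded file, the temporary path is simply replaced by the path where the file was saved, for instance `read.csv("pisa.csv")` if the file is in the working directory.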
pisa.csv (download). Contains 65 rows corresponding to the countries that took part in the PISA study. Each row has the variables Country, MeanMath, MathShareLow, MathShareTop, ReadingMean, ScienceMean, GDPp, logGDPp and HighIncome. The variable logGDPp is the logarithm of GDPp, which is taken in order to avoid scale distortions.

US_apportionment.xlsx (download). Contains the 50 US states entitled to representation in the US House of Representatives. The recorded variables are State, Population2010 and Seats2013–2023.

EU_apportionment.txt (download). Contains 28 rows with the member states of the EU (Country), the number of seats assigned in different years (Seats2011, Seats2014), the Cambridge Compromise apportionment (CamCom2011) and the states' populations (Population2010, Population2013).

least-squares.RData (download). Contains a single data.frame, named leastSquares, with 50 observations of the variables x, yLin, yQua and yExp. These are generated as \(X\sim\mathcal{N}(0,1)\), \(Y_\mathrm{lin}=-0.5+1.5X+\varepsilon\), \(Y_\mathrm{qua}=-0.5+1.5X^2+\varepsilon\) and \(Y_\mathrm{exp}=-0.5+1.5\cdot2^X+\varepsilon\), with \(\varepsilon\sim\mathcal{N}(0,0.5^2)\). The purpose of the dataset is to illustrate least squares fitting.

assumptions.RData (download). Contains the data frame assumptions with 200 observations of the variables x1, …, x9 and y1, …, y9. The purpose of the dataset is to identify which of the regressions y1 ~ x1, …, y9 ~ x9 fulfill the assumptions of the linear model. The dataset moreAssumptions.RData (download) has the same structure.

cpus.txt (download) and gpus.txt (download). The datasets contain 102 and 35 rows, respectively, of commercial CPUs and GPUs released from the first models up to the present. The variables in the datasets are Processor, Transistor count, Date of introduction, Manufacturer, Process and Area.

hap.txt (download). Contains data for 20 advanced economies in the period 1946–2009, measured for 31 variables. Among those, the variable dRGDP represents the real GDP growth (as a percentage) and debtgdp represents public debt as a percentage of GDP.

wine.csv (download). The dataset is formed by the auction Price of 27 red Bordeaux vintages, five vintage descriptors (WinterRain, AGST, HarvestRain, Age, Year) and the population of France in the year of the vintage (FrancePop).

Boston.xlsx (download). The dataset contains 14 variables describing 506 suburbs in Boston. Among those variables, medv is the median house value, rm is the average number of rooms per house and crim is the per capita crime rate. The full description is available in ?Boston.

assumptions3D.RData (download). Contains the data frame assumptions3D with 200 observations of the variables x1.1, …, x1.8, x2.1, …, x2.8 and y.1, …, y.8. The purpose of the dataset is to identify which of the regressions y.1 ~ x1.1 + x2.1, …, y.8 ~ x1.8 + x2.8 fulfill the assumptions of the linear model.

challenger.txt (download). Contains data for 23 Space Shuttle launches, with 8 variables. Among them: temp, the temperature in degrees Celsius at the time of launch, and fail.field and fail.nozzle, indicators of whether there were incidents in the O-rings of the field joints and nozzles of the solid rocket boosters.

eurojob.txt (download). Contains data on employment in 26 European countries. There are 9 variables, giving the percentage of employment in 9 sectors: Agr (Agriculture), Min (Mining), Man (Manufacture), Pow (Power), Con (Construction), Ser (Services), Fin (Finance), Soc (Social) and Tra (Transport).

Chile.txt (download). Contains data for 2700 respondents to a survey on voting intentions in the 1988 Chilean national plebiscite. There are 8 variables: region, population, sex, age, education, income, statusquo (scale of support for the status quo) and vote. The variable vote is a factor with levels A (abstention), N (against Pinochet), U (undecided) and Y (for Pinochet). Available in R through the package car and data(Chile).

USArrests.txt (download). Arrest statistics for Assault, Murder and Rape in each of the 50 US states in 1973. The percentage of the population living in urban areas, UrbanPop, is also given. Available in R through data(USArrests).

USJudgeRatings.txt (download). Lawyers' ratings of state judges in the US Superior Court. The dataset contains 43 observations of 12 variables measuring the performance of the judge when conducting a trial. Available in R through data(USJudgeRatings).

la-liga-2015-2016.xlsx (download). Contains 19 performance metrics for the 20 football teams in La Liga 2015/2016.

pisaUS2009.csv (download). Reading scores of 3663 US students in the PISA test, with 23 variables describing the student profile and family background.
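The generating mechanism described for least-squares.RData can be mimicked with a few lines of R. This is only a sketch under stated assumptions, not the code that produced the course file: the seed, the object name `leastSquaresSim`, and the choice to draw an independent error for each response are all assumptions (the description does not say whether the three responses share the same error draw), so the simulated values will differ from those in the dataset.

```r
# Assumed seed, chosen only to make the sketch reproducible
set.seed(42)
n <- 50

# X ~ N(0, 1)
x <- rnorm(n)

# Responses following the stated models, each with its own
# error term epsilon ~ N(0, 0.5^2) (independence is an assumption)
leastSquaresSim <- data.frame(
  x = x,
  yLin = -0.5 + 1.5 * x + rnorm(n, sd = 0.5),
  yQua = -0.5 + 1.5 * x^2 + rnorm(n, sd = 0.5),
  yExp = -0.5 + 1.5 * 2^x + rnorm(n, sd = 0.5)
)

# Least squares fit of the linear response; the estimates should be
# close to the true intercept -0.5 and slope 1.5
fitLin <- lm(yLin ~ x, data = leastSquaresSim)
print(coef(fitLin))
```

Fitting `lm(yQua ~ x)` or `lm(yExp ~ x)` to the same simulated data shows how a straight line behaves when the true relation is not linear.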