Chapter 2 Example data to follow along with

You can also download the example datasets and place them in your own PostgreSQL database.

Detroit Dataset: 14 Columns with 13 Observations

This is the data set called DETROIT’ in the bookSubset selection in regression’ by Alan J. Miller published in the Chapman & Hall series of monographs on Statistics & Applied Probability, no. 40. The data are unusual in that a subset of three predictors can be found which gives a very much better fit to the data than the subsets found from the Efroymson stepwise algorithm, or from forward selection or backward elimination. The original data were given in appendix A of `Regression analysis and its application: A data-oriented approach’ by Gunst & Mason, Statistics textbooks and monographs no. 24, Marcel Dekker. It has caused problems because some copies of the Gunst & Mason book do not contain all of the data, and because Miller does not say which variables he used as predictors and which is the dependent variable. (HOM was the dependent variable, and the predictors were FTP … WE)


Create Detroit Table in PostgreSQL

The necessary files to create the Detroit table in PostgreSQL can be found in the following link

Download Detroit data

Student Performance Data Set: A data frame with 392 rows and 33 variables:

This data approach looks at student achievement in secondary education of two Portuguese schools. The data attributes include student grades, demographics, social and school related features and it was collected utilizing school reports and questionnaires. Two datasets are provided regarding the performance in two distinct subjects: Mathematics (mat) and Portuguese language (por). In [Cortez and Silva, 2008], the two datasets were modeled under binary/five-level classification and regression tasks. Important note: the target attribute G3 has a strong correlation with attributes G2 and G1. This occurs because G3 is the final year grade (issued at the 3rd period), While G1 and G2 correspond to the 1st and 2nd period grades. It is more difficult to predict G3 without G2 and G1, but such prediction is much more useful (see paper source for more details)


Dodgers Baseball Data Set: A data frame with 81 rows and 13 variables:

Create Dodgers Table in PostgreSQL

The necessary files to create the Dodgers table in PostgreSQL can be found in the following link

Download Dodgers data