Data Analytics Module
Lecturer: Hans van der Zwan
Lab 05
Topic: multiple regression
EXERCISE 5.1
(continuation of the homework assignment from handout 4)
The file 20191126_forsale_amsterdam.csv contains information about properties for sale in Amsterdam on November 26, 2019.
(i) Create a table with summary statistics for the house prices per PC3 district. A PC3 district is a district whose postcodes share the same first three characters; so the PC3 districts in Amsterdam are: PC100, PC101, …, PC110.
(ii) Create a scatterplot with PRICE as Y-variable and AREA as X-variable.
(iii) Generate a regression model with PRICE as response variable and AREA as explanatory variable.
(iv) Assess the regression model from part (iii).
(v) Create a scatterplot with PRICE as Y-variable and ROOMS as X-variable.
(vi) Generate a regression model with PRICE as response variable and ROOMS as explanatory variable.
(vii) Assess the regression model from part (vi).
(viii) Create a regression model with PRICE as response variable and AREA and ROOMS as explanatory variables.
(ix) Assess the regression model from part (viii) and compare it with the two other models.
(x) Add a dummy variable: 1 = located in the city centre, 0 = not located in the city centre. Add this variable to the regression model of part (viii).
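The following is a minimal Python sketch of the computational parts of this exercise. It assumes pandas and NumPy are available. It also assumes the CSV has the columns PRICE, AREA and ROOMS plus a postcode column. The name `POSTCODE` is hypothetical, as are the `CENTRE` dummy name and the choice of PC101 as "city centre"; check all three against the actual file. The regressions are ordinary least squares fitted with `numpy.linalg.lstsq`. A package such as statsmodels would additionally report coefficient standard errors and p-values for the assessment parts.

```python
import numpy as np
import pandas as pd


def pc3_code(postcodes: pd.Series) -> pd.Series:
    """First three postcode characters, e.g. '1011 AB' -> '101'."""
    return postcodes.astype(str).str.replace(" ", "", regex=False).str[:3]


def pc3_summary(df: pd.DataFrame, postcode_col: str = "POSTCODE") -> pd.DataFrame:
    """Part (i): summary statistics of PRICE per PC3 district.

    HYPOTHETICAL: the postcode column name defaults to 'POSTCODE'.
    """
    districts = ("PC" + pc3_code(df[postcode_col])).rename("PC3")
    return df.groupby(districts)["PRICE"].describe()


def fit_ols(df: pd.DataFrame, response: str, predictors: list) -> dict:
    """Parts (iii), (vi), (viii), (x): OLS fit of response on predictors plus an intercept."""
    data = df[[response] + predictors].dropna()
    y = data[response].to_numpy(dtype=float)
    # Design matrix: a column of ones (intercept) followed by the predictors
    X = np.column_stack([np.ones(len(data)), data[predictors].to_numpy(dtype=float)])
    n, k = len(y), len(predictors)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1 - ss_res / ss_tot
    return {
        "coef": dict(zip(["const"] + predictors, coef)),
        "r2": r2,
        # Adjusted R^2 penalises extra predictors, so it is fairer for comparing models
        "adj_r2": 1 - (1 - r2) * (n - 1) / (n - k - 1),
        # Standard error of the estimate, in the units of the response (euros)
        "se": (ss_res / (n - k - 1)) ** 0.5,
        "n": n,
    }


def add_centre_dummy(df: pd.DataFrame, postcode_col: str = "POSTCODE",
                     centre_pc3: tuple = ("101",)) -> pd.DataFrame:
    """Part (x): CENTRE = 1 for city-centre districts, 0 otherwise.

    HYPOTHETICAL: which PC3 districts count as the city centre is an assumption.
    """
    out = df.copy()
    out["CENTRE"] = pc3_code(df[postcode_col]).isin(centre_pc3).astype(int)
    return out
```

To use the sketch, load the data with `df = pd.read_csv("20191126_forsale_amsterdam.csv")`. Then call `fit_ols(df, "PRICE", ["AREA"])`, `fit_ols(df, "PRICE", ["ROOMS"])` and `fit_ols(df, "PRICE", ["AREA", "ROOMS"])`, and compare their `adj_r2` and `se` values. Plain R² never decreases when a predictor is added, which is why the adjusted version is the more sensible yardstick for part (ix). The scatterplots of parts (ii) and (v) can be drawn with pandas, for example `df.plot.scatter(x="AREA", y="PRICE")`, which requires matplotlib.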
EXERCISE 5.2
Topic: determinants of healthcare costs in the Netherlands
Steps (methodology):
- Collect historical data:
  - Healthcare costs from 2017 per municipality (vektis.nl)
  - Figures on (socio-economic) factors that are assumed to be related to healthcare costs
- Generate a multiple regression model with healthcare costs in 2017 as response variable and the (socio-economic) factors as predictors
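In general form, the multiple regression model from the last step can be written as below. Here $y_i$ is the healthcare-cost measure of municipality $i$, $x_{1i},\dots,x_{ki}$ are the values of the $k$ predictors, and $n$ is the number of municipalities in the sample. The (adjusted) coefficient of determination is one common yardstick for assessing such a model.

```latex
y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_k x_{ki} + \varepsilon_i,
\qquad i = 1, \dots, n

R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2},
\qquad
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}
```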
Open the file healthcare_nl.csv. This file contains information about a sample of Dutch municipalities in 2017.
Variables in this dataset:
- MUNCODE; a unique identifier for Dutch municipalities
- MUNICIPALITY; the name of the municipality
- TOTAL_COSTS; total healthcare costs insured under the basic Dutch health insurance
- INSURED_YEARS; total number of insured years
- UNEMPLOYMENT_RATE
- DISTANCE_HOSPITAL; average distance to hospital
- HOSPITALS; average number of hospitals within a distance of 20 km
Develop a multiple regression model with HEALTHCARE_COSTS as response variable and assess the model.
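HEALTHCARE_COSTS does not appear in the variable list above. The sketch below therefore assumes, as one plausible choice, that the response is the cost per insured year, TOTAL_COSTS / INSURED_YEARS. It regresses that response on UNEMPLOYMENT_RATE, DISTANCE_HOSPITAL and HOSPITALS with ordinary least squares (pandas and NumPy assumed available). Besides R² and adjusted R², it reports each coefficient's standard error and t-value, computed from $\hat\sigma^2 (X^\top X)^{-1}$, as an aid to assessing the model.

```python
import numpy as np
import pandas as pd

# Candidate predictors taken from the variable list of healthcare_nl.csv
PREDICTORS = ["UNEMPLOYMENT_RATE", "DISTANCE_HOSPITAL", "HOSPITALS"]


def healthcare_model(df: pd.DataFrame, predictors: list = PREDICTORS):
    """OLS of healthcare costs per insured year on the given predictors.

    ASSUMPTION: the response is TOTAL_COSTS / INSURED_YEARS, because
    HEALTHCARE_COSTS is not itself a column in the variable list.
    Returns (coefficient table, R^2, adjusted R^2).
    """
    data = df[["TOTAL_COSTS", "INSURED_YEARS"] + predictors].dropna()
    y = (data["TOTAL_COSTS"] / data["INSURED_YEARS"]).to_numpy(dtype=float)
    X = np.column_stack([np.ones(len(data)), data[predictors].to_numpy(dtype=float)])
    n, p = X.shape  # p = number of predictors + 1 (intercept)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    sigma2 = ss_res / (n - p)  # estimated residual variance
    # Standard errors: square roots of the diagonal of sigma^2 (X'X)^-1
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    table = pd.DataFrame(
        {"coef": coef, "std_err": std_err, "t": coef / std_err},
        index=["const"] + predictors,
    )
    r2 = 1 - ss_res / ss_tot
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)
    return table, r2, adj_r2
```

A typical call would be `table, r2, adj_r2 = healthcare_model(pd.read_csv("healthcare_nl.csv"))`. As a rough rule of thumb for reasonably large samples, a predictor with |t| well above 2 is significant at about the 5% level. Predictors that fall short are candidates for removal, after which the adjusted R² of the reduced model can be compared with that of the full one.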