Data Analytics Module
Lecturer: Hans van der Zwan
Lab 01
Topic: graphical data analysis / visualization
EXERCISE 1.1 Continuous variable
Open the file HP_LONDON_JAN19.xlsx. It can be downloaded from Blackboard or retrieved from http://landregistry.data.gov.uk/app/ppd.
- Explore the PRICE variable using histograms.
- Create a publishable histogram with a bin width of 50,000 GBP, an underflow bin containing the properties with a selling price below 50,000 GBP, and an overflow bin containing the properties with a selling price above 1,500,000 GBP.
- Compare the distribution of the house prices for the different property types using boxplots.
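The underflow and overflow bins described above can be sketched in a few lines of Python. This is only an illustration of the binning rule, not a prescribed solution: the function name bin_label and the sample prices are hypothetical, prices are assumed to be whole GBP amounts, and a price of exactly 1,500,000 GBP is assumed to belong to the last regular bin.

```python
from collections import Counter

BIN_WIDTH = 50_000   # bin width in GBP, as stated in the exercise
LOWER = 50_000       # prices below this go into the underflow bin
UPPER = 1_500_000    # prices above this go into the overflow bin


def bin_label(price):
    """Return the histogram bin label for one selling price (integer GBP)."""
    if price < LOWER:
        return "< 50,000"
    if price > UPPER:
        return "> 1,500,000"
    # Regular bins are [start, start + BIN_WIDTH); a price of exactly
    # 1,500,000 is folded into the last regular bin (an assumption).
    start = min((price // BIN_WIDTH) * BIN_WIDTH, UPPER - BIN_WIDTH)
    return f"{start:,} - {start + BIN_WIDTH:,}"


# Hypothetical prices, for illustration only (not taken from the data file)
prices = [35_000, 50_000, 99_999, 420_000, 1_500_000, 2_750_000]
counts = Counter(bin_label(p) for p in prices)
for label, n in counts.items():
    print(label, n)
```

The resulting counts per label can then be plotted as bars in any tool, with the two open-ended bins clearly labelled on the axis.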
EXERCISE 1.2 Categorical variable
This exercise uses the same data file as Exercise 1.1 (HP_LONDON_JAN19.xlsx).
Use MS Excel to create the graphs in figure 1 and figure 2 in the handout.
EXERCISE 1.3 Categorical variable
This exercise uses the same data file as Exercise 1.1 (HP_LONDON_JAN19.xlsx).
Make a new variable ‘WEEKDAY’ (values: SUN, MON and so on).
- Create a barplot with the numbers of properties sold for each of the weekdays in January 2019.
- Why is the barplot created in the previous part misleading?
- Create a graph which gives a correct distribution of the numbers of properties sold on Sunday, Monday, etc.
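As a starting point for the WEEKDAY variable, the sketch below derives a weekday label from a date and counts how often each weekday occurs in January 2019. Whether every weekday occurs equally often in the month may be worth checking when judging the first barplot. The sale dates are hypothetical, and dividing by the number of occurrences is only one possible normalisation.

```python
import calendar
from collections import Counter
from datetime import date

LABELS = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]


def weekday_label(d):
    """Map a date to its WEEKDAY label; date.weekday() returns 0 for Monday."""
    return LABELS[d.weekday()]


# How many times does each weekday occur in January 2019?
days_in_month = calendar.monthrange(2019, 1)[1]
occurrences = Counter(
    weekday_label(date(2019, 1, day)) for day in range(1, days_in_month + 1)
)

# Hypothetical sale dates, for illustration only (not taken from the data file)
sales = [date(2019, 1, 1), date(2019, 1, 8), date(2019, 1, 4), date(2019, 1, 7)]
sold = Counter(weekday_label(d) for d in sales)

# Number of occurrences and average number of sales per occurrence
for label in LABELS:
    print(label, occurrences[label], sold[label] / occurrences[label])
```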
EXERCISE 1.4 Bivariate analysis: a categorical and a numerical variable
Navigate to KNMIdata.
Select the period from 2017-01-01 to 2018-12-31.
Select all variables.
Select station 344 Rotterdam.
Download the file. It is a .txt file that can be opened in MS Excel.
Copy the metadata to a new sheet ‘metadata’. Copy the data to a new sheet ‘data’.
- Explore the T variable (temperature) using histograms.
- Explore the T variable using boxplots for the different hours of the day.
Create a new variable: SEASON. This variable should have four values: SPRING for observations in March, April and May, SUMMER (June, July, August), AUTUMN (September, October, November) and WINTER (December, January, February).
- Explore the T variable using boxplots for the different seasons.
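The month-to-season rule above can be written as a small lookup. The sketch below is one possible way to do it. The helper names season and month_of are hypothetical, the date is assumed to be stored as an integer of the form YYYYMMDD, and the sample rows are made up for illustration.

```python
from collections import defaultdict


def season(month):
    """Return the SEASON label for a month number (1 = January, 12 = December)."""
    if month in (3, 4, 5):
        return "SPRING"
    if month in (6, 7, 8):
        return "SUMMER"
    if month in (9, 10, 11):
        return "AUTUMN"
    if month in (12, 1, 2):
        return "WINTER"
    raise ValueError(f"invalid month: {month}")


def month_of(yyyymmdd):
    """Extract the month from a date stored as an integer such as 20170315
    (this storage format is an assumption)."""
    return (yyyymmdd // 100) % 100


# Hypothetical (date, T) rows, for illustration only
rows = [(20170115, 31), (20170720, 214), (20171005, 142), (20180401, 98), (20180110, 45)]

# Group the T values per season, ready for one boxplot per group
by_season = defaultdict(list)
for day, t in rows:
    by_season[season(month_of(day))].append(t)

for name, values in by_season.items():
    print(name, values)
```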
EXERCISE 1.5 Bivariate analysis: two categorical variables
Navigate to the DUO website.
Choose: Databestanden/Hoger Onderwijs/Ingeschreven/hbo.
Download the file 01.Ingeschrevenen hbo 2018.xls.
- Create a table with the number of male and female students per master program (use variable CROHO ONDERDEEL) at Dutch Universities of Applied Sciences in 2018.
- Create barplots with the number of male and female students per master program at Dutch Universities of Applied Sciences in 2018.
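The table asked for above is a two-way count. The sketch below shows the idea under stated assumptions: the records are hypothetical, and each record is assumed to hold a CROHO ONDERDEEL value, a gender, and a number of students. The actual layout of the DUO file may differ, so the loading step is left out.

```python
from collections import defaultdict

# Hypothetical records: (CROHO ONDERDEEL, gender, number of students).
# The values are made up and the real file may be organised differently.
records = [
    ("economie", "male", 120),
    ("economie", "female", 95),
    ("techniek", "male", 210),
    ("techniek", "female", 40),
    ("economie", "female", 5),
]

# Accumulate the counts per CROHO ONDERDEEL and gender
table = defaultdict(lambda: {"male": 0, "female": 0})
for onderdeel, gender, n in records:
    table[onderdeel][gender] += n

# Print the two-way table, one row per CROHO ONDERDEEL
print(f"{'CROHO ONDERDEEL':<18}{'male':>8}{'female':>8}")
for onderdeel in sorted(table):
    row = table[onderdeel]
    print(f"{onderdeel:<18}{row['male']:>8}{row['female']:>8}")
```

Each row of this table corresponds to one pair of bars in a grouped barplot.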
EXERCISE 1.6 Bivariate analysis: two numerical variables
The data used to plot figure 5 in the handout can be found in the file RoomsForRentNeth.xlsx.
- Generate a scatterplot for the data from Amsterdam.
- Is there a relationship between AREA and RENT for the rooms for rent in Amsterdam?
- Answer the same question for Rotterdam and The Hague.
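Alongside the scatterplot, a correlation coefficient is one common way to summarise a linear relationship between two numerical variables. The sketch below computes Pearson's r by hand. The function name pearson_r and the sample values are hypothetical and not taken from RoomsForRentNeth.xlsx.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical AREA (m2) and RENT (EUR per month) values, for illustration only
area = [10, 14, 18, 25]
rent = [350, 420, 500, 640]
print(round(pearson_r(area, rent), 3))
```

A value of r close to 1 or -1 points to a strong linear relationship. The scatterplot remains necessary to spot outliers and non-linear patterns that r does not capture.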