Data Analytics Module
Lecturer: Hans van der Zwan
Lab 01
Topic: graphical data analysis / visualization


EXERCISE 1.1 Continuous variable

Open the file HP_LONDON_JAN19.xlsx. It can be downloaded from Blackboard or retrieved from http://landregistry.data.gov.uk/app/ppd.

  1. Explore the PRICE variable using histograms.
  2. Create a publishable histogram with a bin width of 50,000 GBP, an underflow bin containing the properties with a selling price below 50,000 GBP, and an overflow bin containing the properties with a selling price above 1,500,000 GBP.
  3. Compare the distribution of the house prices for the different property types using boxplots.
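
The exercise is meant to be done in MS Excel, but the binning rule in part 2 can also be written down as a short Python sketch. The function name `bin_label` and the example prices are made up for illustration. Placing a price of exactly 1,500,000 GBP in the last regular bin is an assumption, since the exercise only states that the overflow bin holds prices higher than 1,500,000 GBP.

```python
from collections import Counter

def bin_label(price, width=50_000, low=50_000, high=1_500_000):
    """Return the label of the histogram bin that a price falls in."""
    if price < low:
        return f"< {low:,}"      # underflow bin
    if price > high:
        return f"> {high:,}"     # overflow bin
    # Regular bins are [start, start + width). A price equal to `high`
    # is placed in the last regular bin (an assumption, see above).
    start = min(low + (int(price) - low) // width * width, high - width)
    return f"{start:,} - {start + width:,}"

# Made-up prices, not taken from the data file
prices = [42_000, 50_000, 99_999, 100_000, 1_500_000, 2_300_000]
counts = Counter(bin_label(p) for p in prices)

for label, n in counts.items():
    print(label, n)
```

The resulting counts per label can then be plotted as a bar chart without gaps between the bars.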


EXERCISE 1.2 Categorical variable

This exercise uses the same data file as exercise 1.1 (HP_LONDON_JAN19.xlsx).
Use MS Excel to create the graphs in figure 1 and figure 2 in the handout.

EXERCISE 1.3 Categorical variable

This exercise uses the same data file as exercise 1.1 (HP_LONDON_JAN19.xlsx).
Create a new variable ‘WEEKDAY’ (values: SUN, MON, and so on).

  1. Create a barplot with the numbers of properties sold for each of the weekdays in January 2019.
  2. Why is the graph created in part 1 misleading?
  3. Create a graph which gives a correct distribution of the numbers of properties sold on Sunday, Monday, etc.
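
As a possible starting point for parts 2 and 3, the sketch below derives a weekday label from a date and counts how often each weekday occurs in a given month. It assumes that dividing the number of sales by the number of occurrences of that weekday is an acceptable correction. The helper names and the sale dates are hypothetical and are not taken from the data file.

```python
from collections import Counter
from datetime import date, timedelta

DAY_NAMES = ["MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"]

def weekday_name(d):
    """Map a date to its WEEKDAY label (date.weekday() returns 0 for Monday)."""
    return DAY_NAMES[d.weekday()]

def weekday_occurrences(year, month):
    """Count how often each weekday occurs in the given month."""
    counts = Counter()
    d = date(year, month, 1)
    while d.month == month:
        counts[weekday_name(d)] += 1
        d += timedelta(days=1)
    return counts

occurrences = weekday_occurrences(2019, 1)

# Hypothetical sale dates, for illustration only
sales = [date(2019, 1, 4), date(2019, 1, 11), date(2019, 1, 15)]
sold = Counter(weekday_name(d) for d in sales)

# Average number of sales per occurrence of each weekday
per_day = {day: sold[day] / occurrences[day] for day in DAY_NAMES}
```

Comparing `occurrences` across the weekdays is a useful first check before interpreting the raw counts.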



EXERCISE 1.4 Bivariate analysis: a categorical and a numerical variable

Navigate to KNMIdata.
Select the period from 2017-01-01 to 2018-12-31.
Select all variables.
Select station 344 Rotterdam.
Download the file. It is a .txt file that can be opened in MS Excel.
Copy the metadata to a new sheet ‘metadata’. Copy the data to a new sheet ‘data’.

  1. Explore the T variable (temperature) using histograms.
  2. Explore the T variable using boxplots for the different hours of the day.

Create a new variable: SEASON. This variable should have four values: SPRING for observations in March, April and May, SUMMER (June, July, August), AUTUMN (September, October, November) and WINTER (December, January, February).

  1. Explore the T variable using boxplots for the different seasons.
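
The SEASON definition above can be written as a simple lookup on the month number. The sketch below assumes that the date of each observation is available in the form YYYYMMDD (for example 20170315). That format is an assumption about the downloaded file, and the function name is hypothetical.

```python
# Month number -> season, following the definition in this exercise
SEASON_BY_MONTH = {
    3: "SPRING", 4: "SPRING", 5: "SPRING",
    6: "SUMMER", 7: "SUMMER", 8: "SUMMER",
    9: "AUTUMN", 10: "AUTUMN", 11: "AUTUMN",
    12: "WINTER", 1: "WINTER", 2: "WINTER",
}

def season_from_yyyymmdd(value):
    """Return the season for a date given as YYYYMMDD (int or str)."""
    month = int(str(value)[4:6])  # characters 5 and 6 hold the month
    return SEASON_BY_MONTH[month]

print(season_from_yyyymmdd(20170315))
print(season_from_yyyymmdd("20181201"))
```

In MS Excel the same mapping can be built with a small lookup table on the month number.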


EXERCISE 1.5 Bivariate analysis: two categorical variables

Navigate to the DUO website.
Choose: Databestanden/Hoger Onderwijs/Ingeschreven/hbo.
Download the file 01.Ingeschrevenen hbo 2018.xls.

  1. Create a table with the number of male and female students per master program (use variable CROHO ONDERDEEL) at Dutch Universities of Applied Sciences in 2018.
  2. Create barplots with the number of male and female students per master program at Dutch Universities of Applied Sciences in 2018.
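
The table in part 1 is a cross-tabulation of two categorical variables, comparable to a pivot table in MS Excel. The sketch below shows the idea on made-up rows. The layout of the rows (program, gender, number of students) is an assumption for illustration and does not reflect the actual column layout of the DUO file.

```python
from collections import defaultdict

# Made-up rows: (program, gender, number of students)
rows = [
    ("Program A", "male", 12),
    ("Program A", "female", 20),
    ("Program B", "male", 7),
    ("Program A", "male", 5),
]

# One table row per program, one column per gender
table = defaultdict(lambda: {"male": 0, "female": 0})
for program, gender, n in rows:
    table[program][gender] += n

for program, counts in sorted(table.items()):
    print(program, counts["male"], counts["female"])
```

Each row of the resulting table corresponds to one group of bars in the barplot of part 2.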


EXERCISE 1.6 Bivariate analysis: two numerical variables

The data used to plot figure 5 in the handout can be found in the file RoomsForRentNeth.xlsx.

  1. Generate a scatterplot for the data from Amsterdam.
  2. Is there a relationship between AREA and RENT for the rooms for rent in Amsterdam?
  3. Same question for Rotterdam and The Hague.
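
A scatterplot is the main tool for parts 2 and 3. As a numerical complement, the strength of a linear relationship can be summarized with the Pearson correlation coefficient. The sketch below computes it by hand on made-up numbers. The values are not taken from RoomsForRentNeth.xlsx, and the function name is hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Made-up values, for illustration only
area = [10, 14, 18, 25]
rent = [400, 520, 640, 850]
r = pearson_r(area, rent)
print(round(r, 3))
```

A value close to 1 or -1 points to a strong linear relationship, and a value close to 0 to a weak one. The coefficient does not capture non-linear patterns, so it should be read together with the scatterplot.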


EXERCISE 1.7 Time diagram

Use the data in the file cbs_airport_figures.xlsx to create Figure 10 from the handout.