Data Analytics Module
Lecturer: Hans van der Zwan
Lab 02
Topic: summary statistics


EXERCISE 2.1

Use the HP_LONDON_JAN19.xlsx file to generate Table 1.1

Table 1
Summary statistics properties sold in London in January 2019

TYPE

COUNT

AVERAGE

MEDIAN

SD

PERC_ABOVE_1mln

FLAT

782

592,700

445,000

612,700

9.1

DETACHED

37

1,498,800

1,250,000

873,100

62.2

SEMI-DETACHED

125

982,200

805,000

642,400

32.8

TERRACED

442

876,100

630,000

919,700

23.5

Note. Average, median and standard deviation (SD) are in GB Pounds. The last column gives the percentages of properties sold with a selling price above 1 mln GPB. Underlying data are retrieved from http://landregistry.data.gov.uk/app/ppd/.


EXERCISE 2.2

Gather the GDP per capita data per country from the World Bank open data website. Copy the data for the year 2017 to a new worksheet.
Gather the life expectancy data from the same website. Copy the data for the year 2017 to a new worksheet.
Gather the population data from the same website. Copy the data for the year 2017 in a new worksheet.
Join the GDP per capita data for 2017 with the life expectancy data and with the population data.

  1. Create a bubble chart of these data for 2017, with the GDP data mapped on the x-axis, the life expectancy data on the y-axis and the population data mapped on the bubble size.
  2. Create a table with summary statistics for the GDP per capita in the different continents (for this the continent has to be added as a variable; the continent for each country can be found in the worksheet Metadata-country).
  3. Same as (ii) for the life-expectancy.


EXERCISE 2.3

Navigate to luchtmeetnet.nl.
Dowload PM10 level data for 2018 for one of the four locations: (1) Amsterdam-Stadhouderskade, (2) Rotterdam-Schiedamsevest, (3) Den Haag-Amsterdamse Veerkade and (4) Utrecht-Kardinaal de Jong weg.
Make a copy of the downloaded data set.

  1. Check if the dataset contains negative PM10 values. If such is the case, remove them.
  2. Explore the measured PM10 levels using graphical analysis. Detect outliers and if there are any analyze them.
  3. Summarize the data per day using the average PM10 level per day. Plot a time series graph of these averages.
  4. Create a boxplot in which the measured PM10 levels on Sunday, Monday, etc are compared.
  5. Create a table with summary statistics of the PM10 levels per weekday.
  6. Create a boxplot with the measured PM10 levels on the different hours of the day.
  7. Create a table with summary statistics of the PM10 levels per hour of the day.

  1. one way to calculate the median per category in MS Excel is by the use of a so called array function. See this website for an example.