3.1 Data structure

  • Sample data that consist of multiple units (\(i=1,~2,~3,\ldots,n\)) observed at a single point in time, or over a single time interval, are known as cross-sectional (or spatial) data.

  • Conversely, if the values of the variables for a single unit are observed over time (\(t=1,~2,~3,\ldots,T\)), they are known as time-series (or historical) data.

  • Data observed for the same units over the same time periods are known as panel data. Each observation is indexed by two subscripts, \(i\) and \(t\), so a panel with no missing observations contains \(n\times T\) observations.
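The relationship between the three data structures can be sketched in a few lines of Python. This is only an illustration, not part of the original material: the variable names (`panel`, `cross_section`, `time_series`, `life_exp`, `poverty`) are assumptions, and the values are a small excerpt of the Bulgaria and Czech rows for 2018 to 2020 shown in Table 3.2. Fixing \(t\) gives a cross-section, and fixing \(i\) gives a time series.

```python
# Panel data: each observation is keyed by the two subscripts (i, t),
# here (country, year). Values are an excerpt from Table 3.2.
panel = {
    ("Bulgaria", 2018): {"life_exp": 75.0, "poverty": 33.0},
    ("Bulgaria", 2019): {"life_exp": 75.1, "poverty": 33.2},
    ("Bulgaria", 2020): {"life_exp": 73.6, "poverty": 33.5},
    ("Czech", 2018): {"life_exp": 79.1, "poverty": 11.8},
    ("Czech", 2019): {"life_exp": 79.3, "poverty": 12.1},
    ("Czech", 2020): {"life_exp": 78.2, "poverty": 11.5},
}

# Number of distinct units (n) and distinct periods (T).
units = sorted({i for i, _ in panel})
periods = sorted({t for _, t in panel})
n, T = len(units), len(periods)

# Cross-sectional data: all units i at one fixed period t.
cross_section = {i: row for (i, t), row in panel.items() if t == 2020}

# Time-series data: one fixed unit i over all periods t.
time_series = {t: row for (i, t), row in panel.items() if i == "Bulgaria"}

# With no missing observations, the panel holds n * T rows.
print(n, T, len(panel) == n * T)
```

In practice a data-frame library would usually replace the plain dictionary, but the indexing logic stays the same.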

TABLE 3.1: Cross-sectional data vs time-series data

Cross-sectional data
Country    Life expec.  Poverty rate
Bulgaria   71.4         31.7
Czech      77.2         10.7
Estonia    77.2         22.2
Croatia    76.7         20.9
Italy      82.7         25.2
Latvia     73.1         26.1
Hungary    74.3         19.4
Poland     75.5         16.8
Romania    72.8         34.4
Slovenia   80.7         13.2

Time-series data
Year  Life expec.  Poverty rate
2013  77.8         25.3
2014  77.9         24.9
2015  77.5         24.4
2016  78.2         23.5
2017  78.1         23.7
2018  78.2         22.1
2019  78.6         20.8
2020  77.8         20.5
2021  76.7         20.9
2022  77.7         19.9

Exercise 10. According to the table above, in which year were the cross-sectional data observed? For which country were the time-series data observed?

TABLE 3.2: Panel data
Country   Year  Life expec.  Poverty rate
Bulgaria  2013  74.9         NA
Bulgaria  2014  74.5         NA
Bulgaria  2015  74.7         43.3
Bulgaria  2016  74.9         41
Bulgaria  2017  74.8         38
Bulgaria  2018  75           33
Bulgaria  2019  75.1         33.2
Bulgaria  2020  73.6         33.5
Bulgaria  2021  71.4         31.7
Bulgaria  2022  74.3         32.2
Czech     2013  78.3         NA
Czech     2014  78.9         NA
Czech     2015  78.7         13
Czech     2016  79.1         12.4
Czech     2017  79.1         12.1
Czech     2018  79.1         11.8
Czech     2019  79.3         12.1
Czech     2020  78.2         11.5
Czech     2021  77.2         10.7
Czech     2022  79.1         11.8

  • Panel data are repeated measurements on the same cross-sectional units over time.

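  • If some observations are missing, so that not every unit is observed in every period, the data are referred to as unbalanced panel data. In Table 3.2, for example, the poverty rate is missing (NA) for 2013 and 2014.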
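One way to check whether a panel is balanced is to list every (unit, period) combination and look for missing cells. The sketch below is illustrative only: `missing_cells` is a hypothetical helper written for this example, `None` stands in for the NA entries, and the values are the 2013 to 2015 rows of Table 3.2.

```python
from itertools import product


def missing_cells(panel):
    """Return the (unit, period) pairs that are absent or hold a missing value."""
    units = sorted({i for i, _ in panel})
    periods = sorted({t for _, t in panel})
    # A balanced panel has a non-missing value for every combination of i and t.
    return [(i, t) for i, t in product(units, periods) if panel.get((i, t)) is None]


# Excerpt from Table 3.2; None marks an NA entry.
life_exp = {
    ("Bulgaria", 2013): 74.9, ("Bulgaria", 2014): 74.5, ("Bulgaria", 2015): 74.7,
    ("Czech", 2013): 78.3, ("Czech", 2014): 78.9, ("Czech", 2015): 78.7,
}
poverty = {
    ("Bulgaria", 2013): None, ("Bulgaria", 2014): None, ("Bulgaria", 2015): 43.3,
    ("Czech", 2013): None, ("Czech", 2014): None, ("Czech", 2015): 13.0,
}

for name, variable in [("life_exp", life_exp), ("poverty", poverty)]:
    gaps = missing_cells(variable)
    status = "balanced" if not gaps else f"unbalanced, missing {gaps}"
    print(f"{name}: {status}")
```

For this excerpt, life expectancy forms a balanced panel, while the poverty rate does not because of the NA entries in 2013 and 2014.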