1.3 Structure of the data

  • There are three basic data structures:
  1. Cross-sectional data
  2. Time-series data
  3. Panel data
  • Cross-sectional data consist of multiple observation units at a single point in time (\(i=1,2,...,n\)), e.g. the income of \(100\) households in one year, say \(2004\)

  • Time-series data refer to a single observation unit followed over time (\(t=1,2,...,T\)), e.g. the income of a single household over \(5\) years, say \(2001\), \(2002\), \(2003\), \(2004\), and \(2005\)

  • Panel data include observations for the same set of cross-sectional units \(i\) at different points in time \(t\), e.g. income of the same \(100\) households for \(5\) years

  • Collected data should always be organized in matrix form (observations in rows and variables in columns)
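The three structures and the matrix layout can be illustrated with a small sketch in Python. The household numbers and incomes below are invented for illustration only, and the helper names `cross_section` and `time_series` are hypothetical. Each row is one observation and each column one variable; fixing the year yields a cross-section, and fixing the household yields a time series.

```python
# Hypothetical panel: income of 3 households (i) over 2 years (t).
# All figures are made up for illustration.
COLUMNS = ("household", "year", "income")  # variables, one per column

panel = [  # observations, one per row
    (1, 2004, 30500.0),
    (1, 2005, 31200.0),
    (2, 2004, 41000.0),
    (2, 2005, 40250.0),
    (3, 2004, 28750.0),
    (3, 2005, 29900.0),
]


def cross_section(rows, year):
    """All units observed at one point in time (t fixed)."""
    return [row for row in rows if row[1] == year]


def time_series(rows, household):
    """One unit observed across time (i fixed), ordered by year."""
    selected = [row for row in rows if row[0] == household]
    return sorted(selected, key=lambda row: row[1])


cs_2004 = cross_section(panel, 2004)  # n = 3 rows, one per household
ts_hh1 = time_series(panel, 1)        # T = 2 rows, one per year

n_units = len({row[0] for row in panel})
n_periods = len({row[1] for row in panel})

print(COLUMNS)
print(cs_2004)
print(ts_hh1)
# Every unit is observed in every period here, so the panel has n * T rows.
print(len(panel) == n_units * n_periods)
```

Storing the panel with one row per (household, year) pair keeps it in matrix form and makes both slices a simple row filter.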

  • When collecting data, some issues can emerge:

  1. Missing values (abbreviated NA or n/a, for “not available” or “not applicable”)
  2. Measurement errors (collected data may not always reflect the true values)
  3. Outliers (extreme values)
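As a minimal sketch of how the first and third issue might be screened for, the Python snippet below marks missing values with `None` and flags extreme values with the 1.5 × IQR rule of thumb. The income figures are invented, and the IQR rule is an assumption made here for illustration, not a rule stated in this section. Measurement errors generally cannot be detected from the recorded values alone, so they are not covered.

```python
import statistics

# Hypothetical incomes; None marks a missing value (NA).
incomes = [30500.0, 41000.0, None, 28750.0, 39800.0, 250000.0, 33100.0]

# 1. Missing values: record where they occur, then set them aside.
missing_positions = [i for i, x in enumerate(incomes) if x is None]
observed = [x for x in incomes if x is not None]

# 3. Outliers: assumed rule of thumb, flag values lying more than
#    1.5 * IQR below the first quartile or above the third quartile.
q1, _median, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in observed if x < lower or x > upper]

print("missing at positions:", missing_positions)
print("flagged as outliers:", outliers)
```

A flagged value is only a candidate for closer inspection; whether it is an error or a genuine extreme observation has to be judged from the context of the data.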

Example 1.5 Two tables are given below. Answer the following questions:

  1. Which table presents cross-sectional data, and which one presents time-series data?
  2. In which year are the cross-sectional data observed?
  3. For which audit firm are the time-series data observed?
  4. Indicate whether each quantitative variable is continuous or discrete.
TABLE 1.1: Audit firms with respect to the number of employees and net margin (%)

Audit firm       Employees   Net margin (%)
Deloitte         334800      23.4
PwC              284000      15.6
Ernst & Young    398965      20.7
KPMG             227000      13.8

TABLE 1.2: Yearly observations with respect to the number of employees and net margin (%)

Year    Employees   Net margin (%)
2016    204790      17.3
2017    223600      27.8
2018    287590      16.5
2019    334800      23.4
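
One possible way to approach questions 2 and 3 is to look for a row that the two tables have in common. The Python sketch below transcribes both tables and searches for matching (employees, net margin) pairs. The variable names are hypothetical, and the conclusion rests on the assumption that Table 1.2 follows one of the firms listed in Table 1.1.

```python
# Table 1.1: one row per audit firm (firm, employees, net margin in %).
table_1_1 = [
    ("Deloitte", 334800, 23.4),
    ("PwC", 284000, 15.6),
    ("Ernst & Young", 398965, 20.7),
    ("KPMG", 227000, 13.8),
]

# Table 1.2: one row per year (year, employees, net margin in %).
table_1_2 = [
    (2016, 204790, 17.3),
    (2017, 223600, 27.8),
    (2018, 287590, 16.5),
    (2019, 334800, 23.4),
]

# A (firm, year) pair matches when both variables agree in the two tables.
matches = [
    (firm, year)
    for firm, employees, margin in table_1_1
    for year, employees_t, margin_t in table_1_2
    if (employees, margin) == (employees_t, margin_t)
]

print(matches)  # [('Deloitte', 2019)]
```

Under the stated assumption, the single match suggests that the cross-section in Table 1.1 was observed in 2019 and that the time series in Table 1.2 belongs to Deloitte.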