1.3 Structure of the data
There are three basic data structures:
- Cross-sectional data
- Time-series data
- Panel data
Cross-sectional data refer to multiple observation units at one point in time (\(i=1,2,...,n\)), e.g. the income of \(100\) households in one year, say \(2004\).
Time-series data refer to a single unit of observation across time (\(t=1,2,...,T\)), e.g. the income of a single household for \(5\) years, say \(2001\), \(2002\), \(2003\), \(2004\), and \(2005\).
Panel data include observations for the same set of cross-sectional units \(i\) at different points in time \(t\), e.g. the income of the same \(100\) households for \(5\) years.
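The relationship between the three structures can be sketched in Python. This is only a minimal illustration: the income figures are invented, and the names `panel`, `cross_section`, and `time_series` are hypothetical helpers, not part of any library. Fixing \(t\) in a panel gives a cross-section, and fixing \(i\) gives a time series.

```python
# Hypothetical panel: income of 3 households (i) over 2 years (t).
# All numbers are invented for illustration only.
panel = {
    (1, 2004): 30000, (1, 2005): 31000,
    (2, 2004): 42000, (2, 2005): 41500,
    (3, 2004): 27500, (3, 2005): 29000,
}

def cross_section(panel, year):
    """All units i observed at one fixed point in time t."""
    return {i: value for (i, t), value in panel.items() if t == year}

def time_series(panel, unit):
    """One fixed unit i observed across all time points t."""
    return {t: value for (i, t), value in panel.items() if i == unit}

print(cross_section(panel, 2004))  # {1: 30000, 2: 42000, 3: 27500}
print(time_series(panel, 1))       # {2004: 30000, 2005: 31000}
```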
Collected data should always be organized in matrix form, with observations in rows and variables in columns.
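As a rough sketch of this layout, the Python snippet below stores a small data set as a list of rows. The variable names (`income`, `household_size`) and all values are assumptions made up for illustration.

```python
# Hypothetical data set: 3 observations (rows) x 2 variables (columns).
# Values are invented for illustration only.
variables = ["income", "household_size"]
data = [
    [30000, 2],  # observation 1
    [42000, 4],  # observation 2
    [27500, 1],  # observation 3
]

n_obs = len(data)       # number of rows = number of observations
n_vars = len(data[0])   # number of columns = number of variables

# A column collects every observation of one variable.
income_column = [row[variables.index("income")] for row in data]

print(n_obs, n_vars)    # 3 2
print(income_column)    # [30000, 42000, 27500]
```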
When collecting the data some issues can emerge:
- Missing values (the abbreviation NA or n/a stands for “not available” or “not applicable”)
- Measurement errors (collected data may not always reflect the true values)
- Outliers (extreme values)
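Two of these issues, missing values and outliers, can be screened for mechanically. The sketch below is one possible approach, not a prescribed method: it assumes missing values are coded as `None` and flags outliers with the common \(1.5 \times IQR\) rule, which is a convention introduced here rather than one taken from the text. The income values are invented. Measurement errors generally cannot be detected this way, because the recorded values may look entirely plausible.

```python
import statistics

# Hypothetical incomes; None marks a missing value (NA).
incomes = [30000, 42000, None, 27500, 31000, 29500, 250000, 33000]

# Positions of missing values (0-based row indices).
missing_positions = [k for k, x in enumerate(incomes) if x is None]

# Keep only the observed values for the outlier check.
observed = [x for x in incomes if x is not None]

# Quartiles of the observed values (default "exclusive" method).
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1

# Assumed rule of thumb: flag values beyond 1.5 * IQR from the quartiles.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in observed if x < lower or x > upper]

print(missing_positions)  # [2]
print(outliers)           # [250000]
```

Whether a flagged value is a genuine extreme observation or a recording mistake still has to be judged case by case.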
Example 1.5 Two tables are given below. Answer the following questions:
- Which table presents cross-sectional data, and which one presents time-series data?
- In which year are the cross-sectional data observed?
- For which audit firm are the time-series data observed?
- Indicate whether each quantitative variable is continuous or discrete.
Audit firm | Employees | Net margin |
---|---|---|
Deloitte | 334800 | 23.4 |
PwC | 284000 | 15.6 |
Ernst & Young | 398965 | 20.7 |
KPMG | 227000 | 13.8 |

Year | Employees | Net margin |
---|---|---|
2016 | 204790 | 17.3 |
2017 | 223600 | 27.8 |
2018 | 287590 | 16.5 |
2019 | 334800 | 23.4 |