1.3 Structure of the data
There are three basic data structures:
- Cross-sectional data
- Time-series data
- Panel data
Cross-sectional data refer to multiple units of observation at one point in time (i=1,2,...,n), e.g. the income of 100 households in one year, say 2004
Time-series data refer to a single unit of observation across time (t=1,2,...,T), e.g. the income of a single household over 5 years, say 2001, 2002, 2003, 2004, and 2005
Panel data include observations for the same set of cross-sectional units i at different points in time t, e.g. income of the same 100 households for 5 years
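The relationship between the three structures can be sketched in a few lines of Python. This is only an illustration: the income figures, the household numbers, and the helper names `cross_section` and `time_series` are invented here and are not part of the course material. Fixing t in a panel gives a cross-section, and fixing i gives a time series.

```python
# Hypothetical panel: income of 3 households (i) observed in 2 years (t).
# All figures are made up for illustration.
panel = {
    (1, 2004): 21000, (1, 2005): 21800,
    (2, 2004): 34500, (2, 2005): 33900,
    (3, 2004): 18200, (3, 2005): 19100,
}

def cross_section(panel, year):
    """Return all units i observed at a single point in time t."""
    return {i: y for (i, t), y in panel.items() if t == year}

def time_series(panel, unit):
    """Return a single unit i observed across all periods t."""
    return {t: y for (i, t), y in panel.items() if i == unit}

cs_2004 = cross_section(panel, 2004)  # many units, one year
ts_hh1 = time_series(panel, 1)        # one unit, many years
print(cs_2004)
print(ts_hh1)
```

Keying each value by the pair (i, t) makes the point that a panel observation needs both indices, while a cross-section needs only i and a time series only t.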
Collected data should always be organized in matrix form (observations are presented in rows and variables in columns)
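As a minimal sketch of the matrix layout, the following Python snippet stores a small data set with one row per observation and one column per variable. The variable names and values are hypothetical and serve only to show the layout.

```python
# Hypothetical data matrix: one row per observation, one column per variable.
variables = ["household", "income", "members"]
data = [
    [1, 21000, 2],
    [2, 34500, 4],
    [3, 18200, 1],
]

n_obs = len(data)         # number of rows = number of observations
n_vars = len(variables)   # number of columns = number of variables

# Reading along a row gives all variables for one observation.
second_observation = dict(zip(variables, data[1]))

# Reading down a column gives one variable for all observations.
income_column = [row[variables.index("income")] for row in data]

print(n_obs, n_vars)
print(second_observation)
print(income_column)
```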
When collecting the data some issues can emerge:
- Missing values (the abbreviation NA or n/a stands for “not available” or “not applicable”)
- Measurement errors (collected data may not always represent the true values)
- Outliers (extreme values)
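Two of these issues can be screened for mechanically, as the sketch below suggests. It rests on assumptions that are not part of the text: missing values are coded as `None`, the incomes are invented, and outliers are flagged with the common 1.5 × IQR rule, which is only one of several possible conventions. Measurement errors usually cannot be detected from the data alone, so they are not covered.

```python
import statistics

# Hypothetical monthly incomes; None marks a missing value (NA).
incomes = [2100, 2300, None, 2250, 2400, 19000, 2200]

# Positions of the missing values.
missing_idx = [k for k, x in enumerate(incomes) if x is None]

# Quartiles are computed from the observed (non-missing) values only.
observed = [x for x in incomes if x is not None]
q1, _, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1

# Assumed rule: a value is an outlier if it lies more than
# 1.5 * IQR below the first or above the third quartile.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outlier_idx = [
    k for k, x in enumerate(incomes)
    if x is not None and (x < lower or x > upper)
]

print("missing at positions:", missing_idx)
print("outliers at positions:", outlier_idx)
```

A flagged value is only a candidate for closer inspection; whether it is a genuine extreme value or a recording mistake has to be judged from the context.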
Example 1.5 Two tables are given below. Answer the following questions:
- Which table presents cross-sectional data, and which one presents time-series data?
- In which year are the cross-sectional data observed?
- For which audit firm are the time-series data observed?
- Indicate whether quantitative data are continuous or discrete.

Audit firm | Employees | Net margin |
---|---|---|
Deloitte | 334800 | 23.4 |
PwC | 284000 | 15.6 |
Ernst & Young | 398965 | 20.7 |
KPMG | 227000 | 13.8 |

Year | Employees | Net margin |
---|---|---|
2016 | 204790 | 17.3 |
2017 | 223600 | 27.8 |
2018 | 287590 | 16.5 |
2019 | 334800 | 23.4 |