## 1.3 Structure of the data

• There are three basic data structures:
1. Cross-sectional data
2. Time-series data
3. Panel data
• Cross-sectional data refer to multiple observation units at a single point in time ($$i=1,2,...,n$$), e.g. the income of $$100$$ households in one year, say $$2004$$

• Time-series data refer to a single observation unit across time ($$t=1,2,...,T$$), e.g. the income of a single household over $$5$$ years, say $$2001$$, $$2002$$, $$2003$$, $$2004$$, and $$2005$$

• Panel data include observations for the same set of cross-sectional units $$i$$ at different points in time $$t$$, e.g. income of the same $$100$$ households for $$5$$ years

• Collected data should always be organized in matrix form (observations in rows and variables in columns)

• Several issues can arise when collecting data:

1. Missing values (usually coded as NA, “not available”, or n/a, “not applicable”)
2. Measurement errors (collected data may not always present true values)
3. Outliers (extreme values)
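
The three data structures and the data issues listed above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the income figures are invented, the helper names (`cross_section`, `time_series`, `missing_rows`, `flag_outliers`) are hypothetical, and the "more than five times the median" rule is just one crude way of screening for extreme values, not a standard definition of an outlier.

```python
from statistics import median

# Hypothetical panel in matrix form: one row per observation and one
# column per variable (household i, year t, income). All values are made up.
COLUMNS = ("household", "year", "income")
panel = [
    (1, 2003, 41000.0),
    (1, 2004, 42500.0),
    (1, 2005, None),      # missing value (NA)
    (2, 2003, 38000.0),
    (2, 2004, 39200.0),
    (2, 2005, 40100.0),
    (3, 2003, 45500.0),
    (3, 2004, 460000.0),  # suspicious value, possibly a measurement error
    (3, 2005, 47000.0),
]


def cross_section(rows, year):
    """Cross-sectional data: all units at one point in time (fixed t)."""
    return [r for r in rows if r[1] == year]


def time_series(rows, household):
    """Time-series data: one unit across time (fixed i), ordered by year."""
    return sorted((r for r in rows if r[0] == household), key=lambda r: r[1])


def missing_rows(rows):
    """Rows in which at least one variable is missing."""
    return [r for r in rows if any(value is None for value in r)]


def flag_outliers(rows, factor=5.0):
    """Crude screen: incomes larger than `factor` times the median income."""
    incomes = [r[2] for r in rows if r[2] is not None]
    threshold = factor * median(incomes)
    return [r for r in rows if r[2] is not None and r[2] > threshold]


print(COLUMNS)
print("Cross-section 2004:", cross_section(panel, 2004))
print("Time series, household 1:", time_series(panel, 1))
print("Rows with missing values:", missing_rows(panel))
print("Possible outliers:", flag_outliers(panel))
```

Fixing the year gives a cross-section, fixing the household gives a time series, and the full list of rows is the panel. A flagged value may be a genuine extreme observation or a measurement error, so it still has to be checked by hand.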

Example 1.5 Two tables are given below. Answer the following questions:

1. Which table presents cross-sectional data, and which one presents time-series data?
2. In which year are the cross-sectional data observed?
3. For which audit firm are the time-series data observed?
4. Indicate whether quantitative data are continuous or discrete.

TABLE 1.1: Audit firms with respect to the number of employees and net margin (%)

| Audit firm | Employees | Net margin (%) |
|---|---:|---:|
| Deloitte | 334800 | 23.4 |
| PwC | 284000 | 15.6 |
| Ernst & Young | 398965 | 20.7 |
| KPMG | 227000 | 13.8 |


TABLE 1.2: Yearly observations with respect to the number of employees and net margin (%)

| Year | Employees | Net margin (%) |
|---|---:|---:|
| 2016 | 204790 | 17.3 |
| 2017 | 223600 | 27.8 |
| 2018 | 287590 | 16.5 |
| 2019 | 334800 | 23.4 |