11.2 Cross-Sectional Data
Cross-sectional data consists of observations on multiple entities (e.g., individuals, firms, regions, or countries) at a single point in time or over a very short period, where time is not a primary dimension of variation.
- Each observation represents a different entity, rather than the same entity tracked over time.
- Unlike time series data, the order of observations does not carry temporal meaning.
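To make this structure concrete, here is a minimal sketch in Python. It assumes a simple record layout, and the `Observation` class, its fields, and `is_cross_section` are hypothetical names used only for illustration. The check encodes the two defining properties above: every row is a distinct entity, and all rows share one observation period. For simplicity it treats "a single point in time" as one calendar year.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One row of a cross-sectional dataset (hypothetical field names)."""
    entity_id: str     # identifies the worker, firm, or region
    year: int          # period in which the entity is observed
    wage: float        # outcome variable (illustrative)
    education: float   # explanatory variable, e.g. years of schooling (illustrative)


def is_cross_section(rows: list[Observation]) -> bool:
    """True if every row is a distinct entity and all rows share one period."""
    ids = [r.entity_id for r in rows]
    periods = {r.year for r in rows}
    return len(ids) == len(set(ids)) and len(periods) == 1


# Toy values for illustration only, not real data
sample = [
    Observation("w1", 2024, 25.0, 12),
    Observation("w2", 2024, 31.5, 16),
    Observation("w3", 2024, 19.0, 10),
]
print(is_cross_section(sample))                                        # True
print(is_cross_section(sample + [Observation("w1", 2025, 26.0, 12)]))  # False: w1 tracked over time
```

Adding a second row for the same worker in a later year turns the data into a (small) panel, which is why the check fails in that case.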
Examples
- Labor Economics: Wage and demographic data for 1,000 workers in 2024.
- Marketing Analytics: Customer satisfaction ratings and purchasing behavior for 500 online shoppers surveyed in the first quarter (Q1) of a given year.
- Corporate Finance: Financial statements of 1,000 firms for the fiscal year 2023.
Key Characteristics
- Observations are independent (in an ideal setting): Under random sampling, each unit is drawn independently from the same population, so no observation depends intrinsically on another.
- No natural ordering: The rows can be reordered without changing the analysis, because their sequence carries no information.
- Variation occurs across entities, not over time: Differences in observed outcomes arise from differences between individuals, firms, or regions.
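The "no natural ordering" property can be checked numerically. The sketch below assumes NumPy is available and uses simulated rather than real data; `ols` is a hypothetical helper name. It fits OLS by least squares and shows that shuffling the rows leaves the coefficient estimates unchanged.

```python
import numpy as np


def ols(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """OLS coefficients via least squares; X must include a column of ones."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # simulated cross-section
X = np.column_stack([np.ones(n), x])     # intercept column plus one regressor

perm = rng.permutation(n)                # random reordering of the rows
beta_original = ols(X, y)
beta_shuffled = ols(X[perm], y[perm])
print(np.allclose(beta_original, beta_shuffled))  # True
```

By contrast, a time series model built on lagged values would change under the same shuffle, because the lags themselves would be scrambled.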
Advantages
- Straightforward Interpretation: Because there is no time dimension to model (no trends, seasonality, or serial correlation), the focus remains on relationships between variables at a single point in time.
- Easier to Collect and Analyze: Compared to time series or panel data, cross-sectional data is often simpler to collect and model.
- Suitable for Causal Inference: Causal effects can be estimated if exogeneity holds, i.e., if the explanatory variables are uncorrelated with the error term.
Challenges
- Omitted Variable Bias: Unobserved confounders may drive both the dependent and independent variables.
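- Endogeneity: Reverse causality (simultaneity) or measurement error in the explanatory variables can make them correlated with the error term, biasing the estimates.
- Heteroskedasticity: The variance of the errors may differ across entities. OLS coefficients remain unbiased, but conventional standard errors are invalid, so heteroskedasticity-robust standard errors are required.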
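Omitted variable bias can be illustrated with a small simulation. The sketch below assumes NumPy and uses made-up parameters; `ols_coefs` is a hypothetical helper. The true outcome depends on \(x\) and on a confounder \(z\) that is correlated with \(x\). Dropping \(z\) pushes the estimated slope on \(x\) toward \(\beta_x + \beta_z \, \mathrm{Cov}(x, z)/\mathrm{Var}(x) = 2 + 3 \times 0.5 = 3.5\) instead of the true value 2.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Simulated cross-section with illustrative parameters (not estimates from real data)
x = rng.normal(size=n)                   # observed regressor, Var(x) = 1
z = 0.5 * x + rng.normal(size=n)         # confounder with Cov(x, z) = 0.5
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)


def ols_coefs(y, *regressors):
    """OLS coefficients [intercept, b1, b2, ...] for y on the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


long_reg = ols_coefs(y, x, z)   # controls for the confounder: slope on x near 2
short_reg = ols_coefs(y, x)     # omits z: slope on x near 2 + 3 * 0.5 = 3.5
print(round(long_reg[1], 2), round(short_reg[1], 2))
```

The bias does not shrink as \(n\) grows; a larger sample only makes the wrong slope more precise.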
A typical cross-sectional regression model:
\[ y_i = \beta_0 + x_{i1}\beta_1 + x_{i2}\beta_2 + \dots + x_{i(k-1)}\beta_{k-1} + \epsilon_i \]
where:
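- \(y_i\) is the outcome variable for entity \(i = 1, \dots, n\),
- \(x_{ij}\) is the \(j\)-th explanatory variable for entity \(i\), with \(j = 1, \dots, k-1\),
- \(\beta_0\) is the intercept and \(\beta_1, \dots, \beta_{k-1}\) are the slope coefficients, so the model has \(k\) parameters,
- \(\epsilon_i\) is an error term capturing unobserved factors.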
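Stacking the \(n\) observations gives \(y = X\beta + \epsilon\), where the first column of \(X\) is a column of ones, and the OLS estimator is \(\hat{\beta} = (X'X)^{-1}X'y\). The sketch below assumes NumPy and simulated data in which the error variance rises with \(x_{i1}\); `ols_with_se` is a hypothetical helper. It computes both the classical standard errors and the HC1 heteroskedasticity-robust ones, using the sandwich form \(\frac{n}{n-k}(X'X)^{-1}\left(\sum_i \hat{\epsilon}_i^2 x_i x_i'\right)(X'X)^{-1}\).

```python
import numpy as np


def ols_with_se(X: np.ndarray, y: np.ndarray):
    """OLS estimates with classical and HC1 (heteroskedasticity-robust) standard errors.

    X is n x k and must already contain a column of ones for the intercept beta_0.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta

    # Classical SEs assume one common error variance for all entities
    sigma2 = resid @ resid / (n - k)
    se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

    # HC1 sandwich: bread (X'X)^-1, meat sum_i e_i^2 x_i x_i', scaled by n / (n - k)
    meat = X.T @ (X * resid[:, None] ** 2)
    cov_hc1 = n / (n - k) * XtX_inv @ meat @ XtX_inv
    se_hc1 = np.sqrt(np.diag(cov_hc1))
    return beta, se_classical, se_hc1


# Simulated example: k = 3 parameters (intercept, x_{i1}, x_{i2})
rng = np.random.default_rng(1)
n = 2_000
x1 = rng.uniform(0, 10, size=n)
x2 = rng.normal(size=n)
eps = rng.normal(scale=0.5 + 0.3 * x1, size=n)   # error spread grows with x1
y = 1.0 + 0.8 * x1 - 1.5 * x2 + eps
X = np.column_stack([np.ones(n), x1, x2])

beta, se_classical, se_hc1 = ols_with_se(X, y)
print(beta, se_classical, se_hc1)
```

In practice, libraries report these directly; for example, statsmodels' `OLS(y, X).fit(cov_type="HC1")` returns HC1 standard errors. Explicitly inverting \(X'X\) is fine for a small illustration, though least-squares or QR solvers are numerically safer for ill-conditioned designs.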