11.2 Cross-Sectional Data
Cross-sectional data consists of observations on multiple entities (e.g., individuals, firms, regions, or countries) at a single point in time or over a very short period, where time is not a primary dimension of variation.
- Each observation represents a different entity, rather than the same entity tracked over time.
- Unlike time series data, the order of observations does not carry temporal meaning.
Examples
- Labor Economics: Wage and demographic data for 1,000 workers in 2024.
- Marketing Analytics: Customer satisfaction ratings and purchasing behavior for 500 online shoppers surveyed in Q1 of a year.
- Corporate Finance: Financial statements of 1,000 firms for the fiscal year 2023.
Key Characteristics
- Observations are independent (in an ideal setting): Each unit is drawn from a population with no intrinsic dependence on others.
- No natural ordering: Unlike time series data, the sequence of observations does not affect analysis.
- Variation occurs across entities, not over time: Differences in observed outcomes arise from differences between individuals, firms, or regions.
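The "no natural ordering" property can be checked directly: shuffling the rows of a cross-section should leave regression estimates unchanged. The following is a minimal sketch on simulated data. The sample size, coefficients, and the helper name `ols` are illustrative assumptions, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cross-section: 200 entities, an intercept and one regressor.
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])


def ols(X, y):
    """Return OLS coefficients computed by least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Reorder the entities at random; X and y are permuted together
# so each outcome stays matched to its own regressors.
perm = rng.permutation(n)
beta_original = ols(X, y)
beta_shuffled = ols(X[perm], y[perm])

# The two coefficient vectors agree up to floating-point error.
print(np.allclose(beta_original, beta_shuffled))
```

The same check would fail for a time series model with lagged terms, where the row order defines which observation precedes which.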
Advantages
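- Straightforward Interpretation: Because every entity is observed in the same period, there are no time dynamics (trends, seasonality, serial correlation) to model, so the focus remains on relationships between variables at a single point in time.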
- Easier to Collect and Analyze: Compared to time series or panel data, cross-sectional data is often simpler to collect and model.
- Suitable for Causal Inference: Estimates can be interpreted causally if exogeneity holds, i.e., the explanatory variables are uncorrelated with the error term.
Challenges
- Omitted Variable Bias: Unobserved confounders may drive both the dependent and independent variables.
- Endogeneity: Reverse causality or measurement error can make a regressor correlated with the error term, which biases the coefficient estimates.
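- Heteroskedasticity: The variance of the errors may differ across entities. OLS coefficient estimates remain unbiased, but the usual standard errors are invalid, so heteroskedasticity-robust standard errors are needed for inference.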
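The omitted variable problem can be illustrated with a small simulation. The sketch below assumes a hypothetical data-generating process in which an unobserved confounder (`ability`) raises both the regressor (`schooling`) and the outcome (`wage`). All variable names and coefficient values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5_000

# Assumed data-generating process (illustrative values only):
# ability drives both schooling and wage; the true schooling effect is 0.5.
ability = rng.normal(size=n)
schooling = 0.8 * ability + rng.normal(size=n)
wage = 1.0 + 0.5 * schooling + 1.0 * ability + rng.normal(size=n)


def ols(X, y):
    """Return OLS coefficients computed by least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


ones = np.ones(n)

# "Short" regression: ability is omitted, so it ends up in the error term.
beta_short = ols(np.column_stack([ones, schooling]), wage)

# "Long" regression: ability is observed and controlled for.
beta_long = ols(np.column_stack([ones, schooling, ability]), wage)

print("schooling coefficient, ability omitted: ", round(beta_short[1], 3))
print("schooling coefficient, ability included:", round(beta_long[1], 3))
```

In this setup the short regression overstates the schooling effect because schooling picks up part of the effect of ability. In real cross-sectional data the confounder is usually unobserved, so the long regression is not available and other strategies (e.g., instrumental variables) are needed.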
A typical cross-sectional regression model:
$$y_i = \beta_0 + x_{i1}\beta_1 + x_{i2}\beta_2 + \dots + x_{i(k-1)}\beta_{k-1} + \epsilon_i$$
where:
- $y_i$ is the outcome variable for entity $i$,
- $x_{ij}$ is the $j$-th explanatory variable for entity $i$ ($j = 1, \dots, k-1$),
- $\epsilon_i$ is an error term capturing unobserved factors.
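As a minimal sketch of how this model might be estimated, the example below fits it by OLS on simulated data and compares classical standard errors with heteroskedasticity-robust (HC1) ones. The data, the true coefficients, and the form of the error variance are assumptions made for illustration. The formulas are written out in NumPy rather than taken from a regression library.

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 1_000, 3  # k coefficients: an intercept plus k - 1 regressors

# Design matrix with a leading column of ones for beta_0.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])  # illustrative values

# Assumed heteroskedasticity: the error spread grows with |x_{i1}|.
eps = rng.normal(size=n) * (0.5 + np.abs(X[:, 1]))
y = X @ beta_true + eps

# OLS estimator: beta_hat = (X'X)^{-1} X'y
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

# Classical standard errors assume a constant error variance.
sigma2 = resid @ resid / (n - k)
se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC1 robust standard errors: sandwich estimator with an
# n / (n - k) degrees-of-freedom correction.
meat = X.T @ (X * resid[:, None] ** 2)
cov_hc1 = n / (n - k) * XtX_inv @ meat @ XtX_inv
se_robust = np.sqrt(np.diag(cov_hc1))

print("beta_hat:    ", np.round(beta_hat, 3))
print("classical SE:", np.round(se_classical, 3))
print("robust SE:   ", np.round(se_robust, 3))
```

Because the error variance here depends on the first regressor, the robust standard error for its coefficient comes out larger than the classical one. The coefficient estimates themselves are the same in both cases, since only the variance estimate changes.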