11.2 Cross-Sectional Data

Cross-sectional data consists of observations on multiple entities (e.g., individuals, firms, regions, or countries) at a single point in time or over a very short period, where time is not a primary dimension of variation.

Each observation represents a different entity, rather than the same entity tracked over time.
Unlike time series data, the order of observations does not carry temporal meaning.

Examples

Labor Economics: Wage and demographic data for 1,000 workers in 2024.
Marketing Analytics: Customer satisfaction ratings and purchasing behavior for 500 online shoppers surveyed in the first quarter of a given year.
Corporate Finance: Financial statements of 1,000 firms for the fiscal year 2023.

Key Characteristics

Observations are independent (in an ideal setting): Under random sampling, each unit is drawn from the population independently of the others. Clustered or spatially correlated samples violate this ideal.
No natural ordering: Unlike time series data, the sequence of observations does not affect analysis.
Variation occurs across entities, not over time: Differences in observed outcomes arise from differences between individuals, firms, or regions.
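The "no natural ordering" property can be checked numerically. The following is a minimal sketch on simulated data (the sample size, coefficients, and the helper name `ols` are illustrative assumptions, not part of the text): it fits the same regression before and after shuffling the rows and compares the estimates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cross-section: 200 entities, one regressor plus an intercept.
n = 200
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])


def ols(X, y):
    """Return OLS coefficients computed by least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


beta_original = ols(X, y)

# Reorder the entities at random; rows of X and y must move together.
perm = rng.permutation(n)
beta_shuffled = ols(X[perm], y[perm])

print(beta_original)
print(beta_shuffled)
```

Up to floating-point rounding, the two coefficient vectors coincide, because the least-squares objective is a sum over entities and sums do not depend on order. The same experiment on time series data with lagged variables would not be harmless.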

Advantages

Straightforward Interpretation: Because all observations come from the same period, there is no time dimension to model, and the analysis focuses on relationships between variables across entities at that point in time.
Easier to Collect and Analyze: Compared to time series or panel data, cross-sectional data is often simpler to collect and model.
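Suitable for Causal Inference: Estimates have a causal interpretation only if exogeneity conditions hold, i.e., the explanatory variables are uncorrelated with the error term.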

Challenges

Omitted Variable Bias: Unobserved confounders may drive both the dependent and independent variables.
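Endogeneity: Reverse causality or measurement error in the explanatory variables can make them correlated with the error term, which biases the estimates.
Heteroskedasticity: The variance of the errors may differ across entities. The usual standard errors are then invalid, so inference requires heteroskedasticity-robust standard errors.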
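To make the heteroskedasticity point concrete, here is a rough sketch that compares classical standard errors with heteroskedasticity-robust ones on simulated data. The data-generating process (error spread proportional to `x`) and all variable names are assumptions chosen for illustration. The robust estimator shown is the HC1 variant of White's sandwich formula, one of several common choices.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical cross-section in which the error spread grows with x.
n = 2000
x = rng.uniform(1.0, 5.0, size=n)
eps = rng.normal(size=n) * x          # heteroskedastic by construction
y = 1.0 + 0.5 * x + eps

X = np.column_stack([np.ones(n), x])  # intercept column plus regressor
k = X.shape[1]

# OLS coefficients and residuals.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat

# Classical standard errors assume one common error variance.
sigma2 = resid @ resid / (n - k)
se_classical = np.sqrt(np.diag(sigma2 * XtX_inv))

# HC1 sandwich estimator: (X'X)^-1 [X' diag(e_i^2) X] (X'X)^-1,
# scaled by n / (n - k) as a small-sample correction.
meat = X.T @ (X * resid[:, None] ** 2)
cov_hc1 = n / (n - k) * XtX_inv @ meat @ XtX_inv
se_robust = np.sqrt(np.diag(cov_hc1))

print("coefficients:", beta_hat)
print("classical SE:", se_classical)
print("robust SE:   ", se_robust)
```

The coefficient estimates are identical in both cases; only the standard errors, and hence the confidence intervals and test statistics, change.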

A typical cross-sectional regression model:

$y_i = \beta_0 + x_{i1}\beta_1 + x_{i2}\beta_2 + \dots + x_{i(k-1)}\beta_{k-1} + \epsilon_i$

where:

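$y_i$ is the outcome variable for entity $i$,
$x_{ij}$ is the $j$-th explanatory variable for entity $i$ ($j = 1, \dots, k-1$),
$\beta_0$ is the intercept and $\beta_1, \dots, \beta_{k-1}$ are the slope coefficients,
$\epsilon_i$ is an error term capturing unobserved factors.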
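As an illustration of the model above, the following sketch simulates a cross-section with two explanatory variables (so $k = 3$ coefficients including the intercept) and estimates the coefficients by ordinary least squares. The true coefficient values, sample size, and variable names are hypothetical and chosen only for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 1000                                # number of entities
beta_true = np.array([1.0, 2.0, -0.5])  # (beta_0, beta_1, beta_2), hypothetical

# One row per entity: x_{i1}, x_{i2}, drawn independently of the error term.
x = rng.normal(size=(n, 2))
eps = rng.normal(size=n)

# Prepend a column of ones so that beta_0 acts as the intercept.
X = np.column_stack([np.ones(n), x])
y = X @ beta_true + eps

# OLS estimate: the beta that minimizes the sum of squared residuals.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print("true coefficients:     ", beta_true)
print("estimated coefficients:", beta_hat)
```

Because the simulated regressors are independent of the error term, the estimates should land close to the true values. With real cross-sectional data that exogeneity cannot be taken for granted.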