11.1 Data Types

11.1.1 Qualitative vs. Quantitative Data

A foundational way to categorize data is by whether it is qualitative (non-numerical) or quantitative (numerical). These distinctions often guide research designs, data collection methods, and analytical techniques.

Comparison of Qualitative and Quantitative Research Approaches
Qualitative	Quantitative
Examples: In-depth interviews, focus groups, case studies, ethnographies, open-ended questions, field notes	Examples: Surveys with closed-ended questions, experiments, numerical observations, structured interviews
Nature: Text-based, often descriptive, subjective interpretations	Nature: Numeric, more standardized, objective measures
Analysis: Thematic coding, content analysis, discourse analysis	Analysis: Statistical tests, regression, hypothesis testing, descriptive statistics
Outcome: Rich context, detailed understanding of phenomena	Outcome: Measurable facts, generalizable findings (with appropriate sampling and design)

11.1.1.1 Uses and Advantages of Qualitative Data

Deep Understanding: Captures context, motivations, and perceptions in depth.
Flexibility: Elicits new insights through open-ended inquiry.
Inductive Approaches: Often used to build new theories or conceptual frameworks.

11.1.1.2 Uses and Advantages of Quantitative Data

Measurement and Comparison: Facilitates measuring variables and comparing across groups or over time.
Generalizability: With proper sampling, findings can often be generalized to broader populations.
Hypothesis Testing: Permits the use of statistical methods to test specific predictions or relationships.

11.1.1.3 Limitations of Qualitative and Quantitative Data

Qualitative:
- Findings may be difficult to generalize if samples are small or non-representative.
- Analysis can be time-consuming due to coding and interpreting text.
- Potential for researcher bias in interpretation.
Quantitative:
- May oversimplify complex human behaviors or contextual factors by reducing them to numbers.
- Validity depends heavily on how well constructs are operationalized.
- Can miss underlying meanings or nuances not captured in numeric measures.

11.1.1.4 Levels of Measurement

Data can be classified further by level of measurement. The first two levels below are categorical and the last two are numeric. This classification is crucial for determining which statistical techniques are appropriate:

Nominal: Categorical data with no inherent order (e.g., gender, blood type, eye color).
Ordinal: Categorical data with a specific order or ranking but without consistent intervals between ranks (e.g., Likert scale responses: “strongly disagree,” “disagree,” “neutral,” “agree,” “strongly agree”).
Interval: Numeric data with equal intervals but no true zero (e.g., temperature in Celsius or Fahrenheit).
Ratio: Numeric data with equal intervals and a meaningful zero (e.g., height, weight, income).

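The level of measurement determines which statistical tests (such as t-tests, ANOVA, correlations, and regressions) are valid and how differences or ratios in the data can be interpreted: differences are meaningful from the interval level upward, while ratios are meaningful only at the ratio level.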
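As a rough illustration of the four levels, the sketch below applies one conventionally appropriate summary to each. The sample values, the `LIKERT_ORDER` list, and the helper names `ordinal_median` and `celsius_to_kelvin` are all hypothetical and chosen only for this example; it uses nothing beyond Python's standard library.

```python
from statistics import mean, mode

# Hypothetical ordering for a five-point Likert item.
LIKERT_ORDER = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def ordinal_median(responses, order=LIKERT_ORDER):
    """Median of ordered categories, computed on rank positions.

    For an even number of responses this returns the upper-middle category,
    because averaging two categories is not defined on an ordinal scale.
    """
    ranks = sorted(order.index(r) for r in responses)
    return order[ranks[len(ranks) // 2]]


def celsius_to_kelvin(c):
    """Convert an interval-scale temperature to the ratio-scale Kelvin."""
    return c + 273.15


# Made-up sample data, one list per level of measurement.
blood_types = ["A", "O", "O", "B", "AB", "O"]                          # nominal
likert = ["agree", "neutral", "strongly agree", "agree", "disagree"]   # ordinal
temps_c = [10.0, 20.0]                                                 # interval
weights_kg = [40.0, 80.0]                                              # ratio

most_common_type = mode(blood_types)           # nominal: counts and mode only
middle_response = ordinal_median(likert)       # ordinal: median respects ranking
mean_temp = mean(temps_c)                      # interval: means and differences
weight_ratio = weights_kg[1] / weights_kg[0]   # ratio: "twice as heavy" is valid

# Interval caveat: dividing Celsius values gives 2.0, but the number depends on
# where the scale's arbitrary zero sits. On the Kelvin scale, which has a true
# zero, the same two temperatures differ by only a few percent.
naive_ratio = temps_c[1] / temps_c[0]
kelvin_ratio = celsius_to_kelvin(temps_c[1]) / celsius_to_kelvin(temps_c[0])

print(most_common_type, middle_response, mean_temp, weight_ratio)
print(round(naive_ratio, 3), round(kelvin_ratio, 3))
```

The contrast between `naive_ratio` and `kelvin_ratio` is the practical meaning of "no true zero": the Celsius ratio would change under a switch to Fahrenheit, whereas the weight ratio is the same in kilograms or pounds.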

11.1.2 Other Ways to Classify Data

Beyond the qualitative versus quantitative distinction, data can be classified along several other dimensions:

11.1.2.1 Primary vs. Secondary Data

Primary Data: Collected directly by the researcher for a specific purpose (e.g., firsthand surveys, experiments, direct measurements).
Secondary Data: Originally gathered by someone else for a different purpose (e.g., government census data, administrative records, previously published datasets).

11.1.2.2 Structured, Semi-Structured, and Unstructured Data

Structured Data: Organized in a predefined manner, typically in rows and columns (e.g., spreadsheets, relational databases).
Semi-Structured Data: Contains organizational markers, such as tags or keys, but is not strictly tabular (e.g., JSON, XML logs, HTML).
Unstructured Data: Lacks a clear, consistent format (e.g., raw text, images, videos, audio files).
- Often analyzed using natural language processing (NLP), image recognition, or other advanced techniques.
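To make the structured versus semi-structured distinction concrete, the following sketch flattens a small JSON document into a rectangular table. The records, the field names, and the `flatten` helper are hypothetical; the approach shown (dotted column names for nested keys, empty cells for missing values) is one common convention, not the only one.

```python
import csv
import io
import json

# Hypothetical semi-structured input: records share some keys but not all,
# and some values are nested objects or lists.
raw = """
[
  {"id": 1, "name": "Ana", "contact": {"email": "ana@example.com"}},
  {"id": 2, "name": "Ben", "tags": ["new", "trial"]},
  {"id": 3, "contact": {"email": "cy@example.com", "phone": "555-0100"}}
]
"""


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted column names; join lists into one cell."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            flat[name] = ";".join(str(v) for v in value)
        else:
            flat[name] = value
    return flat


rows = [flatten(r) for r in json.loads(raw)]

# A structured table needs a fixed schema: here, the union of all columns seen.
columns = sorted({col for row in rows for col in row})

# Write the rows as CSV; cells missing from a record are left empty.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns, restval="")
writer.writeheader()
writer.writerows(rows)
table = buffer.getvalue()
print(table)
```

The resulting table has one row per record and the same columns in every row, which is what makes it structured. The cost is visible in the empty cells and in the list squeezed into a single `tags` column.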

11.1.2.3 Big Data

Characterized by the “3 Vs”: Volume (large amounts), Variety (diverse forms), and Velocity (high-speed generation).
Requires specialized computational tools (e.g., Hadoop, Spark) and often cloud-based infrastructure for storage and processing.
Can be structured or unstructured (e.g., social media feeds, sensor data, clickstream data).

11.1.2.4 Internal vs. External Data (in Organizational Contexts)

Internal Data: Generated within an organization (e.g., sales records, HR data, production metrics).
External Data: Sourced from outside (e.g., macroeconomic indicators, market research reports, social media analytics).

11.1.2.5 Proprietary vs. Public Data

Proprietary Data: Owned by an organization or entity, not freely available for public use.
Public/Open Data: Freely accessible data provided by governments, NGOs, or other institutions (e.g., data.gov, World Bank Open Data).

11.1.3 Data by Observational Structure Over Time

Another primary way to categorize data is by how observations are collected over time. This classification shapes research design, analytic methods, and the types of inferences we can make. The four major types are:

Comparison of Data Structures in Empirical Research
Type	Advantages	Limitations
Cross-Sectional Data	Simple, cost-effective, good for studying distributions or correlations at a single time point.	Lacks temporal information; on its own, generally supports inferring associations rather than causal links.
Time Series Data	Enables trend analysis, seasonality detection, and forecasting.	Requires handling autocorrelation, stationarity issues, and structural breaks.
Repeated Cross-Sectional Data	Tracks shifts in population-level parameters over time; simpler than panel data.	Cannot track individual changes; comparability depends on consistent methodology.
Panel (Longitudinal) Data	Supports stronger causal inference, controls for time-invariant unobserved heterogeneity, tracks individual trajectories.	Expensive, prone to attrition, requires complex statistical methods.
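The four structures in the table can be told apart by which units are observed at which times. The sketch below uses made-up `(unit_id, year, income)` records to illustrate this. All values and helper names (`units_per_wave`, `is_balanced_panel`, `wave_means`, `within_unit_changes`) are hypothetical and only standard-library Python is assumed.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (unit_id, year, income).
# Panel: the same two people are observed in every year.
panel = [
    ("p1", 2020, 30), ("p1", 2021, 32), ("p1", 2022, 35),
    ("p2", 2020, 50), ("p2", 2021, 49), ("p2", 2022, 53),
]
# Repeated cross-section: a fresh sample of people is drawn in each year.
repeated_cs = [
    ("a1", 2020, 28), ("a2", 2020, 52),
    ("b1", 2021, 31), ("b2", 2021, 47),
]

# Cross-sectional data: many units at a single time point.
cross_section = [(uid, inc) for uid, yr, inc in panel if yr == 2020]

# Time series data: a single unit at many time points.
time_series = [(yr, inc) for uid, yr, inc in panel if uid == "p1"]


def units_per_wave(records):
    """Map each year to the set of units observed in that year."""
    waves = defaultdict(set)
    for uid, yr, _ in records:
        waves[yr].add(uid)
    return dict(waves)


def is_balanced_panel(records):
    """True when more than one wave exists and every wave has the same units."""
    waves = list(units_per_wave(records).values())
    return len(waves) > 1 and all(w == waves[0] for w in waves)


def wave_means(records):
    """Population-level summary per year; works for any structure."""
    by_year = defaultdict(list)
    for _, yr, inc in records:
        by_year[yr].append(inc)
    return {yr: mean(vals) for yr, vals in sorted(by_year.items())}


def within_unit_changes(records):
    """Last-minus-first change per unit; needs repeated observations of a unit."""
    by_unit = defaultdict(list)
    for uid, yr, inc in sorted(records, key=lambda r: (r[0], r[1])):
        by_unit[uid].append(inc)
    return {uid: vals[-1] - vals[0] for uid, vals in by_unit.items() if len(vals) > 1}


print(is_balanced_panel(panel), is_balanced_panel(repeated_cs))
print(wave_means(repeated_cs))
print(within_unit_changes(panel), within_unit_changes(repeated_cs))
```

Both structures support `wave_means`, which tracks population-level shifts. Only the panel yields anything from `within_unit_changes`, because the repeated cross-section never observes the same unit twice. That is the "cannot track individual changes" limitation in concrete form.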