39.2 Synthetic Data: An Overview

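Synthetic data is artificially generated data that models the statistical properties of a real dataset without reproducing its individual records. It is becoming pivotal in research, particularly where the real data cannot be shared. While promising, it does not guarantee anonymity by itself, has its own complexities, and should be approached with caution.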

39.2.1 Benefits

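  • Privacy preservation: analysts can work with data that resembles the original without direct access to real records.
  • Data fairness and augmentation: synthetic records can rebalance under-represented groups or enlarge small datasets.
  • Acceleration in research: synthetic datasets can be shared and explored while access to the real data is restricted.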

39.2.2 Concerns

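  • Misconceptions about inherent privacy: synthetic data is not automatically private, because a generator can reproduce or reveal information about real records.
  • Challenges with data outliers: rare or extreme records are hard to reproduce faithfully without exposing the individuals behind them.
  • Reliance on synthetic data alone: models built solely on synthetic data inherit the generator's errors and should be checked against real data.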
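
The first two concerns can be made concrete with a minimal sketch. The records, the two naive "generators", and the helper `exact_copy_rate` below are all hypothetical and chosen only for illustration; they are not methods from Jordon et al. (2022). The point is that output labeled "synthetic" can still contain real records or real outlier values.

```python
import random

def exact_copy_rate(real, synthetic):
    """Fraction of synthetic records that appear verbatim in the real data."""
    real_set = set(real)
    return sum(1 for row in synthetic if row in real_set) / len(synthetic)

# Hypothetical "real" records (age, income); the last row is an extreme outlier.
real = [(34, 41000), (29, 38000), (45, 52000), (51, 61000), (38, 2500000)]

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Naive generator 1: resample whole rows. Every output is a real record.
resampled = [rng.choice(real) for _ in range(100)]

# Naive generator 2: sample each column independently from its observed values.
# Whole rows are usually new, but the outlier income can still surface.
ages = [row[0] for row in real]
incomes = [row[1] for row in real]
independent = [(rng.choice(ages), rng.choice(incomes)) for _ in range(100)]

print("copy rate, resampled rows:     ", exact_copy_rate(real, resampled))
print("copy rate, independent columns:", exact_copy_rate(real, independent))
print("outlier income leaked:         ",
      any(row[1] == 2500000 for row in independent))
```

A check like this catches only verbatim copies. A low copy rate is not evidence of privacy, which is why the concern above is about inherent privacy being a misconception.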

39.2.3 Further Insights on Synthetic Data

Synthetic data bridges the model-centric and data-centric perspectives, which makes it an essential tool in modern research. By analogy, working with synthetic data is like viewing a replica of the Mona Lisa while the original painting remains securely stored.

Future projects, such as generating synthetic data from the diamonds dataset in R (shipped with the ggplot2 package), hold promise for demonstrating the potential of this technology.
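
As a rough idea of what such a project might involve, the sketch below fits a two-variable Gaussian to carat and price and samples new pairs from it. It is written in Python rather than R, uses only the standard library, and rests on several assumptions: the eight (carat, price) pairs are made-up values for illustration and are not rows from the diamonds dataset, the helper names are hypothetical, and a Gaussian is only one simple choice of generator.

```python
import math
import random

# Made-up (carat, price) pairs for illustration; NOT rows from the diamonds dataset.
real = [(0.23, 326), (0.31, 540), (0.50, 1400), (0.70, 2500),
        (0.90, 4000), (1.01, 5200), (1.20, 6800), (1.50, 10500)]

def mean(xs):
    return sum(xs) / len(xs)

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(rows):
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

def fit_gaussian(rows):
    """Estimate the means, variances, and covariance of the two columns."""
    xs = [r[0] for r in rows]
    ys = [r[1] for r in rows]
    return (mean(xs), mean(ys),
            covariance(xs, xs), covariance(ys, ys), covariance(xs, ys))

def sample_gaussian(params, n, rng):
    """Draw n pairs from the fitted Gaussian via a 2x2 Cholesky factor."""
    mx, my, vxx, vyy, vxy = params
    l11 = math.sqrt(vxx)
    l21 = vxy / l11
    l22 = math.sqrt(max(vyy - l21 ** 2, 0.0))  # guard against rounding error
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append((mx + l11 * z1, my + l21 * z1 + l22 * z2))
    return rows

rng = random.Random(42)
synthetic = sample_gaussian(fit_gaussian(real), 1000, rng)

print("real correlation:     ", round(correlation(real), 3))
print("synthetic correlation:", round(correlation(synthetic), 3))
```

The synthetic pairs preserve the means and the linear correlation of the input, but nothing else. A Gaussian can produce negative carats or prices and ignores the categorical columns (cut, color, clarity) that the real dataset contains, so a serious attempt would need a richer generator and a comparison against the real data.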

For a deeper dive into synthetic data and its applications, refer to (Jordon et al. 2022).


Jordon, James, Lukasz Szpruch, Florimond Houssiau, Mirko Bottarelli, Giovanni Cherubin, Carsten Maple, Samuel N Cohen, and Adrian Weller. 2022. “Synthetic Data–What, Why and How?” arXiv Preprint arXiv:2205.03257.