Generating Synthetic Wellbeing Data: A Case Study on Finnish Athletes
Healthcare and wellbeing data are protected by data protection laws such as the GDPR and HIPAA. While these laws are important for preserving individuals' privacy, they also hinder research and the development of new applications. One solution to this problem is to create synthetic copies of the original data. We generate synthetic data using two commonly used neural network-based generative methods, Generative Adversarial Networks (GANs) and Variational Autoencoders (VAEs), and we use Probabilistic Autoregressive models to handle the longitudinal data. We show that the synthetic data retain some of the statistical properties of the original data, but the distributions of individual variables tend to have less variation and to be concentrated around the mean. The results nonetheless indicate that synthetic training data sets have the potential to achieve sufficient scores on quality measures.
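The observation that synthetic variables vary less and cluster around the mean can be checked variable by variable by comparing means and standard deviations. The following is a minimal Python sketch, assuming each data set is given as a mapping from variable name to a list of numeric values. The function `marginal_spread_report` and the toy `sleep_hours` values are hypothetical illustrations, not the authors' evaluation procedure or data.

```python
import statistics


def marginal_spread_report(original, synthetic):
    """Compare each variable's mean and spread in original vs. synthetic data.

    Both arguments map a variable name to a list of at least two numbers
    (hypothetical input format). Returns {name: (mean_diff, std_ratio)}, where
    mean_diff = synthetic mean - original mean and std_ratio < 1 means the
    synthetic variable varies less than the original one.
    """
    report = {}
    for name, orig_values in original.items():
        synth_values = synthetic[name]
        mean_diff = statistics.fmean(synth_values) - statistics.fmean(orig_values)
        # Ratio of sample standard deviations; values well below 1 indicate
        # the synthetic marginal is squeezed toward its mean.
        std_ratio = statistics.stdev(synth_values) / statistics.stdev(orig_values)
        report[name] = (mean_diff, std_ratio)
    return report


# Toy, made-up values: same mean, but the synthetic column is much narrower.
original = {"sleep_hours": [5.0, 6.0, 7.0, 8.0, 9.0]}
synthetic = {"sleep_hours": [6.5, 7.0, 7.0, 7.0, 7.5]}
report = marginal_spread_report(original, synthetic)
print(report)
```

In this toy case the means match but the standard-deviation ratio is about 0.22, which is the pattern of reduced variation described above. A per-variable summary like this only inspects marginal distributions; it says nothing about correlations between variables or temporal structure in the longitudinal data.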