61: Utilizing Variational Autoencoders to Shift Individual Level Data Towards Summary Level Statistics

Janice Weinberg Co-Author
Boston Univ School of Public Health
 
Sarah Milligan First Author
 
Sarah Milligan Presenting Author
 
Monday, Aug 4: 2:00 PM - 3:50 PM
1669 
Contributed Posters 
Music City Center 
Obtaining individual level data (ILD) from published literature poses a hurdle for researchers, yet access to ILD is a driving force for many analyses, including surrogate outcome validation and subgroup analyses. Generative modeling can produce synthetic data that reflects the underlying properties of existing ILD. Specifically, extending Variational Autoencoders (VAEs) to tabular data opens new possibilities for accelerating research. This application of VAEs, implemented in R, gives researchers a simple method for leveraging a set of ILD, and it accommodates a mixture of variable distributions (binary, categorical, normal, etc.). While ILD may be difficult to obtain, summary level information is more readily available. We propose an extension of VAEs that shifts the underlying distribution of the data towards summary level statistics. This extension produces multiple sets of ILD under different prior information. The resulting shifted ILD can be considered a trustworthy representation of a published paper's data. Extending the VAE framework to tabular data and allowing for a distribution shift makes exploratory research without direct ILD access plausible.
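For readers less familiar with VAEs, the standard training objective (the evidence lower bound) is sketched below. This is the textbook form, not necessarily the exact objective used in this work. The symbols (encoder q_phi, decoder p_theta, latent variable z, column index j) and the per-column factorization for mixed tabular data are assumptions introduced here for illustration; the abstract does not state how the distribution shift enters the objective, so it is not shown.

```latex
% Standard VAE objective (ELBO) for one record x with latent variable z.
% q_\phi is the encoder, p_\theta the decoder, p(z) the prior (often N(0, I)).
\mathcal{L}(\theta, \phi; x)
  = \mathbb{E}_{q_\phi(z \mid x)}\bigl[\log p_\theta(x \mid z)\bigr]
  - \mathrm{KL}\bigl(q_\phi(z \mid x) \,\|\, p(z)\bigr)

% Assumed factorization for a tabular record x = (x_1, \dots, x_J):
% each column j gets its own likelihood, e.g. Bernoulli for binary,
% categorical for multi-level, and normal for continuous columns.
\log p_\theta(x \mid z) = \sum_{j=1}^{J} \log p_\theta(x_j \mid z)
```

Under this reading, synthetic ILD would be generated by drawing z from the prior and then sampling each column from its decoder distribution.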

Keywords

Variational Autoencoders

Synthetic Data

Distribution Shift

Machine Learning

Generative Modeling

Summary Level Data

Abstracts


Main Sponsor

Section on Statistical Learning and Data Science