61: Utilizing Variational Autoencoders to Shift Individual Level Data Towards Summary Level Statistics
Monday, Aug 4: 2:00 PM - 3:50 PM
1669
Contributed Posters
Music City Center
Obtaining access to individual-level data (ILD) from published literature is a hurdle for researchers, yet many analyses depend on it (surrogate outcome validation, subgroup analyses, and other settings). Generative modeling can produce synthetic data that reflects the underlying properties of existing ILD. In particular, extending Variational Autoencoders (VAEs) to tabular data opens new possibilities for accelerating research. This application of VAEs, implemented in R, gives researchers a simple method for leveraging a set of ILD, and it handles a mixture of variable distributions (binary, categorical, normal, etc.). While ILD may be difficult to obtain, summary-level information is more readily available. We propose an extension of VAEs that shifts the underlying distribution of the data towards summary-level statistics. This extension produces multiple sets of ILD under different prior information. The resulting shifted ILD can be considered a trustworthy representation of a published paper's data. By extending the VAE framework to tabular data and allowing for a distribution shift, exploratory research without direct ILD access becomes plausible.
Variational Autoencoders
Synthetic Data
Distribution Shift
Machine Learning
Generative Modeling
Summary Level Data
Main Sponsor
Section on Statistical Learning and Data Science