Generating Select Synthetic Data

Minsun Riddles Chair
Westat
 
Fang Liu Panelist
University of Notre Dame
 
Lin Li Panelist
Westat
 
Hang Kim Panelist
University of Cincinnati
 
Aaron Williams Panelist
Urban Institute
 
Trivellore Raghunathan Panelist
University of Michigan
 
Saki Kinney Panelist
RTI International
 
Thomas Krenzke Organizer
Westat
 
Tuesday, Aug 6: 10:30 AM - 12:20 PM
1560 
Topic-Contributed Panel Session 
Oregon Convention Center 
Room: CC-G131 
Uses of synthetic data have been steadily increasing as both the demand for access to microdata and privacy concerns grow. For example, synthetic data are seen as a solution for sharing vast amounts of health data to develop machine learning models and speed up health research while protecting privacy. The central challenge in generating synthetic data is balancing reduced disclosure risk against retaining the integrity of the original data (e.g., maintaining the aggregates, distributions, and associations between variables). To address this challenge, one may synthesize only select variables and select records with high disclosure risk, referred to as the "select" data synthesis approach. This panel session will cover challenges and solutions in generating select synthetic data in various contexts and applications. We believe this session fits the theme of JSM 2024, 'Statistics and Data Science: Informing policy and countering misinformation', by exploring approaches that expand data access (better informing policy) while protecting privacy without compromising the integrity of the data (countering misinformation). Panelists will focus their contributions to the session as follows:
Fang Liu, University of Notre Dame, will address the topic of selective data synthesis with formal privacy guarantees.
Lin Li, Westat, will lead a discussion comparing ways to generate select synthetic data in a longitudinal structure.
Hang Kim, University of Cincinnati, will lead a discussion focused on select synthetic microdata for establishment surveys.
Aaron Williams, Urban Institute, will lead a discussion on generating select synthetic data with the tidysynthesis library.
Trivellore Raghunathan, University of Michigan, will discuss a generalized swapping approach for privacy protection and valid inferences.
Saki Kinney, RTI, will provide general information, input and insights on select synthetic approaches.

Applied

Yes

Main Sponsor

Survey Research Methods Section

Co Sponsors

Government Statistics Section
Social Statistics Section