Generating Select Synthetic Data

Abstract Number:

1560 

Submission Type:

Topic-Contributed Panel Session 

Participants:

Thomas Krenzke (1), Fang Liu (2), Lin Li (1), Hang Kim (3), Aaron Williams (4), Trivellore Raghunathan (5), Saki Kinney (6), Minsun Riddles (1)

Institutions:

(1) Westat, Rockville, MD, (2) University of Notre Dame, South Bend, IN, (3) University of Cincinnati, Cincinnati, OH, (4) Urban Institute, Philadelphia, PA, (5) University of Michigan, Ann Arbor, MI, (6) RTI International, Durham, NC

Chair:

Minsun Riddles  
Westat

Panelist(s):

Fang Liu  
University of Notre Dame
Lin Li  
Westat
Hang Kim  
University of Cincinnati
Aaron Williams  
Urban Institute
Trivellore Raghunathan  
University of Michigan
Saki Kinney  
RTI International

Session Organizer:

Thomas Krenzke  
Westat

Session Description:

Use of synthetic data has increased steadily as demand for microdata access and concerns about privacy have grown. For example, synthetic data are seen as a way to share vast amounts of health data for developing machine learning models and accelerating health research while protecting privacy. The challenge in generating synthetic data is to balance reducing disclosure risk against retaining the integrity of the original data (e.g., maintaining the aggregates, distributions, and associations between variables). To address this challenge, one may synthesize only select variables and select records with high disclosure risk, an approach referred to as "select" data synthesis. This panel session will cover challenges and solutions in generating select synthetic data in various contexts and applications. We believe this session fits the theme of JSM 2024, 'Statistics and Data Science: Informing Policy and Countering Misinformation', well by exploring approaches that expand data access (better informing policy) while protecting privacy without compromising the integrity of the data (countering misinformation). Panelists will focus their contributions to the session as follows:
Fang Liu, University of Notre Dame, will address the topic of selective data synthesis with formal privacy guarantees.
Lin Li, Westat, will lead a discussion comparing ways to generate select synthetic data in a longitudinal structure.
Hang Kim, University of Cincinnati, will lead a discussion focused on select synthetic microdata for establishment surveys.
Aaron Williams, Urban Institute, will lead a discussion on select synthesis with library(tidyverse).
Trivellore Raghunathan, University of Michigan, will discuss a generalized swapping approach for privacy protection and valid inferences.
Saki Kinney, RTI International, will provide general information, input, and insights on select synthetic approaches.

Sponsors:

Government Statistics Section 2
Social Statistics Section 3
Survey Research Methods Section 1

Theme: Statistics and Data Science: Informing Policy and Countering Misinformation

Yes

Applied

Yes

Estimated Audience Size

Small (<80)
