Evaluating the Disclosure Risk and Analytic Utility of Synthetic Data in a Municipal Health Survey

Abstract Number:

3545 

Submission Type:

Contributed Abstract 

Contributed Abstract Type:

Paper 

Participants:

Stephen Immerwahr (1), Wen Qin Deng (1), Jingchen Hu (2), Tashema Bholanath (1), Fangtao He (1), Nneka Lundy De La Cruz (1)

Institutions:

(1) NYC Department of Health and Mental Hygiene, Long Island City, NY, (2) Vassar College, N/A

Co-Author(s):

Wen Qin Deng  
NYC Department of Health and Mental Hygiene
Jingchen Hu  
Vassar College
Tashema Bholanath  
NYC Department of Health and Mental Hygiene
Fangtao He  
NYC Department of Health and Mental Hygiene
Nneka Lundy De La Cruz  
NYC Department of Health and Mental Hygiene

First Author:

Stephen Immerwahr  
NYC Department of Health and Mental Hygiene

Presenting Author:

Stephen Immerwahr  
NYC Department of Health and Mental Hygiene

Abstract Text:

Releasing public-use micro-level data files from health surveys holds immense value for science and health policy. However, even after personally identifying information is removed, the privacy of survey respondents may still be compromised. Using a large, population-representative NYC health survey (n=10,271), we identified high-risk observations based on population estimates for combinations of key variables. We compared three approaches to mitigating the risk of re-identification (suppression, synthesis using Classification and Regression Trees, and synthesis via Bayesian models) and assessed their impact on both the disclosure risk and the loss of utility of the resulting protected data. While both synthesis methods resulted in slightly higher disclosure risk than the suppression method, the synthetic datasets preserved a higher level of utility. We will discuss our proposed solutions to avoid over-protecting and potentially obscuring estimates for underserved and vulnerable groups, and we will share our experiences with data curators in advancing disclosure risk controls and data sharing in public health.
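
As a rough illustration of the risk-identification and suppression steps described above, the following minimal sketch flags records whose combination of key variables corresponds to a small estimated population and then blanks those key variables. It is not the authors' implementation: the variable names (`age_group`, `borough`, `weight`), the use of summed survey weights as the population estimate, the threshold of 100, and the toy records are all assumptions made for illustration only.

```python
from collections import defaultdict


def estimated_cell_sizes(records, key_vars, weight_key="weight"):
    """Sum survey weights within each combination of key variables.

    Assumption: the weighted total of a cell approximates the number of
    people in the population who share that combination of values.
    """
    totals = defaultdict(float)
    for rec in records:
        cell = tuple(rec[k] for k in key_vars)
        totals[cell] += rec[weight_key]
    return dict(totals)


def flag_high_risk(records, key_vars, threshold, weight_key="weight"):
    """Return indices of records whose cell estimate falls below threshold."""
    sizes = estimated_cell_sizes(records, key_vars, weight_key)
    return [
        i
        for i, rec in enumerate(records)
        if sizes[tuple(rec[k] for k in key_vars)] < threshold
    ]


def suppress(records, flagged, key_vars):
    """Return copies of the records with key variables blanked if flagged."""
    flagged = set(flagged)
    return [
        {**rec, **{k: None for k in key_vars}} if i in flagged else dict(rec)
        for i, rec in enumerate(records)
    ]


# Hypothetical toy records; not drawn from the survey.
records = [
    {"age_group": "18-24", "borough": "A", "weight": 900.0},
    {"age_group": "18-24", "borough": "A", "weight": 1100.0},
    {"age_group": "65+", "borough": "B", "weight": 40.0},
]
KEYS = ["age_group", "borough"]

flagged = flag_high_risk(records, KEYS, threshold=100.0)
protected = suppress(records, flagged, KEYS)
```

In this sketch only the third record is flagged, because its cell has an estimated population of 40. A synthesis-based approach would replace the flagged values with draws from a fitted model rather than blanking them, which is one reason it may retain more analytic utility.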

Keywords:

Health Surveys|Data Privacy Risk|Synthetic Data|Survey Research Methods|Government Statistics

Sponsors:

Survey Research Methods Section

Tracks:

Privacy and Confidentiality Methods

I have read and understand that JSM participants must abide by the Participant Guidelines.

Yes

I understand that JSM participants must register and pay the appropriate registration fee by June 1, 2024. The registration fee is non-refundable.

I understand