K-Anonymity-Aware Sequential Sampling Method for Synthetic Data
Abstract Number:
2529
Submission Type:
Contributed Abstract
Contributed Abstract Type:
Speed
Participants:
TaeWook Kim (1), Jeongyoun Ahn (2), Changwon Yoon (3), Cheolwoo Park (2), Bonwoo Lee (1)
Institutions:
(1) N/A, N/A, (2) KAIST, N/A, (3) Department of Industrial & Systems Engineering, KAIST, N/A
Co-Author(s):
Changwon Yoon
Department of Industrial & Systems Engineering, KAIST
Speaker:
Abstract Text:
In sensitive domains such as healthcare, synthetic data can be released in place of microdata, with the aim of matching key statistics while reducing re-identification and attribute-inference risk. Yet generators may still emit rare patterns or near-duplicates of original records. We propose a k-anonymity-aware sequential sampling approach that generates each synthetic record by sampling variables one at a time from histogram-based empirical conditional distributions. At each step, the method restricts the candidate values so that the resulting partial pattern (the projection onto the variables sampled so far) has at least k matching records in the original dataset. When the conditioning context is too sparse, we relax the conditioning set in a dependence-guided manner, dropping variables weakly related to the variable currently being sampled (e.g., as ranked by normalized mutual information) while retaining the minimum-frequency requirement. Overall, k serves as a transparent, user-specified control over the minimum frequency of generated patterns, while dependence-guided relaxation can help preserve useful multivariate structure, supporting a practical balance between fidelity and privacy in synthetic data release.
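The sampling step described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all variables are assumed to be already discretized into histogram bins, variables are sampled in column order, a context is treated as "too sparse" when no candidate value reaches k matching records, relaxation drops one conditioning variable at a time (the one with the lowest NMI to the current variable), and NMI is normalized as 2I(X;Y)/(H(X)+H(Y)), one of several common normalizations. The names `entropy`, `nmi`, `sample_record`, and `synthesize` are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy (nats) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def nmi(x, y):
    """Normalized mutual information 2*I(X;Y) / (H(X) + H(Y)); 0.0 if both are constant."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    mi = hx + hy - entropy(list(zip(x, y)))  # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return 2 * mi / (hx + hy)


def sample_record(data, k, rng, dep):
    """Sample one synthetic record column by column (hypothetical sketch).

    data: list of equal-length tuples of already-binned (categorical) values.
    dep[i][j]: dependence score (here NMI) between columns i and j.
    """
    record = []
    for j in range(len(data[0])):
        context = list(range(j))  # start by conditioning on all previously sampled columns
        while True:
            # original records that agree with the synthetic record on `context`
            matches = [row for row in data if all(row[c] == record[c] for c in context)]
            counts = Counter(row[j] for row in matches)
            # keep only values whose extended pattern has at least k matching records
            candidates = {v: c for v, c in counts.items() if c >= k}
            if candidates:
                break
            if not context:
                raise ValueError(f"no value of column {j} occurs at least k={k} times")
            # relax: drop the conditioning column least related to column j
            context.remove(min(context, key=lambda c: dep[c][j]))
        values = list(candidates)
        # draw from the (restricted) empirical conditional histogram
        record.append(rng.choices(values, weights=[candidates[v] for v in values])[0])
    return tuple(record)


def synthesize(data, n, k, seed=0):
    """Generate n synthetic records with minimum pattern frequency k."""
    p = len(data[0])
    cols = [[row[i] for row in data] for i in range(p)]
    dep = [[nmi(cols[i], cols[j]) for j in range(p)] for i in range(p)]
    rng = random.Random(seed)
    return [sample_record(data, k, rng, dep) for _ in range(n)]
```

For example, `synthesize(rows, n=100, k=5)` returns 100 tuples in which no sampled value, and no retained conditioning pattern, occurs fewer than 5 times in `rows`. In this sketch, once relaxation has dropped a variable, the k threshold is enforced on the retained conditioning variables plus the current one rather than on the full prefix; keeping the highest-NMI variables is what lets strongly dependent pairs survive relaxation.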
Keywords:
Synthetic tabular data|Data Privacy|k-anonymity|Sequential sampling|Normalized mutual information
Sponsors:
Korean International Statistical Society
Tracks:
Miscellaneous
Can this be considered for alternate subtype?
Yes
Are you interested in volunteering to serve as a session chair?
No
I have read and understand that JSM participants must abide by the Participant Guidelines.
Yes
I understand that JSM participants must register and pay the appropriate registration fee by June 1, 2026. The registration fee is non-refundable.
I understand