K-Anonymity-Aware Sequential Sampling Method for Synthetic Data

Abstract Number:

2529 

Submission Type:

Contributed Abstract 

Contributed Abstract Type:

Speed 

Participants:

TaeWook Kim (1), Jeongyoun Ahn (2), Changwon Yoon (3), Cheolwoo Park (2), Bonwoo Lee (1)

Institutions:

(1) N/A, N/A, (2) KAIST, N/A, (3) Department of Industrial & Systems Engineering, KAIST, N/A

Co-Author(s):

Jeongyoun Ahn  
KAIST
Changwon Yoon  
Department of Industrial & Systems Engineering, KAIST
Cheolwoo Park  
KAIST
Bonwoo Lee  
N/A

Speaker:

TaeWook Kim  
N/A

Abstract Text:

In sensitive domains such as healthcare, synthetic data can be released in place of the original microdata, with the aim of matching key statistics while reducing re-identification and attribute-inference risk. Yet generators may still emit rare patterns or near-duplicates. We propose a k-anonymity-aware sequential sampling approach that generates each synthetic record by sampling variables one at a time from histogram-based empirical conditional distributions. At each step, the method restricts candidate values so that the resulting partial pattern (the projection onto the variables sampled so far) has at least k matching records in the original dataset. When the conditioning context is too sparse, we relax the conditioning set in a dependence-guided manner: variables weakly related to the variable currently being sampled (e.g., as ranked by normalized mutual information) are dropped, while the minimum-frequency requirement is retained. Overall, k serves as a transparent, user-specified control over the minimum frequency of generated patterns, and dependence-guided relaxation can help preserve useful multivariate structure, supporting a practical balance between fidelity and privacy in synthetic data release.
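
The sampling loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the data are taken to be already binned into categories, "too sparse" is interpreted as fewer than a hypothetical min_support matching rows, normalized mutual information is normalized by sqrt(H(X) H(Y)), and a record is discarded and retried when no value satisfies the minimum-frequency requirement. The names nmi, sample_record, generate, and min_support are hypothetical.

```python
import math
import random
from collections import Counter

def entropy(values):
    n = len(values)
    return -sum(c / n * math.log(c / n) for c in Counter(values).values())

def nmi(xs, ys):
    """Mutual information normalized by sqrt(H(X) * H(Y)) (assumed normalization)."""
    hx, hy = entropy(xs), entropy(ys)
    if hx == 0 or hy == 0:
        return 0.0
    mi = hx + hy - entropy(list(zip(xs, ys)))
    return mi / math.sqrt(hx * hy)

def sample_record(rows, order, k, min_support, rng):
    """Sample one record variable by variable; return None on a dead end."""
    record = {}
    for var in order:
        # Original rows that agree with the partial pattern sampled so far.
        prefix_rows = [r for r in rows if all(r[v] == record[v] for v in record)]
        # Candidate values that keep the extended pattern at frequency >= k.
        full_counts = Counter(r[var] for r in prefix_rows)
        allowed = {val for val, c in full_counts.items() if c >= k}
        if not allowed:
            return None  # assumption: the caller rejects and retries
        # Conditioning set ordered from weakest to strongest dependence on var.
        target = [r[var] for r in rows]
        cond = sorted(record, key=lambda v: nmi([r[v] for r in rows], target))
        while True:
            ctx = [r for r in rows if all(r[v] == record[v] for v in cond)]
            if len(ctx) >= min_support or not cond:
                break
            cond.pop(0)  # relax: drop the most weakly related variable
        # Empirical conditional over the allowed values in the (relaxed) context.
        weights = Counter(r[var] for r in ctx if r[var] in allowed)
        values = sorted(weights)
        record[var] = rng.choices(values, weights=[weights[v] for v in values])[0]
    return record

def generate(rows, order, k, n, min_support=None, seed=0, max_tries=10000):
    rng = random.Random(seed)
    min_support = 2 * k if min_support is None else min_support
    out = []
    for _ in range(max_tries):
        if len(out) == n:
            break
        rec = sample_record(rows, order, k, min_support, rng)
        if rec is not None:
            out.append(rec)
    return out

# Toy data (made up for illustration): one pattern occurs only once.
patterns = [(("30s", "F", "flu"), 6), (("30s", "M", "flu"), 5),
            (("60s", "F", "copd"), 4), (("60s", "M", "rare"), 1)]
rows = [dict(zip(("age", "sex", "dx"), p)) for p, count in patterns for _ in range(count)]
synthetic = generate(rows, order=["age", "sex", "dx"], k=3, n=20)
```

In this sketch the set of allowed values depends only on the full partial pattern, so relaxing the conditioning set changes the sampling weights but never admits a pattern with fewer than k matches in the original data.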

Keywords:

Synthetic tabular data|Data privacy|k-anonymity|Sequential sampling|Normalized mutual information

Sponsors:

Korean International Statistical Society

Tracks:

Miscellaneous
