EMS Coreset: An efficient Expectation-Maximization algorithm for Sinkhorn Coreset
Thursday, Aug 6: 8:30 AM - 10:20 AM
1880
Contributed Papers
Thomas M. Menino Convention & Exhibition Center
Coresets distill large datasets into small, representative subsets for efficient downstream learning. Yet Optimal Transport (OT)–based selection typically requires intensive computation of transport plans, limiting scalability. We introduce a scalable Sinkhorn coreset method that permits closed-form updates of the entropically regularized OT coupling by allowing non-uniform coreset weights. This produces centroids that generalize k-means via soft assignments. We establish asymptotic consistency of the selected measure and Lipschitz stability to data perturbations, providing accuracy and robustness guarantees. Across synthetic and real-world benchmarks, the proposed method achieves competitive or improved approximation quality while substantially reducing runtime compared to Wasserstein- and standard Sinkhorn-based coreset selection, especially at large scale.
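The abstract does not spell out the update rules, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a squared Euclidean cost, uniform mass 1/n on the data, and a coupling constrained only on the data-side marginal, in which case the entropically regularized coupling reduces to a row-wise softmax. The function name `soft_coreset` and the parameters `m`, `eps`, and `n_iter` are hypothetical.

```python
import numpy as np

def soft_coreset(X, m, eps=0.5, n_iter=50, seed=0):
    """Illustrative EM-style loop (assumed, not taken from the paper).

    X      : (n, d) data array, each point carrying mass 1/n
    m      : number of coreset points
    eps    : entropic regularization strength (assumed parameterization)
    Returns coreset locations Y of shape (m, d) and weights w of shape (m,).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Initialize coreset points at randomly chosen data points.
    Y = X[rng.choice(n, size=m, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean cost between every data point and coreset point.
        C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
        # E-step: closed-form coupling, a numerically stabilized row-wise softmax.
        logits = -C / eps
        logits -= logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        P /= n  # each row now sums to the data mass 1/n
        # M-step: non-uniform weights are the column marginals of the coupling,
        # and each coreset point is the coupling-weighted mean of the data.
        w = P.sum(axis=0)
        Y = (P.T @ X) / np.maximum(w, 1e-12)[:, None]
    return Y, w
```

In this sketch, letting `eps` shrink toward zero makes each row of the coupling concentrate on the nearest coreset point, so the loop approaches Lloyd's k-means iterations; larger `eps` gives softer assignments.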
Coreset
Optimal Transport
Data Distillation
Sinkhorn Loss
EM-algorithm
Main Sponsor
Section on Statistical Learning and Data Science