GEMSS-Driven Subsampling for Information Extraction and Redundancy Elimination

Ming-Chung Chang, Co-Author and Speaker

Thursday, Aug 7: 9:50 AM - 10:15 AM
Invited Paper Session 
Music City Center 
Subsampling is an effective way to address the challenges of applying statistical methods to large datasets. Gaussian process models benefit in particular, because training them on large-scale data is notoriously difficult. In this study, we introduce a subsampling methodology designed to improve the predictive accuracy of Gaussian process models in unexplored input regions. The proposed method, Generalization Error Minimization in SubSampling (GEMSS), not only identifies informative subsets of the data but also removes redundant data points that cause numerical instability. We establish an equivalence between linear models and Gaussian process models, which facilitates the development of GEMSS. We also show that the method of Chang [J. Comput. Graph. Statist. 32 (2023) 613-630] arises as a special case of our broader framework. The proposed method is justified by theoretical results and validated through numerical examples across various scenarios.
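
As a rough illustration of why redundant points cause numerical instability, the sketch below builds a Gaussian process kernel matrix on data containing near-duplicate inputs and compares its condition number with that of a thinned subsample. This is not GEMSS: the greedy distance-based thinning rule, the squared-exponential kernel, the helper names (`rbf_kernel`, `thin_by_distance`), and all parameter values are assumptions chosen only to make the effect visible.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    """Squared-exponential kernel matrix between the rows of X and Y."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def thin_by_distance(X, min_dist):
    """Greedy thinning (illustrative stand-in, NOT the GEMSS criterion):
    keep a point only if it lies at least min_dist from every kept point."""
    kept = []
    for i in range(X.shape[0]):
        if all(np.linalg.norm(X[i] - X[j]) >= min_dist for j in kept):
            kept.append(i)
    return np.array(kept)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 1))
# Append 20 near-duplicates to mimic redundant observations.
X_dup = np.vstack([X, X[:20] + 1e-6])

lengthscale = 0.02  # assumed kernel lengthscale
K_full = rbf_kernel(X_dup, X_dup, lengthscale)

idx = thin_by_distance(X_dup, min_dist=2 * lengthscale)
K_sub = rbf_kernel(X_dup[idx], X_dup[idx], lengthscale)

cond_full = np.linalg.cond(K_full)
cond_sub = np.linalg.cond(K_sub)

print(f"full data: n = {X_dup.shape[0]}, condition number = {cond_full:.3e}")
print(f"subsample: n = {idx.size}, condition number = {cond_sub:.3e}")
```

The thinned kernel matrix is far better conditioned, so solving linear systems with it is stable without adding a large nugget term. A method such as GEMSS would choose the subset by a generalization-error criterion rather than by this simple spacing rule, which ignores how informative each point is for prediction.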

Keywords

Gaussian process

Generalization error