Conformal inference and statistical testing for reliable deployment of AI/ML models

Junu Lee, Chair
University of Pennsylvania
 
Ying Jin, Organizer
Stanford University
 
Sunday, Aug 3: 2:00 PM - 3:50 PM
0136 
Invited Paper Session 
Music City Center 
Room: CC-105B 

Keywords

Conformal prediction

Distribution-free inference

Distribution shift

Causal inference 

Applied

Yes

Main Sponsor

IMS

Co-Sponsors

International Chinese Statistical Association
Section on Statistical Learning and Data Science

Presentations

Adaptive sample splitting for randomization tests

Randomization tests are widely used to generate finite-sample valid p-values for causal inference on experimental data. However, when applied to subgroup analysis, these tests may lack power due to small subgroup sizes. Incorporating a shared estimator of the conditional average treatment effect (CATE) can substantially improve power across subgroups but requires splitting the treatment assignments between testing and estimation to preserve validity. Motivated by this insight, we introduce AdaSplit, an adaptive sample-splitting procedure that allocates units based on a certainty score for each unit's treatment assignment, computed from its covariates and outcome. The design of AdaSplit is guided by our theoretical analysis, which shows that assignments with high certainty are more effective in increasing test power, while uncertain ones are more valuable for improving CATE estimation when the reserved assignments for randomization tests are imputed from covariates and outcomes. To evaluate the performance of AdaSplit, we conduct simulation studies demonstrating that it yields more powerful randomization tests than baselines that omit CATE estimation or rely on random sample-splitting. Finally, we apply AdaSplit to a blood pressure intervention trial, identifying patient subgroups with significant treatment effects. 
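
For readers unfamiliar with the baseline this abstract builds on, the following is a minimal sketch of a plain Monte Carlo randomization test in Python. It is not AdaSplit: it has no CATE estimator and no sample splitting. The function name `randomization_test`, the difference-in-means statistic, and the assumption of a completely randomized design (so that permuting the assignment vector redraws from the design) are all illustrative choices, not details taken from the talk.

```python
import numpy as np


def randomization_test(y, z, n_draws=2000, seed=0):
    """Monte Carlo randomization p-value for the sharp null of no effect.

    Assumes (hypothetically) a completely randomized design, so permuting
    the assignment vector z is a valid redraw from the design.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=int)

    def stat(assign):
        # Absolute difference in mean outcome, treated minus control.
        return abs(y[assign == 1].mean() - y[assign == 0].mean())

    observed = stat(z)
    exceed = 0
    for _ in range(n_draws):
        # Under the sharp null, outcomes are fixed; only assignments vary.
        exceed += int(stat(rng.permutation(z)) >= observed)

    # The +1 terms count the observed assignment itself, which keeps the
    # p-value valid in finite samples for any number of draws.
    return (1 + exceed) / (1 + n_draws)
```

Applied within a small subgroup, a test of this kind has few distinct assignments to compare against, which is the loss of power the abstract refers to.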

Speaker

Yao Zhang, Stanford University

Spectral Integration of Noisy High-Dimensional Datasets

Joint analysis of heterogeneous high-dimensional data is central to modern applications such as single-cell genomics and medical informatics. We introduce a novel kernel spectral method for jointly embedding independently observed noisy datasets. Our approach captures shared nonlinear manifold structures, handles noise and high dimensionality, and adapts to signal and sample size imbalance. The method is supported by sharp theoretical guarantees under a joint manifold model, including signal recovery consistency and convergence to meaningful limiting operators associated with the manifold. Empirical results on synthetic and real single-cell omics data show clear improvements in embedding, clustering, and denoising over existing methods. 

Co-Author

Rong Ma, Harvard University

Speaker

Xiucai Ding

A conformal test of linear models via permutation-augmented regressions

Permutation tests are widely recognized as robust alternatives to tests based on normal theory. Random permutation tests have been frequently employed to assess the significance of variables in linear models. Despite their widespread use, existing random permutation tests lack finite-sample and assumption-free guarantees for controlling type I error in partial correlation tests. To address this ongoing challenge, we have developed a conformal test through permutation-augmented regressions, which we refer to as PALMRT. PALMRT not only achieves power competitive with conventional methods but also provides reliable control of type I errors at no more than 2α, given any targeted level α, for arbitrary fixed designs and error distributions. We have confirmed this through extensive simulations.
Compared to the cyclic permutation test (CPT) and residual permutation test (RPT), which also offer theoretical guarantees, PALMRT does not compromise as much on power or set stringent requirements on the sample size, making it suitable for diverse biomedical applications. We further illustrate the differences in a long COVID study where PALMRT validated key findings previously identified using the t-test after multiple-testing correction, while both CPT and RPT suffered from a drastic loss of power and failed to identify any discoveries. We endorse PALMRT as a robust and practical hypothesis test in scientific research for its superior error control, power preservation, and simplicity.
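
As context for the "existing random permutation tests" mentioned in this abstract, the sketch below shows one conventional residual-permutation scheme for a partial regression coefficient, commonly attributed to Freedman and Lane. It is not PALMRT, CPT, or RPT, and it does not carry the finite-sample guarantee described above. The function names `t_stat` and `freedman_lane_pvalue` are hypothetical, and `Z` is assumed to already contain an intercept column.

```python
import numpy as np


def t_stat(y, x, Z):
    """OLS t-statistic for the coefficient on x in the model y ~ x + Z."""
    X = np.column_stack([x, Z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return coef[0] / np.sqrt(cov[0, 0])


def freedman_lane_pvalue(y, x, Z, n_perm=999, seed=0):
    """Permutation p-value for H0: the coefficient on x is zero.

    Residuals from the reduced model y ~ Z are permuted and added back to
    the reduced-model fit; the full model is then refit on each new y.
    """
    rng = np.random.default_rng(seed)
    gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)
    fitted = Z @ gamma
    resid = y - fitted

    t_obs = abs(t_stat(y, x, Z))
    count = 0
    for _ in range(n_perm):
        y_star = fitted + rng.permutation(resid)
        count += int(abs(t_stat(y_star, x, Z)) >= t_obs)

    # Two-sided Monte Carlo p-value with the usual +1 correction.
    return (1 + count) / (1 + n_perm)
```

Because the permuted residuals are estimated rather than the true errors, this scheme is only approximately valid, which is the gap the abstract says PALMRT is designed to close.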
 

Speaker

Leying Guan, Yale University