Thursday, Aug 7: 8:30 AM - 10:20 AM
4208
Contributed Papers
Music City Center
Room: CC-Davidson Ballroom A2
Main Sponsor
Section on Statistical Learning and Data Science
Presentations
Foundation models are increasingly deployed across various domains, offering valuable insights and making decisions. However, ensuring their outputs align with human interpretations is critical before deployment, particularly in high-stakes applications. This highlights the need for a rigorous uncertainty quantification (UQ) method to assess alignment reliability. Most existing methods rely on large labeled training datasets, limiting their applicability in real-world settings where labeled data is scarce or expensive. This paper introduces Conformal Mirror Statistics (CMS), a novel framework for UQ in model alignment. Unlike conventional conformal methods based on p-value calibration, CMS generalizes to broader settings without sample-size restrictions on the test and calibration sets, while tightly controlling the false discovery rate (FDR). Empirical evaluations on two large sepsis cohorts from MIMIC-III and MIMIC-IV demonstrate that CMS reliably selects candidates with certain outputs while outperforming conventional methods in FDR control.
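For orientation, the conventional baseline that the abstract contrasts with CMS (conformal p-values followed by a multiple-testing step) can be sketched as below. This is a minimal illustration and not the CMS procedure. The function names, the choice of the Benjamini-Hochberg step, and the convention that a larger score means stronger evidence against the null are all assumptions made for the example.

```python
def conformal_pvalues(cal_scores, test_scores):
    """Conformal p-value for each test score against null calibration scores.

    Assumed convention: a larger score is stronger evidence against the null.
    """
    n = len(cal_scores)
    return [(1 + sum(c >= s for c in cal_scores)) / (n + 1) for s in test_scores]


def benjamini_hochberg(pvals, q):
    """Return the sorted indices selected by the Benjamini-Hochberg step-up rule."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value passes its threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= q * rank / m:
            k = rank
    return sorted(order[:k])


# Hypothetical scores, for illustration only.
cal_scores = [0.1 * i for i in range(1, 11)]
test_scores = [5.0, 0.05]
pvals = conformal_pvalues(cal_scores, test_scores)
selected = benjamini_hochberg(pvals, q=0.2)
```

The smallest attainable p-value here is 1/(n + 1), so the calibration set size limits how many discoveries a p-value-based method can make. This is one common sample-size limitation of the p-value approach, though the abstract does not say whether it is the one the authors have in mind.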
Keywords
Conformal Inference
False Discovery Rate Control
Model Alignment
Uncertainty Quantification
Latent variable models are popularly used to measure latent factors from large-scale assessment data. Beyond understanding latent factors, the covariate effect on responses controlling for latent factors is also of great scientific interest and has wide applications, such as evaluating testing fairness, where the covariate effect reflects whether a test question is biased toward certain individual characteristics (e.g., gender), controlling for their latent abilities. However, the large sample sizes and high dimensional responses pose challenges to developing efficient methods and drawing valid inferences. Moreover, to accommodate the discrete responses, nonlinear factor models are often assumed, adding further complexity. To address these challenges, we consider a covariate-adjusted generalized factor model and develop novel and interpretable conditions to address the identifiability issue. Based on the identifiability conditions, we propose a joint maximum likelihood estimation method and establish estimation consistency and asymptotic normality results for the covariate effects. Furthermore, we derive estimation and inference results for latent factors and factor loadings.
Keywords
Generalized factor model
Covariate adjustment
Large-scale testing
Inference for online algorithms is a difficult problem because estimation of asymptotic variance can inflate the computational cost. Previous works have proposed online estimation of the covariance matrix as well as batching methods to construct confidence intervals. In this work, we propose the use of the recently developed HulC procedure for uncertainty quantification in the online setting. The highlights of this procedure include: no inflation in the computational cost; no estimation of the asymptotic variance; and asymptotically exact coverage.
We compare the performance of this procedure with those of previous works in the context of linear and logistic regression over a wide range of covariance settings and dimension-aspect ratios. Our main finding is that we get comparable or better coverage properties compared to the methods that estimate the asymptotic variance.
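The idea behind a hull-based interval in a streaming setting can be sketched as follows. This is a simplified, conservative illustration and not the authors' implementation. It assumes each batch estimate is (approximately) median-unbiased, uses a running mean in place of stochastic gradient descent, and omits the randomization over the number of batches that an exact-coverage version would need. The function name and the round-robin batch assignment are choices made for the example.

```python
import math


def online_hull_interval(stream, alpha=0.05):
    """Hull of B running means, each updated by every B-th observation.

    If each batch estimate falls above or below the target with
    probability 1/2, all B land on the same side with probability
    2^(1-B), so B is chosen as the smallest integer with 2^(1-B) <= alpha.
    """
    B = math.ceil(math.log2(2.0 / alpha))
    means = [0.0] * B
    counts = [0] * B
    for t, x in enumerate(stream):
        j = t % B  # each observation updates exactly one running estimate
        counts[j] += 1
        means[j] += (x - means[j]) / counts[j]
    if min(counts) == 0:
        raise ValueError("need at least B observations")
    return min(means), max(means), B


lo, hi, B = online_hull_interval([3.0] * 60, alpha=0.05)
```

Each observation is processed once and updates a single estimator, so the per-step cost matches that of one online estimator. No variance estimate appears anywhere in the construction.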
Keywords
Stochastic gradient descent
asymptotic variance
high-dimensional inference
HulC
distributed learning
martingales
Tuberculosis (TB) studies often involve four different states under consideration, namely: "healthy", "latent infection", "pulmonary active disease", and "extra-pulmonary active disease". While highly accurate clinical diagnostic tests do exist, they are expensive and generally inaccessible in regions where they are most needed; thus, there is an interest in assessing the accuracy of new and easily obtainable biomarkers. For some such biomarkers, the typical stochastic ordering assumption might not be justified for all disease classes under study, and usual ROC methodologies that involve ROC surfaces and hypersurfaces are inadequate. Different types of orderings may be appropriate depending on the setting, and these may involve a number of ambiguously ordered groups that stochastically exhibit larger (or lower) marker scores than the remaining groups. Recently, there has been scientific interest in ROC methods that can accommodate these so-called 'tree' or 'umbrella' orderings. However, there is limited work discussing the estimation of cutoffs in such settings. In this paper, we discuss the estimation and inference around optimized cutoffs when accounting for such configurations.
Keywords
biomarker
TROC
ROC surface
Box-Cox
kernels
cutoff
Regression-based inference is widely employed to analyze experimental data. We propose a non-asymptotic approach for estimating causal effects under homogeneous and heterogeneous linear models, utilizing Rao's score test within the maximum likelihood framework. Specifically, we address two heterogeneous settings: one assumes constant variance, while the other allows variance to depend on covariates. Unlike traditional asymptotic methods, which require large sample sizes with fixed parameter dimensions, our method derives explicit bounds that depend on covariate dimensions and variance assumptions, offering scalability and adaptability to diverse model settings. Under bounded variance and sub-Gaussian assumptions, we extend this framework to a quasi-likelihood setting for causal inference, applying it to hypothesis testing with Rao's score test and providing a robust tool for causal analysis and hypothesis testing across various applications.
Keywords
Regression-based inference
non-asymptotic analysis
likelihood framework
quasi-likelihood framework
Conformal inference has played a pivotal role in providing uncertainty quantification for black-box ML prediction algorithms with finite sample guarantees. Traditionally, conformal prediction requires a data-independent specification of the miscoverage level. In practical applications, one might want to update the miscoverage level after computing the prediction set. The construction of prediction sets that guarantee coverage with a data-dependent miscoverage level can be considered as a post-selection inference problem. In this work, we develop simultaneous conformal inference to account for data-dependent miscoverage levels. Under the assumption of independent and identically distributed observations, our proposed methods have a finite sample simultaneous guarantee over all miscoverage levels. Furthermore, we also propose methods that have the same guarantees for a user-specified choice of miscoverage levels. This allows practitioners to freely trade coverage probability for the quality of the prediction set by any criterion of their choice (say, the size of the prediction set) while maintaining finite sample guarantees similar to those of traditional conformal inference.
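To make the "data-independent miscoverage level" concrete, here is a minimal sketch of the traditional split conformal construction, not the simultaneous method proposed in the abstract. It assumes absolute-residual scores computed on a held-out calibration set; the function name and the example numbers are hypothetical.

```python
import math


def split_conformal_radius(cal_residuals, alpha):
    """Half-width of a split conformal interval at a fixed miscoverage level.

    Uses the ceil((n + 1) * (1 - alpha))-th smallest calibration residual.
    Returns infinity when the calibration set is too small for the level.
    """
    n = len(cal_residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(cal_residuals)[k - 1]


# Hypothetical absolute residuals |y - prediction| on a calibration set.
residuals = [float(i) for i in range(1, 11)]
radius = split_conformal_radius(residuals, alpha=0.2)
# The interval for a new point with prediction y_hat is
# [y_hat - radius, y_hat + radius].
```

The finite sample guarantee of this construction holds for a level `alpha` fixed before looking at the data. Choosing `alpha` after inspecting the resulting radius is the post-selection step that the abstract addresses.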
Keywords
post-selection inference
conformal prediction
distribution-free
CDF confidence bands
black-box methods
Mendelian randomization employs protein quantitative trait loci (pQTLs) as instruments to address unobserved confounding in protein biomarker discovery for complex diseases like Alzheimer's and autism spectrum disorder. However, the presence of invalid pQTL instruments (those that violate core instrument assumptions) can threaten inference validity by introducing bias. While existing methods aim to detect invalid instruments, their susceptibility to selection errors risks propagating bias into causal estimates. To address this, we propose a novel resampling-based approach that accounts for the selection uncertainty, ensuring robustness against misclassified instruments. By incorporating a data-driven prior on pQTL validity, our approach enhances efficiency while maintaining robustness. We show that our method is free of selection errors across diverse pleiotropic scenarios and improves confidence interval (CI) efficiency by approximately 20% compared to previous robust MR methods. We applied our method to genome-wide proteomics data from 54,306 UK Biobank individuals and a genome-wide association study of Alzheimer's disease with 455,258 subjects, and identified five causal protein biomarkers.
Keywords
Post-selection inference
Mendelian Randomization
Instrumental variable
Protein biomarkers