Novel Theory in Statistical Inference

Chair: Tate Jacobson, Oregon State University
 
Sunday, Aug 3: 2:00 PM - 3:50 PM
Session 4012: Contributed Papers
Music City Center 
Room: CC-Davidson Ballroom A2 

Main Sponsor

IMS

Presentations

A Statistical Theory of Contrastive Learning via Approximate Sufficient Statistics

Contrastive learning, a modern approach to extracting useful representations from unlabeled data by training models to distinguish similar samples from dissimilar ones, has driven significant progress in foundation models. In this work, we develop a new theoretical framework for analyzing data augmentation-based contrastive learning, with a focus on SimCLR as a representative example. Our approach is based on the concept of approximate sufficient statistics, which we extend beyond its original KL-divergence-based definition for contrastive language-image pretraining (CLIP) in Oko et al. (2025). We generalize it to equivalent forms and general $f$-divergences, and show that minimizing the SimCLR loss and other contrastive losses yields encoders that are approximately sufficient. Furthermore, we demonstrate that these near-sufficient encoders can be effectively adapted to downstream regression and classification tasks, with performance depending on their degree of sufficiency and on the error induced by data augmentation in contrastive learning. Concrete examples in linear regression and topic classification illustrate the broad applicability of our results. 
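
For readers less familiar with SimCLR, the display below recalls the standard normalized temperature-scaled contrastive (NT-Xent) loss that the abstract refers to. The notation is the usual one from the SimCLR literature ($z_i$ for encoder outputs of augmented views, $\tau$ for the temperature, $N$ for the batch size) and is not necessarily the paper's own:

$$
\ell_{i,j} \;=\; -\log \frac{\exp\big(\mathrm{sim}(z_i, z_j)/\tau\big)}{\sum_{k=1,\, k \neq i}^{2N} \exp\big(\mathrm{sim}(z_i, z_k)/\tau\big)},
\qquad
\mathrm{sim}(u, v) \;=\; \frac{u^\top v}{\|u\|\,\|v\|},
$$

where $(i, j)$ indexes a positive pair formed by two augmentations of the same sample, and the total loss averages $\ell_{i,j}$ over all positive pairs in the batch.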

Keywords

Contrastive learning

SimCLR

data augmentation

approximate sufficient statistics 

Co-Author

Song Mei, UC Berkeley

First Author

Licong Lin

Presenting Author

Licong Lin

WITHDRAWN Bridging Root-n and Non-standard Asymptotics: Dimension-agnostic Adaptive Inference in M-Estimation

This manuscript studies a general approach to constructing confidence sets for the solution of a population-level optimization problem, commonly referred to as M-estimation. Statistical inference for M-estimation poses significant challenges due to the non-standard limiting behavior of the corresponding estimator, which arises in settings with growing parameter dimension, non-smooth objectives, or constraints. We propose a simple and unified method that guarantees validity in both regular and irregular cases. Moreover, we provide a comprehensive width analysis of the proposed confidence set, showing that the convergence rate of its diameter adapts to the unknown degree of instance-specific regularity. We apply the proposed method to several high-dimensional and irregular statistical problems. 
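
For reference, the population-level optimization problem underlying M-estimation can be written in a standard generic form; the symbols below ($\theta_0$ for the target, $\ell$ for the criterion function, $\Theta$ for the parameter set) are illustrative notation rather than the manuscript's own:

$$
\theta_0 \;\in\; \operatorname*{arg\,min}_{\theta \in \Theta} \; \mathbb{E}\big[\ell(\theta; Z)\big],
$$

and the goal is a confidence set $\widehat{C}_n$ with $\mathbb{P}(\theta_0 \in \widehat{C}_n) \ge 1 - \alpha$ asymptotically, whether or not the associated M-estimator has a standard root-$n$ Gaussian limit.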

Keywords

Honest inference

Adaptive inference

Irregular M-estimation

Non-standard asymptotics

Extremum estimators 

Co-Author

Arun Kumar Kuchibhotla, Carnegie Mellon University

First Author

Kenta Takatsu, Carnegie Mellon University

Computational and statistical guarantees for star-structured variational inference

We study star-structured variational inference (SVI), an extension of mean-field variational inference that approximates a target distribution $\pi$ over $\mathbb{R}^d$ with a star graphical model $\pi^*$, where a central latent variable is connected to all other variables. We establish the existence, uniqueness, and self-consistency of the star variational solution, derive quantitative approximation error bounds, and provide computational guarantees via projected gradient descent under curvature assumptions on $\pi$. We explore the implications of our results for Gaussian measures and hierarchical Bayesian models, including generalized linear models with location family priors and spike-and-slab priors with one-dimensional debiasing. Our analysis and algorithms rely on functional inequalities and displacement convexity from optimal transport theory. 
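
As a rough illustration of the structure involved (the notation is generic and only meant to convey the factorization, not the paper's exact formulation), a star graphical model on $\mathbb{R}^d$ with central latent variable $x_1$ factorizes as

$$
\pi^*(x_1, \dots, x_d) \;=\; q_1(x_1) \prod_{j=2}^{d} q_j(x_j \mid x_1),
$$

so that the remaining coordinates are conditionally independent given the central variable; SVI seeks the closest such distribution to the target $\pi$, with closeness typically measured by KL divergence, as in mean-field variational inference.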

Keywords

structured variational inference

log-concavity

Bayesian regression

approximate Bayesian inference

Knothe–Rosenblatt (KR) maps 

Co-Author(s)

Bohan Wu, Columbia University
Binghe Zhu, Columbia University
Aram Pooladian, New York University
Sinho Chewi, Yale University

First Author

Shunan Sheng

Presenting Author

Bohan Wu, Columbia University

Controlling the false discovery rate under non-parametric graphical dependencies

We propose sufficient conditions and procedures for false discovery rate (FDR) control in multiple testing when the p-values are related by a known dependency graph, meaning that we assume mutual independence of p-values that are not in each other's neighborhoods. Often this dependence is known to be local, implying a sparse graph, but in general the dependence may be only partially known. Our main FDR-controlling procedure reduces to the Bonferroni correction for fully connected graphs and to the usual Benjamini-Hochberg (BH) procedure under independence or PRDS.
Though our main method can be computationally intensive relative to BH, it runs in reasonable wall-clock time even with $m = 10^6$ hypotheses. Simulations and real-data examples establish that its power is typically almost identical to that of BH. It also typically dominates an alternative approach that reduces to the Benjamini-Yekutieli (BY) correction on fully connected graphs. 
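
For context, here is a minimal sketch of the classical Benjamini-Hochberg step-up procedure, the baseline to which the proposed method reduces under independence or PRDS; it is not the authors' graph-aware procedure, and the function name and interface are illustrative:

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Classical Benjamini-Hochberg step-up procedure (illustrative baseline).

    Returns a boolean array marking which hypotheses are rejected. This is
    the standard BH baseline mentioned in the abstract, not the graph-aware
    FDR procedure proposed in the talk.
    """
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)                     # p-values from smallest to largest
    thresholds = alpha * np.arange(1, m + 1) / m  # step-up thresholds k * alpha / m
    passed = np.nonzero(pvals[order] <= thresholds)[0]
    reject = np.zeros(m, dtype=bool)
    if passed.size > 0:
        k = passed[-1]                            # largest k with p_(k) <= k * alpha / m
        reject[order[: k + 1]] = True             # reject the k + 1 smallest p-values
    return reject
```

Under independence this controls the FDR at level $\alpha$; the contributed procedure instead exploits the known dependency graph while reducing to BH in the independent case.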

Keywords

false discovery rate

Benjamini-Hochberg

dependence

non-parametric

graphs 

Co-Author

William Fithian, University of California, Berkeley

First Author

Andrew Nguyen, University of California, Berkeley

Presenting Author

Andrew Nguyen, University of California, Berkeley

Efficient Analysis of Latent Spaces in Heterogeneous Networks

This work proposes a unified framework for efficient estimation under latent space modeling of heterogeneous networks. We consider a class of latent space models that decompose latent vectors into shared and network-specific components across networks. We develop a novel procedure that first identifies the shared latent vectors and further refines estimates through efficient score equations to achieve statistical efficiency. Oracle error rates for estimating the shared and heterogeneous latent vectors are established simultaneously. The analysis framework offers remarkable flexibility, accommodating various types of edge weights under exponential family distributions. 
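
To make the shared-plus-specific decomposition concrete, one illustrative special case (in generic notation, not necessarily the exact model class analyzed here) is an inner-product latent space model for networks $k = 1, \dots, K$ in which each node's latent vector splits into a shared component $u_i$ and a network-specific component $v_i^{(k)}$:

$$
\mathbb{E}\big[A^{(k)}_{ij}\big] \;=\; g\big(\alpha_k + \langle z^{(k)}_i, z^{(k)}_j \rangle\big),
\qquad
z^{(k)}_i \;=\; \big(u_i, v^{(k)}_i\big),
$$

where $A^{(k)}$ is the (possibly weighted) adjacency matrix of the $k$-th network and $g$ is the mean function of an exponential family model for the edge weights.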

Keywords

Network

Latent space model

Data integration

Low rank

Heterogeneity 

Co-Author(s)

Jiajin Sun
Yinqiu He, University of Wisconsin-Madison

First Author

Yuang Tian, Fudan University

Presenting Author

Jiajin Sun

On the Asymptotics of Exchangeable Sequences of Clusters

Most commonly deployed models for clustering rely on a prior distribution on the cluster sizes. Many such models in wide use, including the Dirichlet and Pitman-Yor processes, provably exhibit macro-clustering: the largest cluster size grows linearly in the sample size. This property is ill-suited to applications in, for example, social network analysis and genomics, where one might expect cluster sizes to grow slowly or even remain bounded as sample sizes grow. The Exchangeable Sequences of Clusters (ESC) model, developed to account for this, exhibits micro-clustering: the size of the largest cluster grows sublinearly in the sample size. Under the ESC model, cluster sizes are independently and identically distributed according to a distribution on the positive integers, conditional on their forming an integer partition of the sample size. In contrast to the situation for commonly used clustering priors, however, little is known about the asymptotic behavior of the ESC model. In this paper, we work to close this gap by establishing asymptotic growth rates and asymptotic distributions for both the number of clusters and the size of the largest cluster under the ESC model. 
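
In the notation commonly used for this distinction (the symbols are generic, but the statement matches the abstract's description), write $M_n$ for the size of the largest cluster among the first $n$ observations; then the micro-clustering property is

$$
\frac{M_n}{n} \;\to\; 0 \quad \text{as } n \to \infty \quad \text{(typically in probability)},
$$

whereas macro-clustering models, such as the Dirichlet and Pitman-Yor processes, have $M_n$ growing linearly in $n$. The talk's results characterize the growth rates of $M_n$ and of the number of clusters under the ESC model.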

Keywords

clustering

random partitions

microclustering

Renewal Theory

Regular variation

Bell polynomials 

Co-Author

Keith Levin, University of Wisconsin

First Author

Nathan Aviles, University of Wisconsin-Madison

Presenting Author

Nathan Aviles, University of Wisconsin-Madison