Sunday, Aug 3: 2:00 PM - 3:50 PM
4012
Contributed Papers
Music City Center
Room: CC-Davidson Ballroom A2
Main Sponsor
IMS
Presentations
Contrastive learning---a modern approach to extract useful representations from unlabeled data by training models to distinguish similar samples from dissimilar ones---has driven significant progress in foundation models. In this work, we develop a new theoretical framework for analyzing data augmentation-based contrastive learning, with a focus on SimCLR as a representative example. Our approach is based on the concept of \emph{approximate sufficient statistics}, which we extend beyond its original definition in~\cite{oko2025statistical} for contrastive language-image pretraining (CLIP) using KL-divergence. We generalize it to equivalent forms and general $f$-divergences, and show that minimizing SimCLR and other contrastive losses yields encoders that are approximately sufficient. Furthermore, we demonstrate that these near-sufficient encoders can be effectively adapted to downstream regression and classification tasks, with performance depending on their sufficiency and the error induced by data augmentation in contrastive learning. Concrete examples in linear regression and topic classification are provided to illustrate the broad applicability of our results.
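For readers unfamiliar with SimCLR, the contrastive loss it minimizes for a positive pair $(i,j)$ of augmented views is the standard InfoNCE-type objective shown below (the notation $z$, $\mathrm{sim}$, $\tau$, $N$ is illustrative, not taken from the paper):

```latex
\ell_{i,j} \;=\; -\log
\frac{\exp\!\bigl(\mathrm{sim}(z_i, z_j)/\tau\bigr)}
     {\sum_{k=1,\, k \neq i}^{2N} \exp\!\bigl(\mathrm{sim}(z_i, z_k)/\tau\bigr)},
```

where $z_i$ is the encoder output for augmented view $i$, $\mathrm{sim}$ is cosine similarity, $\tau > 0$ is a temperature, and the sum runs over the $2N$ augmented views in a batch.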
Keywords
Contrastive learning
SimCLR
data augmentation
approximate sufficient statistics
This manuscript studies a general approach to constructing confidence sets for the solution of a population-level optimization problem, commonly referred to as M-estimation. Statistical inference for M-estimation poses significant challenges due to the non-standard limiting behavior of the corresponding estimators, which arises in settings with growing parameter dimension, non-smooth objectives, or constraints. We propose a simple and unified method that guarantees validity in both regular and irregular cases. Moreover, we provide a comprehensive width analysis of the proposed confidence set, showing that the convergence rate of the diameter is adaptive to the unknown degree of instance-specific regularity. We apply the proposed method to several high-dimensional and irregular statistical problems.
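As background, the M-estimation setup referenced above can be written in the usual form (our notation, not the authors'):

```latex
\theta^\star \in \operatorname*{arg\,min}_{\theta \in \Theta}\, \mathbb{E}_{Z \sim P}\!\bigl[\ell(\theta; Z)\bigr],
\qquad
\hat{\theta}_n \in \operatorname*{arg\,min}_{\theta \in \Theta}\, \frac{1}{n}\sum_{i=1}^{n} \ell(\theta; Z_i),
```

where $\ell$ is a loss function and $\Theta$ a (possibly constrained) parameter set; the irregular cases in the abstract correspond to non-smooth $\ell$, constrained $\Theta$, or dimension growing with $n$.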
Keywords
Honest inference
Adaptive inference
Irregular M-estimation
Non-standard asymptotics
Extremum estimators
We study star-structured variational inference (SVI), an extension of mean-field variational inference that approximates a target distribution $\pi$ over $\mathbb{R}^d$ with a star graphical model $\pi^*$, in which a central latent variable is connected to all other variables. We establish the existence, uniqueness, and self-consistency of the star variational solution, derive quantitative approximation error bounds, and provide computational guarantees via projected gradient descent under curvature assumptions on $\pi$. We explore the implications of our results for Gaussian measures and hierarchical Bayesian models, including generalized linear models with location family priors and spike-and-slab priors with one-dimensional debiasing. Our analysis and algorithms rely on functional inequalities and displacement convexity from optimal transport theory.
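Schematically, a star graphical model of the kind described above factorizes as (with $z$ denoting the central latent variable; this notation is ours, not the authors'):

```latex
\pi^*(z, x_1, \dots, x_d) \;=\; q_0(z)\,\prod_{i=1}^{d} q_i(x_i \mid z),
```

so that, unlike the fully factorized mean-field family, the coordinates $x_1, \dots, x_d$ remain dependent through the shared central variable $z$.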
Keywords
structured variational inference
log-concavity
Bayesian regression
approximate Bayesian inference
Knothe–Rosenblatt (KR) maps
We propose sufficient conditions and procedures for false discovery rate (FDR) control in multiple testing when the p-values are related by a known dependency graph---meaning that we assume mutual independence of p-values not within each other's neighborhoods. Often this dependence is known to be local, implying a sparse graph, but in general the dependence may be only partially known. Our main FDR-controlling procedure reduces to the Bonferroni correction for fully connected graphs and to the usual Benjamini-Hochberg (BH) procedure under independence or PRDS.
Though our main method can be computationally intensive relative to BH, it runs with reasonable wall-clock time even with $m = 10^6$ hypotheses. Simulations and real data examples establish that its power is typically almost identical to BH. It also typically dominates an alternative approach which reduces to the Benjamini-Yekutieli (BY) correction on fully connected graphs.
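For reference, the BH procedure that serves as the baseline here rejects the hypotheses with the $k^*$ smallest p-values, where (standard notation, with $p_{(1)} \le \dots \le p_{(m)}$ the ordered p-values and $\alpha$ the target FDR level):

```latex
k^* \;=\; \max\Bigl\{ k : p_{(k)} \le \tfrac{k}{m}\,\alpha \Bigr\},
```

and the BY correction replaces $\alpha$ by $\alpha \big/ \sum_{i=1}^{m} \tfrac{1}{i}$, which is what makes BY-type procedures conservative relative to BH.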
Keywords
false discovery rate
Benjamini-Hochberg
dependence
non-parametric
graphs
This work proposes a unified framework for efficient estimation under latent space modeling of heterogeneous networks. We consider a class of latent space models that decompose latent vectors into shared and network-specific components across networks. We develop a novel procedure that first identifies the shared latent vectors and further refines estimates through efficient score equations to achieve statistical efficiency. Oracle error rates for estimating the shared and heterogeneous latent vectors are established simultaneously. The analysis framework offers remarkable flexibility, accommodating various types of edge weights under exponential family distributions.
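One illustrative way to write the shared/network-specific decomposition mentioned above (purely schematic; the abstract does not specify the exact parameterization) is, for node $i$ in network $k$,

```latex
z_i^{(k)} = \bigl(u_i,\; v_i^{(k)}\bigr),
\qquad
\mathbb{E}\bigl[A_{ij}^{(k)}\bigr] = g\bigl(\langle z_i^{(k)}, z_j^{(k)}\rangle\bigr),
```

where $u_i$ is shared across networks, $v_i^{(k)}$ is network-specific, $A^{(k)}$ is the (possibly weighted) adjacency matrix of network $k$, and $g$ is the mean function of an exponential family link.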
Keywords
Network
Latent space model
Data integration
Low rank
Heterogeneity
Most commonly deployed models for clustering rely on a prior distribution on the cluster sizes. Many such models in wide use, including the Dirichlet and Pitman-Yor processes, provably exhibit macro-clustering: the largest cluster size grows linearly in the sample size. This property is ill-suited to applications in, for example, social network analysis and genomics, where one might expect cluster sizes to grow slowly or even remain bounded as sample sizes grow. The Exchangeable Sequences of Clusters (ESC) model, developed to account for this, exhibits micro-clustering: the size of the largest cluster grows sublinearly in the sample size. Under the ESC model, cluster sizes are independent and identically distributed according to a distribution on the positive integers, conditional on their forming an integer partition of the sample size. In contrast to commonly used clustering priors, little is known about the behavior of the ESC model. In this paper, we work to close this gap by establishing the asymptotic growth rates and asymptotic distributions for both the number of clusters and the size of the largest cluster under the ESC model.
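In symbols, the ESC construction described above can be sketched as follows (notation ours): with $\mu$ a distribution on the positive integers, the cluster sizes $S_1, \dots, S_K$ of a sample of size $n$ satisfy

```latex
\Pr\bigl(S_1 = s_1, \dots, S_K = s_K \,\bigm|\, \textstyle\sum_{i} S_i = n\bigr)
\;\propto\; \prod_{i=1}^{K} \mu(s_i),
\qquad s_1 + \cdots + s_K = n,
```

i.e., the sizes are i.i.d. draws from $\mu$ conditioned on forming an integer partition of $n$.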
Keywords
clustering
random partitions
microclustering
renewal theory
regular variation
Bell polynomials