New Methods for Integrative and Adaptive Analysis

Tian Gu, Chair
Columbia University
Tuesday, Aug 6: 2:00 PM - 3:50 PM
Contributed Papers 
Oregon Convention Center 
Room: CC-C125 

Main Sponsor



Diagnosing the role of observed distribution shift in scientific replications

Many researchers have identified distribution shift as a likely contributor to the reproducibility crisis in the behavioral and biomedical sciences. The idea is that if treatment effects vary across individual characteristics and experimental contexts, then studies conducted in different populations will estimate different average effects. This paper uses "generalizability" methods to quantify how much of the effect-size discrepancy between an original study and its replication can be explained by distribution shift in observed unit-level characteristics. More specifically, we decompose this discrepancy into "components" attributable to sampling variability (including publication bias), observable distribution shift, and residual factors. We compute this decomposition for several directly replicated behavioral science experiments and find little evidence that observable distribution shifts contribute appreciably to non-replicability. In some cases, this is because statistical noise is too large to detect such a contribution. In other cases, there is strong evidence that controlling for additional moderators is necessary for reliable replication.
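To make the "observable distribution shift" component concrete, here is a minimal sketch under strong simplifying assumptions: a single discrete covariate, randomized treatment, and every covariate level present in both studies with treated and control units. It post-stratifies the original study's stratum-level effects to the replication's covariate mix; the gap between the original effect and this reweighted effect is the part explained by observed shift. The function names and the (x, t, y) tuple layout are hypothetical, and the "remainder" here lumps sampling variability together with residual factors, which the talk separates.

```python
import numpy as np

def stratum_effects(x, t, y):
    """Treated-minus-control mean outcome within each level of a discrete covariate x."""
    return {s: y[(x == s) & (t == 1)].mean() - y[(x == s) & (t == 0)].mean()
            for s in np.unique(x)}

def shares(x):
    """Empirical distribution of the discrete covariate x."""
    levels, counts = np.unique(x, return_counts=True)
    return dict(zip(levels, counts / counts.sum()))

def standardized_effect(x, t, y, weights=None):
    """Average of stratum effects weighted by `weights` (default: the study's own shares)."""
    effects = stratum_effects(x, t, y)
    weights = shares(x) if weights is None else weights
    return sum(weights.get(s, 0.0) * eff for s, eff in effects.items())

def decompose(orig, rep):
    """Split (original effect - replication effect) into observed-shift and remainder parts.

    `orig` and `rep` are hypothetical (x, t, y) tuples of NumPy arrays."""
    tau_orig = standardized_effect(*orig)
    # Original study's stratum effects, reweighted to the replication's covariate mix.
    tau_shifted = standardized_effect(*orig, weights=shares(rep[0]))
    tau_rep = standardized_effect(*rep)
    return {"total": tau_orig - tau_rep,
            "observed_shift": tau_orig - tau_shifted,
            "remainder": tau_shifted - tau_rep}
```

If stratum effects are identical across studies and only the covariate mix differs, the whole discrepancy lands in "observed_shift" and the remainder is zero.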



Distribution shift

Treatment effect


Abstract 2971


Dominik Rothenhaeusler

First Author

Ying Jin, Stanford University

Presenting Author

Ying Jin, Stanford University

Multi-Source Conformal Inference Under Distribution Shift

Recent years have seen growing use of machine learning models to inform high-stakes decision-making. However, distribution shifts and privacy concerns make it challenging to obtain valid inferences in multi-source environments. We construct distribution-free prediction intervals for a target population by leveraging multiple, potentially biased data sources. We derive the efficient influence functions for the quantiles of unobserved outcomes and show that machine learning prediction algorithms can be incorporated in the estimation of nuisance functions while still achieving parametric rates of convergence. Moreover, when conditional outcome invariance is violated, we propose a data-adaptive strategy for weighting data sources that balances efficiency gain against bias reduction. We highlight the robustness and efficiency of our proposals for a variety of conformal scores and data-generating mechanisms via extensive synthetic experiments and real data analyses.
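For readers less familiar with conformal inference under shift, the sketch below shows a related but simpler technique: single-source weighted split conformal prediction under covariate shift. It is not the multi-source, influence-function-based method described above. It assumes absolute-residual scores from a held-out calibration set and known likelihood-ratio weights w(x) = dP_target/dP_source; the function names are hypothetical.

```python
import numpy as np

def weighted_conformal_quantile(scores, weights, test_weight, alpha):
    """Smallest calibration score whose weighted CDF reaches 1 - alpha.

    The test point's weight is placed on +inf, so the result is np.inf when
    the calibration mass alone cannot reach 1 - alpha."""
    order = np.argsort(scores)
    s = np.asarray(scores, dtype=float)[order]
    w = np.asarray(weights, dtype=float)[order]
    cdf = np.cumsum(w) / (w.sum() + test_weight)
    idx = np.searchsorted(cdf, 1 - alpha, side="left")
    return s[idx] if idx < len(s) else np.inf

def prediction_interval(mu_x, scores, weights, test_weight, alpha=0.1):
    """Interval mu_x +/- q for absolute-residual scores |y - mu(x)|."""
    q = weighted_conformal_quantile(scores, weights, test_weight, alpha)
    return mu_x - q, mu_x + q
```

With all weights equal to 1 this reduces to ordinary split conformal prediction; up-weighting calibration points that resemble the target population shifts the quantile toward their scores.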


Conformal prediction

Distribution shift

Federated learning

Missing data

Machine learning

Data integration 

Abstract 2419


Alexander Levis, Carnegie Mellon University
Sharon-Lise Normand, Harvard Medical School
Larry Han, Northeastern University

First Author

Yi Liu, North Carolina State University

Presenting Author

Larry Han, Northeastern University

Non-parametric Adaptive Estimation of Transition Kernels of Controlled Markov Chains

A controlled Markov chain (CMC) is a paired process consisting of a Markovian state and a non-Markovian control. The control is a random variable that selects a transition kernel, and the state then transitions according to that kernel. The recent popularity of model-based offline reinforcement learning has made learning this transition kernel (a.k.a. the "model") an important open question. This talk addresses that question through the lens of an adaptive, non-parametric estimator. In particular, we will pose the estimator as the solution to a constrained minimax optimisation problem and explore its finite-sample risk bounds. We will also connect it to recent developments in the theory of model selection. Finally, we will discuss some examples that illustrate the applicability of our setup to downstream estimation tasks.
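As a point of reference for what "learning the transition kernel" means, the sketch below implements the classical frequency (maximum-likelihood) estimator of P(s' | s, a) for finite state and control spaces observed along one trajectory. This is only a baseline, not the adaptive non-parametric minimax estimator of the talk; the finite spaces, the array layout, and the uniform fallback for unvisited pairs are illustrative assumptions.

```python
import numpy as np

def estimate_kernels(states, controls, n_states, n_controls):
    """Empirical estimate of P[a, s, s'] = P(s_{t+1} = s' | s_t = s, a_t = a).

    `states` has length T + 1 and `controls` has length T; control a_t
    selects the kernel used for the transition s_t -> s_{t+1}."""
    counts = np.zeros((n_controls, n_states, n_states))
    for s, a, s_next in zip(states[:-1], controls, states[1:]):
        counts[a, s, s_next] += 1
    totals = counts.sum(axis=2, keepdims=True)
    # Unvisited (control, state) pairs fall back to a uniform row.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(totals > 0, counts / totals, 1.0 / n_states)
```

Because the control need not be Markovian, the estimator conditions only on the observed (state, control) pair and makes no assumption about how controls were chosen.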


Markov chain

Controlled Markov Chain

Non-parametric estimation




Abstract 3474

First Author

Imon Banerjee, Purdue University

Presenting Author

Imon Banerjee, Purdue University

On the risk of a cross-validated shrinkage estimator in the linear model

We study a new shrinkage estimator in the Gaussian model. Unlike the classical James-Stein estimator, which is motivated by maximization of the marginal likelihood, this estimator is based on the so-called expected log predictive density, a quantity that we estimate with cross-validation. We conduct an in-depth risk analysis and show that its risk is comparable to that of the celebrated James-Stein estimator. In particular, this estimator outperforms the no-shrinkage baseline when the dimension is greater than 4.
The study of this estimator is motivated by the practice of Bayesian statistics: marginal likelihood maximization for hyperparameter tuning is usually prohibitively expensive, even in moderately complex Bayesian models. To deal with this issue, practitioners have advocated the use of more tractable surrogates such as the expected log likelihood. Our risk analysis thus provides theoretical support for this common practice. We apply our shrinkage methodology to an epidemiological application, showing that it can be used to optimally combine information from one small but unbiased sample (a serosurvey) with a large but biased sample (a non-representative survey).
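The benchmark named above can be checked numerically. The sketch below estimates, by Monte Carlo, the squared-error risk of the classical James-Stein estimator and of the no-shrinkage estimator y itself in the model y ~ N(theta, I_d). It does not implement the cross-validated estimator from the talk, whose exact form is not given here; the simulation settings are illustrative assumptions.

```python
import numpy as np

def james_stein(y):
    """Classical James-Stein estimator (1 - (d - 2) / ||y||^2) y, applied row-wise."""
    d = y.shape[-1]
    norm2 = np.sum(y ** 2, axis=-1, keepdims=True)
    return (1.0 - (d - 2) / norm2) * y

def mc_risk(theta, n_rep=20000, seed=0):
    """Monte Carlo risks E||estimate - theta||^2 for (no shrinkage, James-Stein)."""
    rng = np.random.default_rng(seed)
    y = theta + rng.standard_normal((n_rep, theta.size))
    risk_mle = np.mean(np.sum((y - theta) ** 2, axis=1))
    risk_js = np.mean(np.sum((james_stein(y) - theta) ** 2, axis=1))
    return risk_mle, risk_js
```

At theta = 0 the James-Stein risk is exactly 2 for any d >= 3, versus d for no shrinkage; the gain shrinks as ||theta|| grows but never turns into a loss.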



Expected log predictive density

Cross validation


Gaussian Model 

Abstract 3282

First Author

Gonzalo Mena, Carnegie Mellon University

Presenting Author

Gonzalo Mena, Carnegie Mellon University

Optimal Ridge Regularization for Out-of-Distribution Prediction

We study the behavior of optimal ridge regularization and optimal ridge risk for out-of-distribution prediction, where the test distribution deviates arbitrarily from the train distribution. We establish general conditions that determine the sign of the optimal regularization level under covariate and regression shifts. These conditions capture the alignment between the covariance and signal structures in the train and test data and reveal stark differences from the in-distribution setting (where the test and train distributions agree); for example, a negative regularization level can be optimal under covariate shift, even when the training features are isotropic. Furthermore, we prove that the optimally tuned risk is monotonic in the data aspect ratio, even in the out-of-distribution setting. Our results make no modeling assumptions on the train or test distributions beyond moment bounds, and they allow for arbitrary shifts and the widest possible range of (negative) regularization levels.
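The quantities in this abstract can be explored with a small oracle experiment. The sketch below fits ridge regression on training data for a grid of regularization levels that includes negative values (kept only while X'X/n + lambda I stays positive definite) and scores each fit by its excess risk under a shifted test covariance, with the regression coefficients held fixed (pure covariate shift). It uses the true coefficients, so it is an illustration rather than a tuning procedure; all names and settings are assumptions.

```python
import numpy as np

def ridge(X, y, lam):
    """Ridge estimate (X'X/n + lam I)^{-1} X'y/n; lam may be negative if the matrix stays positive definite."""
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)

def excess_test_risk(beta_hat, beta, sigma_test):
    """(beta_hat - beta)' Sigma_test (beta_hat - beta): excess squared-error risk under covariate shift."""
    diff = beta_hat - beta
    return float(diff @ sigma_test @ diff)

def oracle_lambda(X, y, beta, sigma_test, lams):
    """Grid point minimizing the excess test risk (oracle: uses the true beta)."""
    min_eig = np.linalg.eigvalsh(X.T @ X / X.shape[0]).min()
    valid = [lam for lam in lams if lam > -min_eig]  # keep X'X/n + lam I positive definite
    risks = [excess_test_risk(ridge(X, y, lam), beta, sigma_test) for lam in valid]
    return valid[int(np.argmin(risks))], risks
```

Whether the selected level comes out negative depends on how the test covariance aligns with the signal, which is the kind of condition the abstract characterizes; this sketch makes no claim about the outcome for any particular draw.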


Ridge regression

Optimal regularization

Distribution shift

Covariate shift

Regression shift

Risk monotonicity 

Abstract 3340


Jin-Hong Du, Carnegie Mellon University
Ryan Tibshirani, Carnegie Mellon University

First Author

Pratik Patil, University of California, Berkeley

Presenting Author

Pratik Patil, University of California, Berkeley

Stability and statistical inference for semidiscrete optimal transport maps

We study statistical inference for the optimal transport (OT) map (also known as the Brenier map) from a known absolutely continuous reference distribution onto an unknown finitely discrete target distribution. We derive limit distributions for the $L^p$-error with arbitrary $p \in [1,\infty)$ and for linear functionals of the empirical OT map, together with their moment convergence. The former has a non-Gaussian limit, whose explicit density is derived, while the latter attains asymptotic normality.
For both cases, we also establish consistency of the nonparametric bootstrap. The derivation of our limit theorems relies on new stability estimates of functionals of the OT map with respect to the dual potential vector, which may be of independent interest. We also discuss applications of our limit theorems to the construction of confidence sets for the OT map and inference for a maximum tail correlation. 
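The semidiscrete map itself can be computed from a dual potential vector. The sketch below assumes quadratic cost c(x, y) = |x - y|^2 / 2, a uniform reference distribution on the unit square, and target support points with weights b. It runs gradient ascent on the concave semi-dual, whose gradient in psi_j is b_j minus the reference mass of the j-th Laguerre cell, estimated on a fixed Monte Carlo sample; the map sends x to the support point of its cell. This is a generic numerical scheme, not the inference procedure of the talk; plugging empirical target frequencies in for b would give a plug-in version of the empirical OT map. Names, step size, and sample sizes are illustrative assumptions.

```python
import numpy as np

def laguerre_assign(x, targets, psi):
    """Index j minimizing c(x, y_j) - psi_j with c(x, y) = |x - y|^2 / 2."""
    cost = 0.5 * ((x[:, None, :] - targets[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(cost - psi[None, :], axis=1)

def semidiscrete_ot(targets, b, n_mc=20000, lr=0.1, n_iter=500, seed=0):
    """Dual potential psi via gradient ascent on the semi-dual (uniform reference on [0, 1]^d)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(size=(n_mc, targets.shape[1]))  # fixed Monte Carlo sample from the reference
    psi = np.zeros(len(b))
    for _ in range(n_iter):
        mass = np.bincount(laguerre_assign(x, targets, psi), minlength=len(b)) / n_mc
        psi += lr * (b - mass)  # gradient of the semi-dual: target mass minus cell mass
    return psi

def ot_map(x, targets, psi):
    """Semidiscrete OT map: send each x to the support point of its Laguerre cell."""
    return targets[laguerre_assign(x, targets, psi)]
```

At convergence each Laguerre cell carries reference mass b_j, so the map pushes the uniform distribution onto the discrete target.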



functional delta method

Hadamard directional derivative

limit distribution

optimal transport map

semidiscrete optimal transport 

Abstract 2587


Ziv Goldfeld, Cornell University
Kengo Kato, Cornell University

First Author

Ritwik Sadhu, Cornell University

Presenting Author

Ritwik Sadhu, Cornell University

Transfer Learning for Functional Mean Estimation: Phase Transition and Adaptive Algorithms

This paper studies transfer learning for estimating the mean of random functions based on discretely sampled data, where, in addition to observations from the target distribution, auxiliary samples from similar but distinct source distributions are available. The paper considers both common and independent designs and establishes the minimax rates of convergence for both designs. The results reveal an interesting phase transition phenomenon under the two designs and demonstrate the benefits of utilizing the source samples in the low sampling frequency regime. For practical applications, this paper proposes novel data-driven adaptive algorithms that attain the optimal rates of convergence within a logarithmic factor simultaneously over a large collection of parameter spaces. The theoretical findings are complemented by a simulation study that further supports the effectiveness of the proposed algorithms. 
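To illustrate how source samples can help estimate a target mean function, the sketch below uses a simple two-step "offset" approach that is not the paper's adaptive algorithm. It smooths the pooled source observations, then smooths the target residuals against that fit with a wider bandwidth and adds the correction back. It assumes all (t_ij, Y_ij) pairs are pooled into flat arrays, ignores within-curve correlation and the common versus independent design distinction, and uses hand-picked bandwidths; the function names are hypothetical.

```python
import numpy as np

def nw_smooth(t_obs, y_obs, t_grid, h):
    """Nadaraya-Watson (local constant) smoother with a Gaussian kernel of bandwidth h."""
    w = np.exp(-0.5 * ((t_grid[:, None] - t_obs[None, :]) / h) ** 2)
    return (w @ y_obs) / w.sum(axis=1)

def transfer_mean(t_tgt, y_tgt, t_src, y_src, t_grid, h_src, h_delta):
    """Source mean estimate plus a smoothed target-minus-source correction."""
    mu_src_grid = nw_smooth(t_src, y_src, t_grid, h_src)
    mu_src_at_tgt = nw_smooth(t_src, y_src, t_tgt, h_src)
    delta_grid = nw_smooth(t_tgt, y_tgt - mu_src_at_tgt, t_grid, h_delta)
    return mu_src_grid + delta_grid
```

The design choice mirrors the intuition in the abstract: when target observations are sparse, the abundant source data carry the fine structure of the curve, and the scarce target data only need to estimate a smoother difference.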


Transfer learning

Functional data analysis

Mean function

Minimax rate of convergence

Phase transition


Abstract 3085


Hongming Pu, University of Pennsylvania, Wharton School of Business
Tony Cai, University of Pennsylvania

First Author

Dongwoo Kim, University of Pennsylvania, Wharton School of Business

Presenting Author

Dongwoo Kim, University of Pennsylvania, Wharton School of Business