Tuesday, Aug 6: 2:00 PM - 3:50 PM
5116
Contributed Papers
Oregon Convention Center
Room: CC-C125
Main Sponsor
IMS
Presentations
Many researchers have identified distribution shift as a likely contributor to the reproducibility crisis in behavioral and biomedical sciences. The idea is that if treatment effects vary across individual characteristics and experimental contexts, then studies conducted in different populations will estimate different average effects. This paper uses "generalizability" methods to quantify how much of the effect size discrepancy between an original study and its replication can be explained by distribution shift on observed unit-level characteristics. More specifically, we decompose this discrepancy into "components" attributable to sampling variability (including publication bias), observable distribution shifts, and residual factors. We compute this decomposition for several directly-replicated behavioral science experiments and find little evidence that observable distribution shifts contribute appreciably to non-replicability. In some cases, this is because there is too much statistical noise. In other cases, there is strong evidence that controlling for additional moderators is necessary for reliable replication.
Keywords
Replicability
Distribution shift
Treatment effect
Generalizability
Recent years have seen growing use of machine learning models to inform high-stakes decision-making. However, distribution shifts and privacy concerns make it challenging to achieve valid inferences in multi-source environments. We generate distribution-free prediction intervals for a target population, leveraging multiple potentially biased data sources. We derive the efficient influence functions for the quantiles of unobserved outcomes and show that one can incorporate machine learning prediction algorithms in the estimation of nuisance functions while still achieving parametric rates of convergence. Moreover, when conditional outcome invariance is violated, we propose a data-adaptive strategy to weight data sources to balance efficiency gain and bias reduction. We highlight the robustness and efficiency of our proposals for a variety of conformal scores and data-generating mechanisms via extensive synthetic experiments and real data analyses.
Keywords
Conformal prediction
Distribution shift
Federated learning
Missing data
Machine learning
Data integration
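For orientation, the single-source building block behind the prediction intervals in the abstract above is split conformal prediction. The sketch below is a minimal illustration, assuming exchangeable data from one source, a least-squares predictor, and absolute-residual conformal scores. It is not the multi-source, influence-function-based procedure the abstract describes, and the names `split_conformal_interval` and `alpha` are illustrative.

```python
import numpy as np


def split_conformal_interval(x_train, y_train, x_cal, y_cal, x_new, alpha=0.1):
    """Split conformal interval with a linear predictor and absolute-residual scores."""

    def design(x):
        # Intercept plus a single covariate.
        return np.column_stack([np.ones(len(x)), x])

    # Fit the predictor on the training split only.
    coef, *_ = np.linalg.lstsq(design(x_train), y_train, rcond=None)

    # Conformal scores on the held-out calibration split.
    scores = np.sort(np.abs(y_cal - design(x_cal) @ coef))
    n_cal = len(scores)

    # Finite-sample-corrected quantile: the ceil((n+1)(1-alpha))-th smallest score.
    k = int(np.ceil((n_cal + 1) * (1 - alpha)))
    q = scores[k - 1] if k <= n_cal else np.inf

    center = design(x_new) @ coef
    return center - q, center + q


# Simulated data (an assumption for illustration): y = 2x + standard normal noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=3000)
y = 2.0 * x + rng.standard_normal(3000)

lower, upper = split_conformal_interval(
    x[:500], y[:500], x[500:1000], y[500:1000], x[1000:], alpha=0.1
)
coverage = np.mean((y[1000:] >= lower) & (y[1000:] <= upper))
print(f"empirical coverage on fresh data: {coverage:.3f}")
```

Under exchangeability the interval has marginal coverage of at least 1 - alpha; handling shifted or multiple sources requires the additional weighting and estimation steps discussed in the abstract.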
A controlled Markov chain (CMC) is a paired process consisting of a Markovian state and a non-Markovian control. The control is a random variable that chooses a transition kernel, and the state transitions according to that kernel. The recent popularity of model-based offline reinforcement learning has made learning this transition kernel (a.k.a. the "model") an important open question. This talk aims to address that question through the lens of an adaptive, non-parametric estimator. In particular, we will pose the estimator as a solution to a constrained minimax-optimisation problem and explore its finite sample risk bounds. We will also connect it to recent developments in the theory of model selection. Finally, we will discuss some examples that illustrate the applicability of our setup to downstream estimation tasks.
Keywords
Markov chain
Controlled Markov Chain
Non-parametric estimation
Adaptive estimation
Besov classes
optimisation
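To make the controlled Markov chain setup in the abstract above concrete, the following sketch simulates a finite-state chain whose control depends on the last two states (so the control is not Markovian) and then estimates each transition kernel by normalized transition counts. This count-based estimator is a simple stand-in chosen for illustration, not the adaptive non-parametric estimator of the talk; the control rule and all function names are hypothetical.

```python
import numpy as np


def simulate_cmc(kernels, n_steps, seed=0):
    """Simulate states and controls; kernels[a, s] is the next-state law under control a."""
    rng = np.random.default_rng(seed)
    n_actions, n_states, _ = kernels.shape
    states = np.empty(n_steps + 1, dtype=int)
    actions = np.empty(n_steps, dtype=int)
    states[0] = 0
    for t in range(n_steps):
        # Hypothetical control rule using the current and previous state,
        # so the control depends on history beyond the current state.
        prev = states[t - 1] if t > 0 else 0
        actions[t] = (states[t] + prev) % n_actions
        # The state moves according to the kernel selected by the control.
        states[t + 1] = rng.choice(n_states, p=kernels[actions[t], states[t]])
    return states, actions


def estimate_kernels(states, actions, n_states, n_actions):
    """Empirical transition frequencies for each (control, state) pair."""
    counts = np.zeros((n_actions, n_states, n_states))
    np.add.at(counts, (actions, states[:-1], states[1:]), 1.0)
    totals = counts.sum(axis=2, keepdims=True)
    # Rows that were never visited are left as NaN rather than guessed.
    return np.divide(counts, totals, out=np.full_like(counts, np.nan), where=totals > 0)


rng = np.random.default_rng(1)
true_kernels = rng.dirichlet(np.ones(3), size=(2, 3))  # shape (actions, states, states)
states, actions = simulate_cmc(true_kernels, n_steps=5000)
est = estimate_kernels(states, actions, n_states=3, n_actions=2)
visited = ~np.isnan(est).any(axis=2)
print("max abs error on visited rows:", np.abs(est - true_kernels)[visited].max())
```

Because the control rule decides which (control, state) pairs are ever observed, some rows of the kernel may be estimated poorly or not at all, which is one reason the offline setting is harder than estimating an ordinary Markov chain.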
We study a new shrinkage estimator in the Gaussian model. Unlike the classical James-Stein estimator motivated by the maximization of the marginal likelihood, this estimator is based on the so-called expected log predictive density, a quantity that we estimate with cross validation. We conduct an in-depth risk analysis, and show that its risk is comparable to that of the celebrated James-Stein estimator. In particular, this estimator outperforms the no-shrinkage baseline if the dimension is greater than 4.
The study of this estimator is motivated by the practice of Bayesian statistics: marginal likelihood maximization for hyperparameter tuning is usually prohibitively expensive even in moderately complex Bayesian models. To deal with this issue, practitioners have advocated the use of more tractable surrogates such as the expected log likelihood. Thus, our risk analysis provides theoretical support for this common practice. We apply our shrinkage methodology to an epidemiological application, showing that it can be used to optimally combine information from one small but unbiased sample (a serosurvey) with a large but biased sample (a non-representative survey).
Keywords
Shrinkage
Expected log predictive density
Cross validation
James-Stein
Gaussian Model
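As a reference point for the comparison in the abstract above, the sketch below computes Monte Carlo risks of the classical James-Stein estimator and of the no-shrinkage baseline in the Gaussian sequence model with known unit variance. It implements only the classical benchmark; the cross-validated estimator based on the expected log predictive density is not reproduced here, and the choice of mean vector and dimension is an assumption made for illustration.

```python
import numpy as np


def james_stein(x, sigma2=1.0):
    """Classical James-Stein estimator shrinking each row of x toward the origin."""
    d = x.shape[-1]
    norm2 = np.sum(x**2, axis=-1, keepdims=True)
    return (1.0 - (d - 2) * sigma2 / norm2) * x


def monte_carlo_risk(theta, n_rep=2000, seed=0):
    """Estimate squared-error risk of the baseline (x itself) and of James-Stein."""
    rng = np.random.default_rng(seed)
    # Each row is one draw X ~ N(theta, I_d).
    x = theta + rng.standard_normal((n_rep, theta.size))
    baseline_risk = np.mean(np.sum((x - theta) ** 2, axis=1))
    js_risk = np.mean(np.sum((james_stein(x) - theta) ** 2, axis=1))
    return baseline_risk, js_risk


theta = np.ones(10)  # illustrative mean vector in dimension 10
baseline_risk, js_risk = monte_carlo_risk(theta)
print(f"no shrinkage: {baseline_risk:.2f}, James-Stein: {js_risk:.2f}")
```

The baseline risk equals the dimension in expectation, while the classical James-Stein estimator has strictly smaller risk whenever the dimension is at least 3.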
We study the behavior of optimal ridge regularization and optimal ridge risk for out-of-distribution prediction, where the test distribution deviates arbitrarily from the train distribution. We establish general conditions that determine the sign of the optimal regularization level under covariate and regression shifts. These conditions capture alignment between the covariance and signal structures in the train and test data and reveal stark differences compared to the in-distribution setting (where the test and train distributions agree); for example, a negative regularization level can be optimal under covariate shift, even when the training features are isotropic. Furthermore, we prove that the optimally-tuned risk is monotonic in the data aspect ratio, even in the out-of-distribution setting. In general, our results do not make any modeling assumptions for the train or the test distributions, except for moment bounds, and allow for arbitrary shifts and the widest possible range of (negative) regularization levels.
Keywords
Ridge regression
Optimal regularization
Distribution shift
Covariate shift
Regression shift
Risk monotonicity
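The role of negative regularization in the abstract above can be explored numerically. The sketch below evaluates the exact conditional excess risk of ridge regression under pure covariate shift, assuming a fixed design, a linear signal, and homoskedastic noise, and scans a grid of regularization levels that extends below zero as far as the sample covariance allows. The simulated design, signal, and test covariance are assumptions for illustration, and the sign of the minimizer depends on them; nothing here reproduces the paper's general conditions.

```python
import numpy as np


def ridge_ood_risk(X, beta, sigma_test, lam, noise_var=1.0):
    """Excess risk E[(b_hat - beta)' Sigma_test (b_hat - beta) | X] for ridge.

    With S = X'X / n and A = (S + lam * I)^{-1}, the ridge estimator
    b_hat = A X'y / n has bias -lam * A beta and covariance (noise_var / n) * A S A.
    """
    n, p = X.shape
    S = X.T @ X / n
    M = S + lam * np.eye(p)
    if np.linalg.eigvalsh(M).min() <= 0:
        raise ValueError("lam must exceed minus the smallest eigenvalue of X'X / n")
    A = np.linalg.inv(M)
    bias_sq = lam**2 * (beta @ A @ sigma_test @ A @ beta)
    variance = noise_var / n * np.trace(sigma_test @ A @ S @ A)
    return bias_sq + variance


rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))              # isotropic training features
beta = rng.standard_normal(p) / np.sqrt(p)   # illustrative signal
sigma_test = np.diag(np.linspace(0.5, 2.0, p))  # shifted test covariance

# Negative levels are admissible while S + lam * I stays positive definite.
lam_floor = -0.9 * np.linalg.eigvalsh(X.T @ X / n).min()
grid = np.linspace(lam_floor, 2.0, 60)
risks = np.array([ridge_ood_risk(X, beta, sigma_test, lam) for lam in grid])
best_lam = grid[np.argmin(risks)]
print(f"risk-minimizing level on the grid: {best_lam:.3f}")
```

Changing how `beta` aligns with the eigenvectors of `sigma_test` and of the training covariance is the natural experiment here, since the abstract attributes the sign of the optimal level to that alignment.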
We study statistical inference for the optimal transport (OT) map (also known as the Brenier map) from a known absolutely continuous reference distribution onto an unknown finitely discrete target distribution. We derive limit distributions for the $L^p$-error with arbitrary $p \in [1,\infty)$ and for linear functionals of the empirical OT map, together with their moment convergence. The former has a non-Gaussian limit, whose explicit density is derived, while the latter attains asymptotic normality.
For both cases, we also establish consistency of the nonparametric bootstrap. The derivation of our limit theorems relies on new stability estimates of functionals of the OT map with respect to the dual potential vector, which may be of independent interest. We also discuss applications of our limit theorems to the construction of confidence sets for the OT map and inference for a maximum tail correlation.
Keywords
Bootstrap
functional delta method
Hadamard directional derivative
limit distribution
optimal transport map
semidiscrete optimal transport
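To illustrate the object studied in the abstract above: for quadratic cost, a semidiscrete OT map onto atoms y_1, ..., y_m is the gradient of the piecewise-linear convex potential max_j (<x, y_j> - z_j), so it sends x to the atom attaining the maximum. The sketch below only evaluates this map for a given dual vector `z`; solving for the optimal `z` that matches the target weights is not shown. The atoms, the uniform reference sample, and the function name are assumptions for illustration.

```python
import numpy as np


def semidiscrete_map(x, atoms, z):
    """Send each row of x to the atom maximizing <x, y_j> - z_j."""
    scores = x @ atoms.T - z          # shape (n_points, n_atoms)
    idx = np.argmax(scores, axis=1)
    return atoms[idx], idx


rng = np.random.default_rng(0)
atoms = rng.standard_normal((5, 2))           # support of the discrete target
x = rng.uniform(-1, 1, size=(1000, 2))        # draws from a known reference law

# With z_j = |y_j|^2 / 2 the cells are Voronoi cells (nearest-atom assignment).
z = 0.5 * np.sum(atoms**2, axis=1)
mapped, idx = semidiscrete_map(x, atoms, z)

# Fraction of reference draws landing on each atom; the optimal z would make
# these masses agree with the target weights.
cell_mass = np.bincount(idx, minlength=len(atoms)) / len(x)
print("mass transported to each atom:", cell_mass)
```

The map depends on the data only through the dual vector `z`, which is consistent with the abstract's emphasis on stability of functionals of the OT map with respect to the dual potential vector.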
This paper studies transfer learning for estimating the mean of random functions based on discretely sampled data, where, in addition to observations from the target distribution, auxiliary samples from similar but distinct source distributions are available. The paper considers both common and independent designs and establishes the minimax rates of convergence for both designs. The results reveal an interesting phase transition phenomenon under the two designs and demonstrate the benefits of utilizing the source samples in the low sampling frequency regime. For practical applications, this paper proposes novel data-driven adaptive algorithms that attain the optimal rates of convergence within a logarithmic factor simultaneously over a large collection of parameter spaces. The theoretical findings are complemented by a simulation study that further supports the effectiveness of the proposed algorithms.
Keywords
Transfer learning
Functional data analysis
Mean function
Minimax rate of convergence
Phase transition
Adaptivity
Co-Author(s)
Hongming Pu, University of Pennsylvania, Wharton School of Business
Tony Cai, University of Pennsylvania
First Author
Dongwoo Kim, University of Pennsylvania, Wharton School of Business
Presenting Author
Dongwoo Kim, University of Pennsylvania, Wharton School of Business