Advances in Adaptive Experimentation and Anytime-Valid Inference

Chair: Shubhanshu Shekhar, University of Michigan, Ann Arbor
Organizer: Shubhanshu Shekhar, University of Michigan, Ann Arbor

Sunday, Aug 3: 2:00 PM - 3:50 PM
Session 0812
Topic-Contributed Paper Session
Music City Center
Room: CC-104D

Applied: Yes

Main Sponsor

IMS

Co-Sponsors

International Indian Statistical Association
Section on Statistical Learning and Data Science

Presentations

Minimax Rate Optimal Algorithms for High-Dimensional Stochastic Linear Bandits

We consider the stochastic linear bandit problem with several arms over T rounds, where the dimension d of the covariate vectors is potentially larger than T, while each arm-specific parameter vector has at most s non-zero components. First, we study the single-arm case, focusing on the cumulative estimation error of the parameter vector. We show that the Lasso has a sub-optimal dependence on d and T in terms of worst-case error in the online estimation setup. We then establish the minimax optimality of the thresholded Lasso, which first estimates the support of the parameter vector by thresholding the Lasso estimator and then runs least squares on the selected support. Second, we consider the bandit setup, using the thresholded Lasso as the main estimation method. We propose a three-stage algorithm for arm selection and derive an upper bound of order s(log(d)+log(T))log(s) on its cumulative regret. Finally, we establish a matching lower bound up to a log(s) multiplicative term, which shows the near minimax optimality of the proposed method. Notably, our analysis avoids the restrictive "beta-min" condition.
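For a concrete picture of the two-step estimator, here is a minimal Python sketch of a thresholded Lasso. The regularization level `alpha`, the threshold `tau`, and the data-generating setup are illustrative assumptions, not the tuning analyzed in the paper.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, d, s = 100, 500, 5                        # high-dimensional regime: d > n
beta = np.zeros(d)
beta[:s] = rng.normal(loc=2.0, size=s)       # s-sparse true parameter
X = rng.normal(size=(n, d))
y = X @ beta + 0.5 * rng.normal(size=n)

# Stage 1: Lasso pilot estimate.
pilot = Lasso(alpha=0.1).fit(X, y)

# Stage 2: estimate the support by thresholding the Lasso coefficients.
tau = 0.05                                   # illustrative threshold level
support = np.flatnonzero(np.abs(pilot.coef_) > tau)

# Stage 3: refit ordinary least squares on the selected support.
beta_hat = np.zeros(d)
if support.size > 0:
    ols = LinearRegression().fit(X[:, support], y)
    beta_hat[support] = ols.coef_
print("estimated support:", support)
```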

Keywords

Stochastic Linear Bandits

High-dimensional Linear Models

Minimax Rate Optimality 

Speaker

Yanglei Song, Queen's University

Confidence sequences via online learning with applications in offline contextual bandits and active preference learning

Confidence sequences provide a way to characterize uncertainty in stochastic environments and are a widely used tool for interactive machine learning algorithms and statistical problems, including A/B testing, Bayesian optimization, reinforcement learning, and offline evaluation/learning. In these problems, constructing confidence sequences that are tight without sacrificing correctness is crucial, since tightness has a dramatic impact on the performance of downstream tasks. In this talk, I will present how to leverage results from online learning to derive confidence sequences that are provably and numerically tight. First, I will present an implicitly defined confidence sequence for bounded random variables that induces an empirical Bernstein-style confidence bound (and thus adapts to the variance) and, unlike the standard empirical Bernstein bound, is simultaneously provably better than the KL divergence-based confidence bound. Our confidence bound is never vacuous, can be computed efficiently, and provides state-of-the-art numerical performance. Furthermore, I will show that our bound extends to [0,\infty)-valued random variables, yielding a lower confidence bound. We apply this to offline contextual bandits and obtain a state-of-the-art learning guarantee. Second, I will turn to generalized linear models and show how leveraging online learning helps develop (i) improved confidence sets for logistic linear models and (ii) noise-adaptive confidence sets for linear models with sub-Gaussian and bounded noise, respectively. These lead to improved regret bounds in (generalized) linear bandit problems. Furthermore, I will show that our confidence bound can be used to improve active preference learning in large language models. I will conclude with open problems in confidence sequences and the role that online learning plays in them.
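To make "valid at every sample size simultaneously" concrete, here is a crude anytime-valid baseline for contrast with the talk: a Hoeffding confidence sequence for the mean of i.i.d. [0,1]-valued variables, made time-uniform with the union bound alpha_t = alpha / (t(t+1)). The online-learning-based bounds in the talk are much tighter than this sketch.

```python
import numpy as np

def hoeffding_cs(x, alpha=0.05):
    """Return (lower, upper) arrays of a time-uniform confidence sequence
    for the mean of i.i.d. [0,1]-valued observations x[0], x[1], ..."""
    x = np.asarray(x, dtype=float)
    t = np.arange(1, len(x) + 1)
    mean = np.cumsum(x) / t
    alpha_t = alpha / (t * (t + 1))          # error budget summing to <= alpha
    radius = np.sqrt(np.log(2.0 / alpha_t) / (2.0 * t))
    return np.clip(mean - radius, 0, 1), np.clip(mean + radius, 0, 1)

rng = np.random.default_rng(1)
lo, hi = hoeffding_cs(rng.uniform(size=1000))
print(lo[-1], hi[-1])   # covers the true mean 0.5 at all times w.p. >= 0.95
```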

Speaker

Kwang-Sung Jun, University of Arizona

Non-integrable test processes

I will discuss my paper "The Extended Ville's Inequality for Nonintegrable Nonnegative Supermartingales" (to appear in Bernoulli), which deals with using non-integrable test processes to sequentially test hypotheses and construct confidence sequences. This work arises from incorporating the Bayesian idea of improper priors into the anytime-valid testing framework.
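For context, the classical Ville's inequality, which the paper generalizes beyond the integrable case, can be stated as follows; this is the standard form, not the extended inequality from the paper.

```latex
% Classical Ville's inequality (standard form, stated here only for context;
% the talk extends it to nonintegrable nonnegative supermartingales).
% Let $(M_t)_{t \ge 0}$ be a nonnegative supermartingale with $M_0 = 1$.
% Then, for any $\alpha \in (0, 1)$,
\[
  \mathbb{P}\!\left( \exists\, t \ge 0 : M_t \ge \frac{1}{\alpha} \right) \le \alpha.
\]
```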

Speaker

Hongjian Wang, Carnegie Mellon University

Sequential Change Detection with Simulators

We consider the problem of sequential change detection in the presence of simulators of the pre- and post-change distributions. Formally, given a stream of observations $(U_t, V_t, X_t)_{t \geq 1}$, with $(U_t)_{t \geq 1}$ and $(V_t)_{t \geq 1}$ generated by simulators and $(X_t)_{t \geq 1}$ denoting the real sequence, we develop a general strategy for quickly detecting changes in the distribution of $X_t$ from an (unknown) $P_0$ to another (unknown) distribution $P_1 \neq P_0$ at some unknown time $T$. We have the additional information that $U_t \sim Q_0$ and $V_t \sim Q_1$, with $Q_0 \approx P_0$ and $Q_1 \approx P_1$. We show that this problem can be reduced to detecting changes in the sign of the mean of a stream of observations. This allows us to take advantage of recent progress in the design of methods for estimating and testing the mean of random variables in the sequential setting to construct powerful change detection schemes. We discuss several instantiations of our general change detection scheme and end the presentation with numerical experiments.
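A hedged sketch of the reduction described above: suppose we can form, from $(U_t, V_t, X_t)$, a score $Y_t$ whose mean is negative before the change and positive after it (the talk gives a principled construction; the scores below are a hypothetical stand-in). A classical CUSUM-style scan of the running sum then detects the sign change in the mean.

```python
import numpy as np

def detect_sign_change(y, threshold=10.0):
    """Return the first time the CUSUM statistic crosses `threshold`, else None."""
    s = 0.0
    for t, yt in enumerate(y, start=1):
        s = max(0.0, s + yt)        # the reset keeps s near 0 while drift is negative
        if s >= threshold:
            return t
    return None

rng = np.random.default_rng(2)
# Illustrative scores: mean -0.5 before the change at t = 200, +0.5 after.
y = np.concatenate([rng.normal(-0.5, 1.0, 200), rng.normal(0.5, 1.0, 300)])
print(detect_sign_change(y))        # typically reports shortly after t = 200
```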

Speaker

Shubhanshu Shekhar, University of Michigan, Ann Arbor

Design-Based Confidence Sequences: A General Approach to Risk Mitigation in Online Experimentation with Netflix

Randomized experiments have become the standard method for companies to evaluate the performance of new products or services. Beyond aiding managerial decision-making, experiments mitigate risk by limiting the proportion of customers exposed to innovations. Since many experiments are conducted sequentially over time, an emerging strategy to further de-risk the process is to allow managers to "peek" at the results as new data become available and stop the test if the results are statistically significant. The class of statistical methods that allow managers to peek while still providing valid inference are often called anytime-valid, since they maintain time-uniform type-1 error guarantees. In this paper, we extend existing anytime-valid approaches to accommodate the more complex yet standard settings of time series, switchback, and panel experiments. To achieve this, we leverage the design-based approach to focus on assumption-light and managerially relevant finite-sample estimands, defined on the study participants, as a direct measure of the risks incurred by companies. As a special case, our results also provide a robust method for achieving always-valid inference in A/B tests. We further provide a variance-reduction technique that incorporates modeling assumptions and covariates. Finally, we demonstrate the effectiveness of our proposed approach through a simulation study and three real-world applications from Netflix. Our results show that by using our confidence sequence, harmful experiments can be stopped after observing only a handful of units; for instance, an experiment that Netflix ran on its sign-up page with 30,000 potential customers would have been stopped by our method on the first day, before 100 observations.
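A hedged sketch of risk mitigation via peeking: a time-uniform confidence sequence for the average treatment effect in a paired A/B test with binary outcomes, stopped as soon as zero is excluded. This uses a simple Hoeffding/union-bound sequence, not the design-based construction from the paper; the pairing, effect size, and sample sizes are simulated assumptions.

```python
import numpy as np

def stop_when_significant(d, alpha=0.05):
    """d[t] in [-1, 1] is (treated - control) outcome for pair t. Returns the
    first t at which the time-uniform CS for the mean of d excludes zero."""
    d = np.asarray(d, dtype=float)
    t = np.arange(1, len(d) + 1)
    mean = np.cumsum(d) / t
    alpha_t = alpha / (t * (t + 1))                     # union bound over time
    radius = np.sqrt(2.0 * np.log(2.0 / alpha_t) / t)   # Hoeffding, range 2
    crossed = np.flatnonzero(np.abs(mean) > radius)
    return int(crossed[0]) + 1 if crossed.size else None

rng = np.random.default_rng(3)
treated = rng.binomial(1, 0.35, size=5000)   # the innovation is harmful
control = rng.binomial(1, 0.50, size=5000)
print(stop_when_significant(treated - control))   # halts once harm is detected
```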

Speaker

Dae Woong Ham, University of Michigan