Thursday, Aug 8: 8:30 AM - 10:20 AM
1436
Invited Paper Session
Oregon Convention Center
Room: CC-C122
With the increased digitization of our lives, more and more adaptive algorithms, including reinforcement learning
algorithms, are used for automated decision making and experimentation in a variety of areas, from online
advertising and recommendation systems, to political surveys and digital health. Adaptive learning algorithms,
characterized by their ability to learn and dynamically adjust randomization probabilities over time, offer a
powerful means of experimental design. Adaptive learning algorithms can be used to provide users with better,
more personalized digital experiences (minimize regret), as well as to optimize the precision of treatment effect
estimates (maximize power). However, standard statistical methods, which assume treatments are assigned
independently, no longer apply when adaptive learning algorithms are used, and standard normal approximations and
randomization test approaches can be invalid on this type of data. Moreover, there is no comprehensive
framework for statistical inference on data collected with adaptive learning algorithms that can be used for a
variety of adaptive designs (early stopping, policy learning, incorporating prediction models) to address various
statistical questions (hypothesis testing, evaluating an estimated optimal policy, etc.).
The primary goal of this session is to shed light on both the theoretical and practical challenges of
statistical inference for data collected with adaptive and reinforcement learning algorithms. These include
practical aspects of executing adaptive experiments, particularly in internet companies and the social sciences.
Additionally, the talks will discuss the statistical challenges associated with different types of experimental designs
and inferential questions, including a) early stopping, b) policy learning and evaluation from data collected with
adaptive algorithms, and c) observational data settings in which the treatment propensities are not available to the
data analyst.
Applied
Yes
Main Sponsor
Section on Statistical Learning and Data Science
Co Sponsors
Association for the Advancement of Artificial Intelligence
IMS
Presentations
This talk considers procedures for experiments with multiple treatment conditions in which the experimenter wishes to use the experimental data to learn a policy for assigning treatment in the future: determining which of multiple treatments performed best on average (best arm identification), or learning a contextual assignment regime that allows different treatment conditions to be best for different subgroups of the population with different covariate values (contextual policy learning). The problem requires the experimental planner to determine both the experimental treatment allocation procedure and the assignment recommendation procedure based on the experimental data. Additionally, the experimenter may wish not only to learn a policy but also to obtain estimates of the mean response under the learned policy. I discuss design considerations and applications to contextual settings where predictive contexts are known and unknown.
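For readers less familiar with the setup, the Python sketch below illustrates one simple instance of the two ingredients the abstract separates: an adaptive allocation rule (here, Thompson sampling with Beta-Bernoulli posteriors) and a recommendation rule (here, the arm with the highest posterior mean). The arm means, horizon, and priors are made up for illustration; this is not the procedure discussed in the talk.

# Illustrative sketch (not the procedure described in the talk): Thompson-sampling
# allocation for Bernoulli rewards, followed by a naive "recommend the arm with the
# highest posterior mean" rule for best arm identification.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.45, 0.50, 0.55])   # hypothetical arm means
horizon = 2000
successes = np.ones(len(true_means))         # Beta(1, 1) priors
failures = np.ones(len(true_means))

for _ in range(horizon):
    # Adaptive allocation: sample a mean from each arm's posterior,
    # then assign the next unit to the arm with the largest draw.
    draws = rng.beta(successes, failures)
    arm = int(np.argmax(draws))
    reward = rng.binomial(1, true_means[arm])
    successes[arm] += reward
    failures[arm] += 1 - reward

posterior_means = successes / (successes + failures)
recommended = int(np.argmax(posterior_means))
print("posterior means:", np.round(posterior_means, 3))
print("recommended arm:", recommended)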
Randomized experiments have become the standard method for companies to evaluate the performance of new products or services. In addition to augmenting managers' decision-making, experimentation mitigates risk by limiting the proportion of customers exposed to innovation. Since many experiments are run on customers arriving sequentially, a potential solution is to allow managers to "peek" at the results as new data become available and stop the test once the results are statistically significant. Our paper provides valid design-based confidence sequences, i.e., sequences of confidence intervals with uniform type-1 error guarantees over time, for various sequential experiments in an assumption-light manner. In particular, we focus on finite-sample estimands defined on the study participants as a direct measure of the risks incurred by companies. Our proposed confidence sequences are valid for a large class of experiments, including multi-arm bandits, time series, and panel experiments. We further provide a variance reduction technique that incorporates modeling assumptions and covariates.
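As a rough illustration of what "peeking safely" means, the sketch below computes an anytime-valid confidence sequence for a single mean using a standard sub-Gaussian normal-mixture boundary. It is not the design-based, finite-population construction of the paper; the variance proxy sigma2, mixture parameter rho, and alpha are arbitrary choices for the example.

# Minimal sketch of an anytime-valid confidence sequence for a mean, using a
# standard sub-Gaussian normal-mixture boundary (not the paper's construction).
import numpy as np

def mixture_radius(t, sigma2, rho, alpha):
    # Two-sided normal-mixture boundary: at every time t the interval
    # mean_t +/- radius_t / t covers the true mean simultaneously w.p. >= 1 - alpha,
    # assuming sigma2-sub-Gaussian observations.
    v = sigma2 * t + rho
    return np.sqrt(2.0 * v * np.log(np.sqrt(v / rho) / (alpha / 2.0)))

rng = np.random.default_rng(1)
sigma2, rho, alpha, mu = 1.0, 1.0, 0.05, 0.2   # made-up example values
n = 5000
x = rng.normal(mu, np.sqrt(sigma2), size=n)
sums = np.cumsum(x)
t = np.arange(1, n + 1)
radius = mixture_radius(t, sigma2, rho, alpha)
lower, upper = (sums - radius) / t, (sums + radius) / t
# The analyst can stop at any data-dependent time and report [lower, upper].
print("CI at n=100:", (round(lower[99], 3), round(upper[99], 3)))
print("CI at n=5000:", (round(lower[-1], 3), round(upper[-1], 3)))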
Adaptive sampling algorithms are increasingly used in experimentation to quickly identify the best performing treatment or "winner". After running such an experiment, one is often interested in the treatment effect (or difference in expected value) between the apparent "winning" treatment and a control or baseline treatment. It is well known that if one naively uses the same data both to select the winning treatment and to evaluate its value, standard statistical approaches are invalid. Post-selection inference approaches have been developed for this inference-on-winners problem. However, these approaches primarily consider the i.i.d. data setting and do not allow the data itself to be adaptively collected. In this talk, I introduce a randomization-based inference approach for "winners" on adaptively collected data. Our approach can be used to construct valid confidence intervals for the treatment effect between the "winner" and a baseline treatment.
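The simulation below illustrates the selection problem the talk addresses: when all arms actually have the same mean, the naive "winner minus control" estimate is biased upward, so standard intervals built from the same data are invalid. It is only a demonstration of the problem, not of the randomization-based method; the number of arms, sample sizes, and noise level are arbitrary.

# Hedged illustration of the "inference on winners" problem: with equal true means,
# the data-selected winner's sample mean is biased upward relative to the control.
import numpy as np

rng = np.random.default_rng(2)
n_arms, n_per_arm, n_sims = 5, 100, 5000
naive_estimates = []
for _ in range(n_sims):
    data = rng.normal(0.0, 1.0, size=(n_arms, n_per_arm))  # all true means equal
    means = data.mean(axis=1)
    winner = int(np.argmax(means[1:]) + 1)   # arm 0 is the control/baseline
    naive_estimates.append(means[winner] - means[0])

# The true effect of the winner vs. control is 0, but the naive estimate is not.
print("mean naive 'winner vs. control' estimate:", round(np.mean(naive_estimates), 3))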
We consider the problem of efficient inference of the Average Treatment Effect in a sequential experiment where the policy governing the assignment of subjects to treatment or control can change over time. We first provide a central limit theorem for the Adaptive Augmented Inverse-Probability Weighted estimator, which is semiparametric efficient, under weaker assumptions than those previously made in the literature. This central limit theorem enables efficient inference at fixed sample sizes. We then consider a sequential inference setting, deriving both asymptotic and nonasymptotic confidence sequences that are considerably tighter than previous methods. These anytime-valid methods enable inference under data-dependent stopping times (sample sizes). Additionally, we use propensity score truncation techniques from the recent off-policy estimation literature to reduce the finite sample variance of our estimator without affecting the asymptotic variance. Empirical results demonstrate that our methods yield narrower confidence sequences than those previously developed in the literature while maintaining time-uniform error control.
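A bare-bones version of the kind of estimator being discussed, under strong simplifying assumptions, might look like the sketch below: a two-arm epsilon-greedy-style policy logs its assignment probabilities, a running sample mean serves as the outcome model, and the ATE is estimated by averaging AIPW scores. The adaptive weighting, the conditions behind the central limit theorem, the propensity truncation, and the confidence sequences from the talk are not reproduced here; the arm means and horizon are made up.

# Minimal sketch of an augmented inverse-probability-weighted ATE estimate on
# adaptively collected two-arm data (an illustration, not the talk's estimator).
import numpy as np

rng = np.random.default_rng(3)
T = 4000
mu = np.array([0.0, 0.3])            # hypothetical control / treatment means
counts = np.zeros(2)
sums = np.zeros(2)
scores = np.zeros((T, 2))

for t in range(T):
    # Outcome model: running sample mean per arm, built from past data only.
    m_hat = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
    # Logged, time-varying assignment probability (an epsilon-greedy-style policy).
    e_treat = 0.9 if m_hat[1] >= m_hat[0] else 0.1
    e = np.array([1.0 - e_treat, e_treat])
    a = rng.binomial(1, e_treat)
    y = rng.normal(mu[a], 1.0)
    # AIPW score per arm: model prediction plus inverse-propensity-weighted residual.
    for arm in (0, 1):
        scores[t, arm] = m_hat[arm] + (a == arm) / e[arm] * (y - m_hat[arm])
    counts[a] += 1
    sums[a] += y

ate_hat = (scores[:, 1] - scores[:, 0]).mean()
print("AIPW ATE estimate:", round(ate_hat, 3), " (true ATE = 0.3)")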
Speaker
Thomas Cook, University of Massachusetts, Amherst