Design-Based Confidence Sequences: A General Approach to Risk Mitigation in Online Experimentation with Netflix
Sunday, Aug 3: 3:25 PM - 3:45 PM
Topic-Contributed Paper Session
Music City Center
Randomized experiments have become the standard method for companies to evaluate the performance of new products or services. Beyond aiding managerial decision-making, experiments mitigate risk by limiting the proportion of customers exposed to innovations. Since many experiments are conducted sequentially over time, an emerging strategy to further de-risk the process is to allow managers to "peek" at the results as new data become available and stop the test if the results are statistically significant. Statistical methods that allow managers to peek while still providing valid inference are often called anytime-valid because they maintain uniform type-I error guarantees. In this paper, we extend existing anytime-valid approaches to accommodate the more complex yet standard settings of time series, switchback, and panel experiments. To achieve this, we leverage the design-based approach to focus on assumption-light and managerially relevant finite-sample estimands defined on the study participants as a direct measure of the risks incurred by companies. As a special case, our results also provide a robust method for achieving always-valid inference in A/B tests. We further provide a variance reduction technique that incorporates modeling assumptions and covariates. Finally, we demonstrate the effectiveness of our proposed approach through a simulation study and three real-world applications from Netflix. Our results show that by using our confidence sequence, harmful experiments could be stopped after observing only a handful of units; for instance, an experiment that Netflix ran on its sign-up page on 30,000 potential customers would have been stopped by our method on the first day, before 100 observations.
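To make the peeking guarantee concrete, the sketch below monitors a simulated A/B test with a confidence sequence and stops as soon as the interval shows the treatment is harmful. This is only an illustration under assumptions not taken from the paper: it uses a Robbins-style normal-mixture boundary for sub-Gaussian data rather than the paper's design-based construction, paired Bernoulli treatment-control differences, and the Hoeffding sub-Gaussian bound sigma^2 = 1 for outcomes in [-1, 1]; the effect size and all parameters are hypothetical.

```python
# A minimal sketch of anytime-valid monitoring. NOT the paper's design-based
# construction: it uses a Robbins-style normal-mixture confidence sequence
# for sub-Gaussian observations, with illustrative (hypothetical) parameters.
import numpy as np

def cs_half_width(t, sigma2=1.0, rho=1.0, alpha=0.05):
    """Half-width at time t of the two-sided normal-mixture confidence
    sequence for sigma2-sub-Gaussian increments; rho is the mixture variance."""
    v = sigma2 * t + rho
    return np.sqrt(v * np.log(v / (rho * alpha**2))) / t

rng = np.random.default_rng(0)
# Hypothetical harmful experiment: paired differences of Bernoulli outcomes,
# treatment converting at 5% versus 20% for control (true effect = -0.15).
diffs = rng.binomial(1, 0.05, 5000) - rng.binomial(1, 0.20, 5000)

running_sum = 0.0
for t, d in enumerate(diffs, start=1):
    running_sum += float(d)
    center = running_sum / t    # running difference in means
    radius = cs_half_width(t)   # valid at every t simultaneously
    if center + radius < 0:     # interval entirely below 0: treatment harms
        print(f"stop at n={t}: effect in [{center - radius:.3f}, {center + radius:.3f}]")
        break
```

Because the interval is valid uniformly over time, checking it after every unit does not inflate the type-I error, which is what licenses stopping a harmful test early in this sketch.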