Reducing False Discovery Rates for A/B Experiments in Google Cloud
Monday, Aug 4: 10:05 AM - 10:20 AM
1206
Contributed Papers
Music City Center
Google Cloud uses A/B testing for launch decisions and relies on A/A tests to validate the A/B testing infrastructure. A key metric is initial page load latency: the time each page takes to load all of its elements, from start to finish. A series of A/A experiments revealed unexpectedly high false discovery rates (FDR) at the page-path level, even after applying common corrections such as the Bonferroni adjustment. Drawing on genomics methodologies, we derived a new significance threshold using permutation tests. We randomly assigned users to "treatment" and "control" groups, calculated p-values for the 75th percentile latency nonparametrically, sorted all p-values, recorded the 1,000 smallest, and repeated this procedure 10,000 times. The resulting threshold, the minimum p-value at which the cumulative distribution function of these null p-values approached 0.05, returned FDRs to expected levels. We also evaluated the trade-off between significance thresholds and power by injecting hypothetical lifts. This solution was implemented in Google's internal A/B experiment tools.
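The permutation procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation. The function names, the data layout (`latency_by_page`, one list of per-user latencies per page path), and the choice of a Mood-style quantile test with a normal approximation are all assumptions made for the example; the abstract says only that the p-values were computed nonparametrically. The repetition count and the number of retained p-values are scaled down from the 10,000 and 1,000 quoted above so the sketch runs quickly.

```python
import math
import random
from heapq import nsmallest


def quantile_test_pvalue(treatment, control, q=0.75):
    """Two-sided p-value for a difference in the q-th quantile.

    Assumption: a Mood-style quantile test with a normal approximation,
    picked for illustration; the source does not name its test.
    """
    pooled = sorted(treatment + control)
    cutoff = pooled[int(q * (len(pooled) - 1))]  # pooled q-th quantile
    n_t, n_c = len(treatment), len(control)
    above_t = sum(x > cutoff for x in treatment)
    above_c = sum(x > cutoff for x in control)
    p_pool = (above_t + above_c) / (n_t + n_c)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_t + 1 / n_c))
    if se == 0:
        return 1.0  # no observations on one side of the cutoff
    z = (above_t / n_t - above_c / n_c) / se
    return math.erfc(abs(z) / math.sqrt(2))


def null_smallest_pvalues(latency_by_page, n_reps, keep, rng):
    """For each random user split, keep the `keep` smallest per-page p-values."""
    users = list(range(len(latency_by_page[0])))
    half = len(users) // 2
    results = []
    for _ in range(n_reps):
        rng.shuffle(users)  # random "treatment" / "control" assignment
        t_idx, c_idx = users[:half], users[half:]
        pvals = [
            quantile_test_pvalue([page[i] for i in t_idx],
                                 [page[i] for i in c_idx])
            for page in latency_by_page
        ]
        results.append(nsmallest(keep, pvals))  # ascending order
    return results


def minp_threshold(smallest_pvalues, alpha=0.05):
    """k-th smallest per-split minimum p-value, with k = floor(alpha * n_reps)."""
    minima = sorted(rep[0] for rep in smallest_pvalues)
    k = int(alpha * len(minima))
    return minima[k - 1] if k > 0 else 0.0


# Synthetic stand-in for latency data: 20 page paths, 200 users each.
rng = random.Random(0)
latency_by_page = [[rng.lognormvariate(0, 1) for _ in range(200)]
                   for _ in range(20)]
null = null_smallest_pvalues(latency_by_page, n_reps=200, keep=5, rng=rng)
threshold = minp_threshold(null)
print(f"permutation-derived threshold: {threshold:.5f}")
```

Under these assumptions, a page path would be flagged only when its observed p-value falls below `threshold`, which can then be compared with a Bonferroni cutoff of 0.05 divided by the number of page paths. Retaining several of the smallest p-values per split, rather than only the minimum, leaves room to examine thresholds based on other order statistics.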
online A/B experimentation
false discovery rate (FDR)
permutation testing
power analysis
Google Cloud
multiple comparisons
Main Sponsor
Business and Economic Statistics Section