Reducing False Discovery Rates for A/B Experiments in Google Cloud

Taylor Mattia First Author
Google
 
Taylor Mattia Presenting Author
Google
 
Monday, Aug 4: 10:05 AM - 10:20 AM
1206 
Contributed Papers 
Music City Center 
Google Cloud uses A/B testing to inform launch decisions and relies on A/A tests to validate its A/B testing infrastructure. A key metric is initial page load latency: the time it takes a page to load all of its elements, from start to finish. A series of A/A experiments revealed unexpectedly high false discovery rates (FDRs) at the page-path level, even after applying common corrections such as the Bonferroni adjustment. Drawing on methods from genomics, we derived a new significance threshold using permutation tests. We randomly assigned users to "treatment" and "control" groups, computed nonparametric p-values for the 75th percentile of latency, sorted all the p-values, recorded the 1,000 smallest, and repeated this procedure 10,000 times. This yielded a new threshold, the minimum p-value at which the cumulative distribution function approached 0.05, which restored FDRs to expected levels. We also evaluated the trade-off between the significance threshold and statistical power by injecting hypothetical lifts. This solution has been implemented in Google's internal A/B experimentation tools.
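The permutation procedure can be sketched in Python. This is a minimal, hypothetical illustration on synthetic log-normal latencies, not Google's implementation: every function name, the data layout (`latencies[u][p]` for user `u` on page path `p`), and the sizes are assumptions. Because the abstract does not name its nonparametric test, `p75_pvalue` substitutes a Mood-type quantile test (the share of page loads above the pooled 75th percentile, compared with a large-sample z-test). The sketch also keeps only the smallest p-value from each random split and takes as the threshold the point where that distribution's empirical CDF reaches 0.05, a Westfall-Young-style minP rule. This is one reading; the abstract records the 1,000 smallest p-values per permutation and does not spell out how they enter the CDF. `power` injects a hypothetical multiplicative lift into one path's treatment latencies to compare detection rates across thresholds.

```python
import math
import random


def quantile(values, q):
    """Empirical q-quantile with linear interpolation (numpy's default rule)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def p75_pvalue(treat, ctrl, q=0.75):
    """Two-sided p-value for a q-quantile difference (hypothetical stand-in test).

    Counts page loads above the pooled q-quantile in each arm and compares the
    two shares with a large-sample z-test (a Mood-type quantile test).
    """
    cut = quantile(treat + ctrl, q)
    n1, n2 = len(treat), len(ctrl)
    x1 = sum(v > cut for v in treat)
    x2 = sum(v > cut for v in ctrl)
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 1.0
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def simulate_latencies(n_users, n_paths, rng):
    """Synthetic log-normal page loads (ms); latencies[u][p] = user u on path p."""
    return [[rng.lognormvariate(math.log(800 + 50 * p), 0.5) for p in range(n_paths)]
            for _ in range(n_users)]


def random_assignment(n_users, rng):
    """Balanced random split of users into treatment (True) and control (False)."""
    labels = [True] * (n_users // 2) + [False] * (n_users - n_users // 2)
    rng.shuffle(labels)
    return labels


def path_pvalue(latencies, in_treatment, path, lift=0.0):
    """P-value for one page path; `lift` multiplies treatment latencies."""
    treat = [row[path] * (1 + lift) for row, t in zip(latencies, in_treatment) if t]
    ctrl = [row[path] for row, t in zip(latencies, in_treatment) if not t]
    return p75_pvalue(treat, ctrl)


def permutation_threshold(latencies, n_perm, alpha=0.05, seed=0):
    """Per-path threshold: alpha-quantile of the per-replicate minimum p-value."""
    rng = random.Random(seed)
    n_paths = len(latencies[0])
    min_pvals = []
    for _ in range(n_perm):
        labels = random_assignment(len(latencies), rng)  # one A/A replicate
        pvals = sorted(path_pvalue(latencies, labels, p) for p in range(n_paths))
        min_pvals.append(pvals[0])  # smallest p-value across all page paths
    min_pvals.sort()
    # Smallest p-value at which the empirical CDF of min_pvals reaches alpha.
    return min_pvals[max(0, math.ceil(alpha * n_perm) - 1)]


def power(latencies, threshold, lift, path, n_sims, seed=1):
    """Share of random splits in which an injected lift on `path` is detected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        labels = random_assignment(len(latencies), rng)
        hits += path_pvalue(latencies, labels, path, lift) < threshold
    return hits / n_sims


if __name__ == "__main__":
    data = simulate_latencies(n_users=200, n_paths=20, rng=random.Random(42))
    thr = permutation_threshold(data, n_perm=200)  # the abstract uses 10,000
    print(f"permutation threshold: {thr:.2e}; Bonferroni: {0.05 / 20:.2e}")
    for t in (0.05, 0.05 / 20, thr):
        print(f"threshold {t:.2e}: power {power(data, t, lift=0.15, path=3, n_sims=100):.2f}")
```

Assigning whole users to arms, rather than individual page loads, keeps each user's latencies together across page paths, so the permutation null reflects whatever dependence the real data has. Because the threshold is read from p-values computed on the data itself, it also adapts if the chosen test's p-values are not uniformly distributed under the null, which a fixed correction such as Bonferroni assumes.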

Keywords

online A/B experimentation

false discovery rate (FDR)

permutation testing

power analysis

Google Cloud

multiple comparisons 

Main Sponsor

Business and Economic Statistics Section