Improving Statistical Power of Classifier Evaluation with Limited Labels

Yi Liu, Speaker
Google
 
Wednesday, Aug 6: 11:15 AM - 11:35 AM
Topic-Contributed Paper Session 
Music City Center

At YouTube, we continuously develop classifiers to detect content that violates our community guidelines. However, comparing the performance of competing classifiers is challenging because human labels are limited.

In this talk, we discuss two approaches to increasing the statistical power for detecting classifier improvements: (a) paired data sampling, which maximizes the information contained in human labels, and (b) proxy metrics that have higher sensitivity in the evaluation task. With these improvements, we are able to evaluate abuse classifiers substantially more efficiently.
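
The abstract does not spell out the sampling scheme, so the following is only a minimal sketch of the general idea behind paired evaluation, not the speaker's method. It assumes both classifiers are scored against the same human-labeled items and that the metric of interest is accuracy. The function name `accuracy_diff_standard_errors` and the example counts are hypothetical and chosen purely for illustration.

```python
import math


def accuracy_diff_standard_errors(correct_a, correct_b):
    """Compare the standard error of an accuracy difference under two designs.

    correct_a, correct_b: equal-length sequences of 0/1 flags saying whether
    classifier A (resp. B) agreed with the human label on each item.

    Returns (mean_diff, paired_se, unpaired_se), where unpaired_se is what
    the standard error would be if each classifier had instead been scored
    on its own independent sample of the same size.
    """
    n = len(correct_a)
    if n != len(correct_b) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")

    # Paired design: work with per-item differences, so noise that is
    # shared by both classifiers on the same item cancels out.
    diffs = [b - a for a, b in zip(correct_a, correct_b)]
    mean_diff = sum(diffs) / n
    var_diff = sum((d - mean_diff) ** 2 for d in diffs) / (n - 1)
    paired_se = math.sqrt(var_diff / n)

    # Unpaired design: variances of two independent proportions add up.
    p_a = sum(correct_a) / n
    p_b = sum(correct_b) / n
    unpaired_se = math.sqrt(p_a * (1 - p_a) / n + p_b * (1 - p_b) / n)

    return mean_diff, paired_se, unpaired_se


# Hypothetical counts for 100 labeled items (illustrative, not real data):
# both correct on 78, only A correct on 2, only B correct on 7, both wrong on 13.
correct_a = [1] * 78 + [1] * 2 + [0] * 7 + [0] * 13
correct_b = [1] * 78 + [0] * 2 + [1] * 7 + [0] * 13

mean_diff, paired_se, unpaired_se = accuracy_diff_standard_errors(correct_a, correct_b)
print(f"accuracy difference (B - A): {mean_diff:.3f}")
print(f"paired standard error:       {paired_se:.4f}")
print(f"unpaired standard error:     {unpaired_se:.4f}")
```

Because the two classifiers tend to be right or wrong on the same items, the per-item differences vary far less than the two accuracies do separately. In this toy setup the paired estimate is tighter even though it uses half as many labels as two independent samples of 100 would.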