Improving Statistical Power of Classifier Evaluation with Limited Labels

Presented During: How Product Thinking Shapes Methodological Innovation at Google

Yi Liu Speaker
Google

Wednesday, Aug 6: 11:15 AM - 11:35 AM
Topic-Contributed Paper Session

Music City Center

At YouTube, we continuously develop classifiers to detect content that violates our community guidelines. However, comparing the performance of two classifiers is challenging because human labels are limited.

In this talk, we discuss two approaches to increasing the statistical power for detecting classifier improvements: a) paired data sampling, which maximizes the information contained in human labels, and b) proxy metrics that are more sensitive in the evaluation task. With these improvements, we are able to substantially boost our efficiency in evaluating abuse classifiers.
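
The abstract does not describe the sampling scheme in detail, so the following is only an illustrative sketch of why pairing can help, not the speaker's method. It assumes that a human label is attached to a content item and can therefore be used to score both classifiers on that item, and that the two classifiers tend to err on the same hard items. All function names, accuracies, and the `shared` correlation parameter are hypothetical. Under the same label budget, it compares the standard error of the estimated accuracy difference when both classifiers are scored on the same labeled items (paired) versus on separate labeled samples (unpaired).

```python
import random
import statistics


def simulate_correctness(n_items, acc_a, acc_b, shared, rng):
    """Return per-item (correct_a, correct_b) indicators with correlated errors.

    With probability `shared`, both outcomes are driven by the same uniform
    draw (a crude stand-in for item difficulty); otherwise they are independent.
    The marginal accuracies remain acc_a and acc_b in either case.
    """
    pairs = []
    for _ in range(n_items):
        u = rng.random()
        v = u if rng.random() < shared else rng.random()
        pairs.append((int(u < acc_a), int(v < acc_b)))
    return pairs


def paired_se(pairs):
    """Standard error of the accuracy difference using per-item differences."""
    diffs = [b - a for a, b in pairs]
    return statistics.stdev(diffs) / len(diffs) ** 0.5


def unpaired_se(sample_a, sample_b):
    """Standard error of the accuracy difference from two independent samples."""
    a = [correct_a for correct_a, _ in sample_a]
    b = [correct_b for _, correct_b in sample_b]
    return (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5


rng = random.Random(0)
label_budget = 2000  # hypothetical number of human labels available

# Paired design: every labeled item is scored by both classifiers.
paired_items = simulate_correctness(label_budget, 0.90, 0.92, 0.8, rng)

# Unpaired design: the budget is split into two separate labeled samples.
sample_a = simulate_correctness(label_budget // 2, 0.90, 0.92, 0.8, rng)
sample_b = simulate_correctness(label_budget // 2, 0.90, 0.92, 0.8, rng)

se_paired = paired_se(paired_items)
se_unpaired = unpaired_se(sample_a, sample_b)

print(f"paired SE:   {se_paired:.4f}")
print(f"unpaired SE: {se_unpaired:.4f}")
```

In this toy setup the paired standard error is smaller because the positive correlation between the two classifiers' errors cancels out of the per-item difference, so a given improvement is detectable with fewer labels. How large the gain is in practice depends on how strongly the classifiers' errors are correlated.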