Improving Statistical Power of Classifier Evaluation with Limited Labels
Wednesday, Aug 6: 11:15 AM - 11:35 AM
Topic-Contributed Paper Session
Music City Center
At YouTube, we continuously develop classifiers to detect content that violates our community guidelines. However, comparing performance between classifiers is challenging because human labels are limited.
In this talk, we discuss two approaches to increasing the statistical power to detect classifier improvements: a) paired data sampling, which maximizes the information contained in the human labels, and b) proxy metrics that are more sensitive in the evaluation task. Together, these improvements significantly boost the efficiency of evaluating abuse classifiers.
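As a rough illustration of why pairing helps, the sketch below scores two classifiers on the same set of human-labeled items and compares two standard errors for their accuracy difference. The paired one is built from per-item differences. The unpaired one treats the two accuracies as if they came from independent samples. When the classifiers agree on most items, only the discordant items add variance, so the paired standard error is much smaller and small improvements become detectable with fewer labels. This is a generic paired-comparison sketch, not the method presented in the talk. The function name `paired_vs_unpaired_se` and the toy data are hypothetical.

```python
import math


def paired_vs_unpaired_se(y, pred_a, pred_b):
    """Accuracy difference (A minus B) and two standard errors for it.

    Hypothetical helper for illustration. Both classifiers are scored on the
    SAME labeled items y. Returns (diff, se_paired, se_unpaired).
    """
    n = len(y)
    correct_a = [int(p == t) for p, t in zip(pred_a, y)]
    correct_b = [int(p == t) for p, t in zip(pred_b, y)]
    acc_a = sum(correct_a) / n
    acc_b = sum(correct_b) / n
    diff = acc_a - acc_b

    # Paired: per-item differences d_i in {-1, 0, 1}. Items where both
    # classifiers agree give d_i = 0, so only discordant items add variance.
    d = [a - b for a, b in zip(correct_a, correct_b)]
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    se_paired = math.sqrt(var_d / n)

    # Unpaired: Var(acc_a) + Var(acc_b), ignoring the positive correlation
    # that comes from evaluating both classifiers on the same items.
    se_unpaired = math.sqrt(acc_a * (1 - acc_a) / n + acc_b * (1 - acc_b) / n)
    return diff, se_paired, se_unpaired


if __name__ == "__main__":
    # Toy data (hypothetical): 100 labeled items. A is correct on 90 and
    # B on 86; B's errors include all of A's errors plus 4 more.
    y = [1] * 100
    pred_a = [1] * 90 + [0] * 10
    pred_b = [1] * 86 + [0] * 14
    diff, se_p, se_u = paired_vs_unpaired_se(y, pred_a, pred_b)
    print(f"diff={diff:.3f}  paired SE={se_p:.4f}  unpaired SE={se_u:.4f}")
```

On this toy data, the 0.04 accuracy gap sits about two paired standard errors from zero but is well within one unpaired standard error. The same labels therefore give noticeably more power when the analysis respects the pairing.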