Aggregation of Rankings from Crowds of Referees

Jianqing Fan (Speaker)
Princeton University
 
Wednesday, Aug 6: 10:35 AM - 12:10 PM
Invited Paper Session 
Music City Center 
Machine learning and AI conferences now receive over ten thousand submissions each, which significantly burdens the referee system and degrades review quality through large individual-level noise. It has been reported that about half of the papers accepted at NeurIPS 2021 would have been rejected in a second round of reviews. This talk develops a statistical framework that aggregates the preferences of tens of thousands of reviewers to produce a better assessment of the quality of submitted papers. Specifically, each referee provides rankings among the papers she reviews. These rankings carry information about the preference scores, or quality, of the papers under comparison through the commonly used Bradley-Terry-Luce (BTL) family of models, and they can be aggregated through a spectral method. Theoretically, we study the performance of the spectral method for estimation and uncertainty quantification of the unobserved preference scores in a general setup in which the comparison graph consists of hyper-edges of possibly heterogeneous sizes. In scenarios where the BTL or Plackett-Luce (PL) models are appropriate, we unravel the relationship between the spectral estimator and the Maximum Likelihood Estimator (MLE), and discover that a two-step spectral method, which applies the optimal weighting estimated from the vanilla spectral method, achieves the same asymptotic efficiency as the MLE. Furthermore, we introduce a comprehensive framework for both one-sample and two-sample ranking inferences. Finally, we substantiate our findings via comprehensive numerical simulations and statistical inferences for rankings of statistical journals and movies.
(Joint work with Zhipeng Lou, Weichen Wang, and Mengxin Yu)
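The spectral aggregation idea can be illustrated in the simplest pairwise BTL setting. The sketch below is illustrative only, not the authors' implementation: the function `spectral_scores` and the toy data are hypothetical. It recovers preference scores as the stationary distribution of a Markov chain built from pairwise win fractions, in the spirit of the Rank Centrality algorithm; the talk's setting generalizes this to comparison graphs with hyper-edges of heterogeneous sizes.

```python
import numpy as np

# Hypothetical sketch of spectral ranking under the pairwise BTL model.
# wins[i, j] = number of comparisons in which item i beat item j.

def spectral_scores(wins):
    """Return estimated preference scores (nonnegative, summing to 1)."""
    n = wins.shape[0]
    total = wins + wins.T                     # comparisons per pair
    # frac[i, j] = empirical fraction of comparisons in which j beat i
    frac = np.where(total > 0, wins.T / np.maximum(total, 1), 0.0)
    P = frac / n                              # substochastic off-diagonal part
    np.fill_diagonal(P, 0.0)
    P += np.diag(1.0 - P.sum(axis=1))         # self-loops make rows sum to 1
    # Mass flows toward winners; the stationary distribution ranks the items.
    pi = np.full(n, 1.0 / n)
    for _ in range(5000):                     # power iteration to stationarity
        pi = pi @ P
    return pi / pi.sum()

# Toy check: three items with true BTL scores 3 > 2 > 1.
rng = np.random.default_rng(0)
w = np.array([3.0, 2.0, 1.0])
wins = np.zeros((3, 3))
for i in range(3):
    for j in range(i + 1, 3):
        wij = rng.binomial(200, w[i] / (w[i] + w[j]))  # 200 comparisons per pair
        wins[i, j], wins[j, i] = wij, 200 - wij
scores = spectral_scores(wins)
ranking = np.argsort(scores)[::-1]            # items ordered best to worst
```

With the true BTL win probabilities, this chain satisfies detailed balance with stationary distribution proportional to the scores, which is why its leading eigenvector consistently estimates them; the two-step method discussed in the talk then reweights such an estimate to match the MLE's asymptotic efficiency.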

Keywords

Preference and Choice Models

Spectral Methods

Maximum Likelihood

High-dimensional Inference

Inferences on Ranks

Top choices from heterogeneous sizes