Aligning Large Language Models with Heterogeneous Human Preferences: How Statistics Helps LLMs

Will Wei Sun, Speaker
Purdue University
 
Monday, Aug 4: 10:35 AM - 10:55 AM
Topic-Contributed Paper Session 
Music City Center 
Aligning large language models (LLMs) with human preferences is essential for improving generative AI systems. However, the heterogeneity of human feedback—due to varying contexts, expertise, and individual preferences—presents significant challenges for reward learning. This talk presents a dual active learning framework for reinforcement learning from human feedback (RLHF), which efficiently selects both conversations and teachers based on a D-optimal design. This strategy improves reward learning by minimizing the generalized variance of the reward estimate and making optimal use of the available feedback. Through theoretical analysis and extensive experiments, we demonstrate that the proposed method achieves superior alignment of LLMs with diverse human preferences.
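To make the D-optimal selection idea concrete, below is a minimal sketch (not the speakers' implementation) of greedy D-optimal subset selection. It assumes each candidate conversation–teacher pair is summarized by a feature vector, and picks the pairs that maximize the log-determinant of the information matrix, which is equivalent to minimizing the generalized variance of the reward estimate. The feature dimensions, budget, and ridge term are illustrative assumptions.

```python
import numpy as np

def greedy_d_optimal(candidates: np.ndarray, budget: int, ridge: float = 1e-3) -> list[int]:
    """Greedily select `budget` rows of `candidates` (one feature vector per
    candidate conversation-teacher pair) to maximize log det of the
    information matrix X^T X, i.e., to minimize generalized variance."""
    d = candidates.shape[1]
    info = ridge * np.eye(d)           # regularized information matrix
    selected: list[int] = []
    remaining = set(range(len(candidates)))
    for _ in range(budget):
        inv = np.linalg.inv(info)
        best_idx, best_gain = None, -np.inf
        for i in remaining:
            x = candidates[i]
            # matrix determinant lemma:
            # log det(A + x x^T) = log det(A) + log(1 + x^T A^{-1} x)
            gain = np.log1p(x @ inv @ x)
            if gain > best_gain:
                best_idx, best_gain = i, gain
        selected.append(best_idx)
        remaining.remove(best_idx)
        x = candidates[best_idx]
        info = info + np.outer(x, x)   # rank-one update of the information matrix
    return selected

# Hypothetical example: 200 candidate pairs with 16-dim features, pick 20
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16))
print(greedy_d_optimal(features, budget=20))
```

Greedy selection is only one way to approximate a D-optimal design; the point of the sketch is the objective (maximizing the information-matrix determinant over both conversations and teachers), not the specific optimization routine used in the paper.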

Keywords

Large language models

Optimal design

Reinforcement learning from human feedback