Aligning Large Language Models with Heterogeneous Human Preferences: How Statistics Helps LLMs
Monday, Aug 4: 10:35 AM - 10:55 AM
Topic-Contributed Paper Session
Music City Center
Aligning large language models (LLMs) with human preferences is essential for improving generative AI systems. However, human feedback is heterogeneous: teachers differ in context, expertise, and individual preference, which makes reward learning challenging. This talk introduces a dual active learning framework for reinforcement learning from human feedback (RLHF) that efficiently selects both conversations and teachers according to a D-optimal design. This strategy improves reward learning by minimizing the generalized variance of the reward estimator and optimizing the use of the available feedback. Through theoretical analysis and extensive experiments, we demonstrate that our methods achieve superior alignment of LLMs with diverse human preferences.
Large language models
Optimal design
Reinforcement learning from human feedback
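To make the selection idea concrete, here is a minimal sketch (not the authors' implementation) of greedy D-optimal selection for dual active learning in reward modeling. It assumes each candidate (conversation, teacher) pair is summarized by a feature vector, the reward model is linear in that vector, and pairs are picked greedily so that each addition maximizes the determinant of the accumulated information matrix, i.e. shrinks the generalized variance of the estimator. All names and dimensions below are hypothetical.

```python
import numpy as np


def greedy_d_optimal(candidates: np.ndarray, budget: int, ridge: float = 1e-3):
    """Greedily select `budget` rows of `candidates` (an n x d feature matrix,
    one row per hypothetical (conversation, teacher) pair) to maximize the
    log-determinant of the information matrix X^T X + ridge * I."""
    n, d = candidates.shape
    info = ridge * np.eye(d)            # regularized information matrix
    chosen, remaining = [], list(range(n))
    for _ in range(min(budget, n)):
        # Score each remaining candidate by the log-det after adding it.
        best_idx, best_gain = None, -np.inf
        for i in remaining:
            x = candidates[i]
            _, logdet = np.linalg.slogdet(info + np.outer(x, x))
            if logdet > best_gain:
                best_gain, best_idx = logdet, i
        chosen.append(best_idx)
        info += np.outer(candidates[best_idx], candidates[best_idx])
        remaining.remove(best_idx)
    return chosen, info


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical candidate pool: 200 (conversation, teacher) pairs, 8-dim features.
    pool = rng.normal(size=(200, 8))
    picked, info = greedy_d_optimal(pool, budget=20)
    print("selected indices:", picked[:5], "...")
    print("log-det of information matrix:", np.linalg.slogdet(info)[1])
```

In practice the feature vector would encode both the conversation (e.g., embedding differences between candidate responses) and the teacher, so the greedy step jointly decides which conversation to label and which teacher to query.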