Harnessing Large Language Models: Opportunities and Challenges for Statistics

Linjun Zhang, Chair
Rutgers University

Weijie Su, Organizer
University of Pennsylvania

Qi Long, Organizer
Tuesday, Aug 6: 2:00 PM - 3:50 PM
1770 
Topic-Contributed Paper Session 
Oregon Convention Center 
Room: CC-E141 

Applied: Yes

Main Sponsor

Section on Statistical Learning and Data Science

Co-Sponsors

IMS
Section on Nonparametric Statistics

Presentations

Aligning Large Language Models with Consensus between Preference and Policy

The rapid advancement of Large Language Models (LLMs) presents significant opportunities in the pursuit of artificial intelligence (AI) while simultaneously raising critical safety concerns, which necessitates robust AI alignment strategies. Reinforcement Learning from Human Feedback (RLHF) has been identified as a promising approach for achieving AI alignment. However, a notable challenge with this method is its susceptibility to mode collapse, leading to a decrease in the diversity of model outputs. Our paper identifies a key issue in this regard: the algorithmic bias inherent in RLHF. We find that this bias stems from a lack of consensus between the preference learning (step 2) and policy learning (step 3) stages of RLHF. To tackle this issue, we establish necessary and sufficient conditions for achieving preference-policy consensus (PPC). We theoretically demonstrate that the global solutions in policy learning under PPC-RLHF are aligned with the preference, thereby counteracting the algorithmic bias inherent in RLHF and facilitating a more equitable and well-aligned large language model.
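For context, RLHF's step 2 typically fits a reward model to pairwise preference data (commonly via the Bradley-Terry model), and step 3 maximizes that reward under a KL penalty toward a reference policy. A minimal scalar sketch of these two standard objectives (illustrative only; the function names and per-sample setup are our own, not the paper's PPC construction):

```python
import numpy as np

def bradley_terry_nll(r_chosen, r_rejected):
    # Step-2 preference learning: negative log-likelihood under the
    # Bradley-Terry model, P(chosen beats rejected) = sigmoid(margin)
    margin = r_chosen - r_rejected
    return np.log1p(np.exp(-margin))

def kl_regularized_objective(reward, logp_policy, logp_ref, beta):
    # Step-3 policy learning: reward minus a KL penalty that keeps the
    # policy close to the reference model (per-sample view)
    return reward - beta * (logp_policy - logp_ref)
```

The paper's point, in these terms, is that the optimum of the step-3 objective need not agree with the preference distribution learned in step 2 unless consensus conditions hold.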

Speaker

Jiancong Xiao, Department of Biostatistics and Epidemiology, University of Pennsylvania

On the Use of Bandits and Low-Rank Factorization to Speed up LLM-based Evaluation

Natural language generation has reached such a high level of proficiency that it has become very challenging to compare the performance of one language model to another. As traditional metrics such as BLEU and ROUGE are too brittle, it is now common practice to depend, implicitly or explicitly, on another, often larger, language model to score and compare generations. Depending on large language models (LLMs) such as GPT-4 to score generations is incredibly costly in terms of money, compute, and time. We aim to reduce the burden of these evaluations with respect to all three of these resources. First, we observe that these evaluation matrices are intrinsically low rank and well approximated by low-rank factorizations. Second, we build upon the well-studied multi-armed bandit framework, proposing a range of algorithms for selecting the best language model, spanning from those with strong theoretical guarantees to those with empirically strong performance. We find our methods can typically identify the top performer with 5-15% of the typically required resources, a savings of 85-95%.
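The two ingredients above can be sketched with a toy setup: a model-by-prompt score matrix that is exactly rank one, and a simple successive-halving bandit that finds the best model from a small sample of prompt scores. All names and the synthetic data are hypothetical; the talk's actual algorithms and guarantees are more refined:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 models scored on 50 prompts, where the true
# score matrix is rank one (per-model skill times per-prompt difficulty)
skill = np.array([0.9, 0.7, 0.5, 0.3])
difficulty = rng.uniform(0.5, 1.0, size=50)
scores = np.outer(skill, difficulty)

# Low-rank structure: a rank-1 truncated SVD reconstructs the matrix
U, s, Vt = np.linalg.svd(scores, full_matrices=False)
rank1 = s[0] * np.outer(U[:, 0], Vt[0])

def successive_halving(score_fn, n_models, samples_per_round=20, rounds=2):
    """Bandit-style elimination: score a few random prompts for each
    surviving model, then keep the better half each round."""
    alive = list(range(n_models))
    for _ in range(rounds):
        means = {m: np.mean([score_fn(m) for _ in range(samples_per_round)])
                 for m in alive}
        alive.sort(key=lambda m: -means[m])
        alive = alive[:max(1, len(alive) // 2)]
    return alive[0]

best = successive_halving(lambda m: scores[m, rng.integers(50)], n_models=4)
```

Here the bandit touches only a fraction of the full matrix's entries, which is the source of the resource savings the abstract quantifies.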

Speaker

Ruihan Wu

Distribution-aware Pruning Strategy for Large Language Models: from Unstructured to Structured Pruning

Recent progress in artificial general intelligence has led to large language models (LLMs) with billions of parameters. This scale necessitates the removal of unnecessary neurons or weights through model pruning. Traditional pruning methods typically focus on the magnitude of weights in a deterministic manner. However, weight magnitude is a local metric that does not account for a weight's global effect on the model, and deterministic pruning can introduce errors that accumulate across layers. Conversely, randomized pruning can help even out these errors across different layers. In this talk, we introduce two inference-aware pruning criteria derived from the optimization perspective of output approximation, which surpass traditional training-aware metrics such as gradient and Hessian. Moreover, we introduce a two-step reconstruction technique to mitigate pruning errors without model retraining. Our experimental results showcase the superior performance of this approach across various datasets and models, markedly reducing both computational costs and hardware requirements.
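To make the contrast concrete, here is a minimal sketch of the traditional magnitude baseline in its unstructured and structured forms, plus a hypothetical input-scaled score standing in for an inference-aware criterion (the talk's actual output-approximation criteria and reconstruction step are not shown):

```python
import numpy as np

def unstructured_magnitude_prune(W, sparsity):
    # Traditional baseline: zero out the smallest-magnitude weights
    k = int(W.size * sparsity)
    thresh = np.sort(np.abs(W), axis=None)[k]
    return np.where(np.abs(W) >= thresh, W, 0.0)

def structured_prune_rows(W, n_keep):
    # Structured variant: drop whole rows (neurons) with small L2 norm,
    # which shrinks the layer and reduces hardware requirements
    keep = np.argsort(-np.linalg.norm(W, axis=1))[:n_keep]
    return W[np.sort(keep)]

def inference_aware_score(W, x_norm):
    # Hypothetical stand-in for an inference-aware criterion: scale each
    # weight's magnitude by the typical activation norm of its input
    # feature, so the score reflects effect on the layer's output
    return np.abs(W) * x_norm[None, :]
```

The difference between the first two functions is exactly the unstructured-to-structured axis in the title: zeroing individual weights keeps the layer shape, while removing rows changes it and yields real speedups on standard hardware.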

Speaker

Qi Lei, New York University

Presentation

Speaker

Ludwig Schmidt, Department of Computer Science, University of Washington