Uncertainty Quantification and Calibrating Confidence Scores in Language Models

Karl Pazdernik, Speaker
Pacific Northwest National Laboratory
 
Tuesday, Aug 6: 11:55 AM - 12:15 PM
Invited Paper Session 
Oregon Convention Center 
With the popularity of ChatGPT and the ever-improving performance of generative pretrained transformers (GPTs), large language models (LLMs) are being implemented in almost every information retrieval tool. Whether extracting specific entities from text (named entity recognition) or describing characteristics found within the text, LLMs are being tested on a wide variety of tasks. However, these models "hallucinate," presenting false information with great confidence. While retrieval-augmented generation (RAG) has been shown to reduce errors in generated responses, without explicit quantification of the uncertainty in a result, human users of these systems are left to trust it blindly. To this end, we review existing forms of uncertainty quantification for language models and highlight methods for calibrating a language model, such as Bayesian belief matching and conformal prediction. We end with a discussion of the challenges that arise when moving toward multimodal foundation models.
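
As a rough illustration of the conformal prediction idea named in the abstract (a generic sketch, not the speaker's implementation), the code below applies split conformal prediction to a classifier's softmax-style confidence scores: a held-out calibration set is used to pick a nonconformity threshold, and prediction sets at test time then carry a coverage guarantee of at least 1 - alpha under exchangeability. The function names, the 1 - score nonconformity measure, and the toy data are assumptions introduced purely for illustration.

```python
import numpy as np

def conformal_threshold(cal_scores, cal_labels, alpha=0.1):
    """Split conformal prediction: compute a nonconformity threshold from a
    held-out calibration set so that prediction sets cover the true label
    with probability >= 1 - alpha (under exchangeability)."""
    n = len(cal_labels)
    # Nonconformity score: 1 minus the model's confidence in the true label.
    nonconformity = 1.0 - cal_scores[np.arange(n), cal_labels]
    # Finite-sample-corrected quantile of the calibration nonconformity scores.
    q_level = np.ceil((n + 1) * (1 - alpha)) / n
    return np.quantile(nonconformity, q_level, method="higher")

def prediction_set(test_scores, q_hat):
    """Return the labels whose nonconformity falls at or below the threshold."""
    return np.where(1.0 - test_scores <= q_hat)[0]

# Toy usage: 5 calibration examples over 3 candidate labels
# (e.g., entity types in a named entity recognition task).
cal_scores = np.array([[0.7, 0.2, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.3, 0.3, 0.4],
                       [0.6, 0.3, 0.1],
                       [0.2, 0.1, 0.7]])
cal_labels = np.array([0, 1, 2, 0, 2])
q_hat = conformal_threshold(cal_scores, cal_labels, alpha=0.2)
print(prediction_set(np.array([0.5, 0.3, 0.2]), q_hat))  # e.g., array([0])
```

The size of the resulting prediction set acts as a calibrated, distribution-free uncertainty signal: a singleton set indicates a confident answer, while a large set flags an output a human user should not blindly trust.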