Monday, Aug 4: 10:30 AM - 12:20 PM
0520
Invited Paper Session
Music City Center
Room: CC-201B
Applied: Yes
Main Sponsor: Section on Statistics in Defense and National Security
Presentations
Large language models (LLMs) are known for making confident assertions, particularly in mathematical contexts, which can occasionally lead to incorrect conclusions. To address this challenge and improve the reliability of quantitative answers, we present an LLM agent that answers analytical questions and interacts with diverse datasets. The tool integrates an LLM with a code interpreter in a secure, sandboxed environment: the LLM generates code to answer the analytical question, and that code is then executed to produce accurate, reliable results.
To build confidence in the outputs, the tool exposes the generated code, allowing users to verify the correctness of the calculations independently. Users can also generate accompanying visualizations to support findings and verify data insights. By combining LLMs with code execution capabilities, our LLM agent empowers users to quickly and reliably derive meaningful insights from their datasets.
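A minimal sketch of the generate-then-execute pattern the abstract describes. Here `llm_generate_code` is a hypothetical stand-in for the LLM call, and a subprocess with a timeout stands in for the secure sandbox; neither is the authors' actual implementation.

```python
# Sketch: LLM writes analysis code, which is run in isolation and
# returned alongside its output so users can audit the calculation.
import subprocess
import sys
import tempfile

def llm_generate_code(question: str, csv_path: str) -> str:
    """Placeholder (assumption): prompt an LLM to write analysis code."""
    return (
        "import pandas as pd\n"
        f"df = pd.read_csv({csv_path!r})\n"
        "print(df['value'].mean())\n"  # e.g., for "What is the mean value?"
    )

def answer_analytical_question(question: str, csv_path: str) -> tuple[str, str]:
    """Return (generated_code, execution_output) so the code is verifiable."""
    code = llm_generate_code(question, csv_path)
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        script = f.name
    # A separate interpreter with a timeout is a weak stand-in for a real
    # sandbox (containers, restricted filesystem and network access, etc.).
    result = subprocess.run(
        [sys.executable, script], capture_output=True, text=True, timeout=30
    )
    return code, result.stdout if result.returncode == 0 else result.stderr
```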
Keywords
Large language model (LLM)
AI for Data Analysis
While artificial intelligence (AI) has been a prominent modeling technique for decades, a paradigm shift has emerged more recently with the training of foundation models. Unlike their predecessors, often described as narrow AI (algorithms designed for a single, specific task or application), foundation models are capable of a variety of tasks and, although sometimes suboptimal on a specific desired task, can often be retrained or fine-tuned quickly to improve performance. In this talk, we will review the development of multiple unimodal and multimodal large language models (LLMs) for scientific and defense applications; discuss strategies for training with limited compute; examine the challenges of alignment, both across data sources and with human intent; and consider how to incorporate statistics into an LLM pipeline and how to make the results accessible and trustworthy for human interaction, all with a focus on accelerating the deployment of new models.
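As one illustration of quick fine-tuning under limited compute, the sketch below uses low-rank adaptation (LoRA) via Hugging Face's peft library; the base model and hyperparameters are illustrative assumptions, not drawn from the talk.

```python
# LoRA trains a small set of adapter weights instead of the full model,
# making fine-tuning feasible on modest hardware. GPT-2 is a stand-in.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # illustrative base model
tokenizer = AutoTokenizer.from_pretrained("gpt2")

config = LoraConfig(
    r=8,                        # adapter rank: small => few trainable weights
    lora_alpha=16,
    target_modules=["c_attn"],  # attention projection module in GPT-2
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of the base model
# ...train `model` on task-specific data with a standard training loop...
```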
Keywords
Artificial Intelligence
Large Language Model
Foundation Model
SysChat is a Retrieval-Augmented Generation (RAG) tool that combines information retrieval, black-box large language models (LLMs), and expert feedback to answer user questions on mechanical systems. In retrieval-augmented generation, an embedding model first encodes all documents (in our case, tens of thousands of pages of complex systems documentation) as vectors stored in a vector database. These embeddings are then used to identify information relevant to each query, guiding black-box LLM responses with improved factual accuracy and traceable information sources. Experts were given access to this tool, and their feedback was used to train auxiliary methods that steer LLM outputs toward expert-preferred responses. This talk will discuss SysChat's architecture, highlighting classical and modern RAG techniques, LLM enhancements that improve reasoning capabilities, and the integration of expert feedback to guide black-box LLM generation.
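A minimal sketch of the retrieval step described above, assuming the sentence-transformers library for embeddings, an in-memory array in place of the vector database, and a placeholder `ask_llm` for the black-box model; none of this reflects SysChat's actual implementation.

```python
# Embed documents once; at query time, retrieve the nearest ones and
# ground the LLM prompt in the retrieved context.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

docs = ["Pump P-101 requires monthly seal inspection.",
        "Valve V-7 is rated to 150 psi."]       # toy stand-ins for system docs
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

def ask_llm(prompt: str) -> str:
    """Placeholder (assumption) for any black-box LLM API call."""
    raise NotImplementedError

def retrieve(query: str, k: int = 1) -> list[str]:
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vecs @ q            # cosine similarity (vectors normalized)
    return [docs[i] for i in np.argsort(-scores)[:k]]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQ: {query}"
    return ask_llm(prompt)           # retrieved text makes sources traceable
```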
Keywords
LLM
NLP
Information Retrieval
Generative AI
Speaker
Robert Molloy, Johns Hopkins University Applied Physics Laboratory
Foundation models have had a profound impact on society, through models such as OpenAI's ChatGPT and Anthropic's Claude series, as well as on science, through models such as AlphaFold, ClimaX, and Aurora. While these models can produce impressive output, less attention has been paid to model evaluation than to model building. In this talk I will discuss some of the challenges that make testing and evaluating large models difficult, along with efforts to evaluate them more systematically, including uncertainty quantification for metrics and predictions; holistic metrics that go beyond leaderboardism, in which models are ranked and compared by a single value; and deterministic evaluation of an LLM's output probability distribution.
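To make the uncertainty-quantification point concrete, here is a small sketch that attaches a bootstrap confidence interval to a benchmark accuracy rather than reporting a single leaderboard number; the per-item results are simulated for illustration.

```python
# A benchmark accuracy is an estimate with sampling error, so report an
# interval, not just a point value. Bootstrap over per-item pass/fail.
import numpy as np

rng = np.random.default_rng(0)
correct = rng.random(500) < 0.82   # simulated pass/fail on a 500-item benchmark
acc = correct.mean()

boot = np.array([
    rng.choice(correct, size=correct.size, replace=True).mean()
    for _ in range(10_000)
])
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"accuracy = {acc:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

On a 500-item benchmark the resulting interval spans several accuracy points, which is often wider than the gaps separating models on a leaderboard.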
Keywords
Testing and evaluation
uncertainty quantification
AI models