Automated Statistical Model Discovery with Large Language Models

Presented During: Large Language Models in Biomedical and Statistical Knowledge Discovery

Michael Li (Speaker)
Department of Computer Science, Stanford University

Wednesday, Aug 6: 9:35 AM - 9:55 AM
Topic-Contributed Paper Session

Music City Center

Statistical models are powerful tools for helping us understand and explain the world. Building these models is challenging because it requires deep expertise in both modeling and the problem domain. Motivated by the capabilities of large language models (LMs), we introduce a framework for LM-driven automated statistical model discovery. We cast our automated procedure within Box's Loop: the LM iterates between proposing statistical models represented as probabilistic programs, acting as a modeler, and critiquing those models through simulation-based checks, acting as a domain expert. By integrating LMs, we do not have to define a domain-specific language of models or design a handcrafted search procedure over models. We evaluate our method on a range of controlled settings and real-world tasks. Our method identifies models on par with those designed by human experts and extends classic models in interpretable ways. Our results highlight the promise of LM-driven model discovery.
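The propose-and-critique iteration described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the language model is replaced by a hypothetical fixed list of two candidate models (`propose_models`), the "probabilistic programs" are plain Normal models fitted by moment matching, and the simulation-based check is a simple predictive check on the sample standard deviation. The function names, the 0.05 and 0.95 thresholds, and the synthetic data are all assumptions made for illustration. In the framework described in the abstract, the critique would be fed back to the LM to guide its next proposal.

```python
import random
import statistics


def fit_fixed_scale(data):
    """Candidate 1: Normal model with fitted mean and scale fixed at 1."""
    return {"name": "normal_fixed_scale",
            "mu": statistics.fmean(data), "sigma": 1.0}


def fit_free_scale(data):
    """Candidate 2: Normal model with fitted mean and fitted scale."""
    return {"name": "normal_free_scale",
            "mu": statistics.fmean(data), "sigma": statistics.stdev(data)}


def propose_models():
    """Hypothetical stand-in for the LM acting as modeler.

    A real system would prompt an LM for a new probabilistic program,
    conditioned on earlier critiques; here we walk a fixed list.
    """
    yield fit_fixed_scale
    yield fit_free_scale


def critique(model, data, rng, n_sims=500):
    """Simulation-based check on the sample standard deviation.

    Returns the fraction of simulated datasets whose standard deviation
    is at least as large as the observed one (a predictive p-value).
    """
    observed = statistics.stdev(data)
    exceed = 0
    for _ in range(n_sims):
        replicated = [rng.gauss(model["mu"], model["sigma"]) for _ in data]
        if statistics.stdev(replicated) >= observed:
            exceed += 1
    return exceed / n_sims


def box_loop(data, seed=0):
    """Alternate between proposing and critiquing until a check passes."""
    rng = random.Random(seed)
    history = []
    for fit in propose_models():
        model = fit(data)
        p_value = critique(model, data, rng)
        history.append((model["name"], p_value))
        # Assumed acceptance rule: reject models with extreme p-values.
        if 0.05 < p_value < 0.95:
            return model, history
    return None, history


# Synthetic data (an assumption for this sketch): Normal(5, 3).
data_rng = random.Random(42)
data = [data_rng.gauss(5.0, 3.0) for _ in range(200)]

model, history = box_loop(data)
for name, p_value in history:
    print(f"{name}: predictive p-value = {p_value:.3f}")
```

In this toy run the fixed-scale model fails the check because its simulated datasets are far less dispersed than the observed data, so the loop moves on to the model with a free scale parameter. The point is only the shape of the loop: propose, fit, simulate, compare, revise.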