Automated Statistical Model Discovery with Large Language Models
Michael Li
Speaker
Department of Computer Science, Stanford University
Wednesday, Aug 6: 9:35 AM - 9:55 AM
Topic-Contributed Paper Session
Music City Center
Statistical models are powerful tools for helping us understand and explain the world. Building these models is challenging because it requires deep expertise in both modeling and the problem domain. Motivated by the capabilities of large language models (LLMs), we introduce a framework for LLM-driven automated statistical model discovery. We cast our automated procedure within Box's Loop: the LLM iterates between proposing statistical models represented as probabilistic programs, acting as a modeler, and critiquing those models through simulation-based checks, acting as a domain expert. By integrating LLMs, we do not have to define a domain-specific language of models or design a handcrafted search procedure over models. We evaluate our method on a range of controlled settings and real-world tasks. Our method identifies models on par with models designed by human experts and extends classic models in interpretable ways. Our results highlight the promise of LLM-driven model discovery.
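The propose-and-critique iteration described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the names `box_loop`, `stub_proposer`, and `stub_critic` are hypothetical, the proposer and critic are hard-coded stand-ins for the language model, and a "model" is reduced to a plain simulator function rather than a probabilistic program.

```python
import random
import statistics
from typing import Callable, List, Optional, Tuple

# Assumption: a "model" is only a simulator. Given an RNG and a sample
# size, it returns one synthetic dataset.
Model = Callable[[random.Random, int], List[float]]


def stub_proposer(feedback: Optional[str]) -> Tuple[str, Model]:
    """Hypothetical stand-in for the language model acting as modeler."""
    if feedback is None:
        # First round: start with a narrow normal model.
        return "normal_sd1", lambda rng, n: [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Later rounds: react to criticism by widening the model.
    return "normal_sd3", lambda rng, n: [rng.gauss(0.0, 3.0) for _ in range(n)]


def stub_critic(model: Model, data: List[float], rng: random.Random,
                n_sims: int = 200) -> Optional[str]:
    """Hypothetical stand-in for the critic: a simulation-based check that
    compares the observed standard deviation with simulated ones."""
    observed = statistics.stdev(data)
    simulated = sorted(
        statistics.stdev(model(rng, len(data))) for _ in range(n_sims)
    )
    low = simulated[int(0.025 * n_sims)]
    high = simulated[int(0.975 * n_sims)]
    if low <= observed <= high:
        return None  # no objection
    return (f"observed spread {observed:.2f} is outside the simulated "
            f"range [{low:.2f}, {high:.2f}]")


def box_loop(data: List[float], propose, critique,
             rounds: int = 3, seed: int = 0) -> List[Tuple[str, Optional[str]]]:
    """Alternate between proposing a model and critiquing it."""
    rng = random.Random(seed)
    feedback: Optional[str] = None
    history = []
    for _ in range(rounds):
        name, model = propose(feedback)
        feedback = critique(model, data, rng)
        history.append((name, feedback))
        if feedback is None:  # the critic accepted the model
            break
    return history


# Toy data with a spread much larger than 1.
data = [float(x) for x in range(-5, 6)]
history = box_loop(data, stub_proposer, stub_critic)
for name, feedback in history:
    print(name, "->", feedback or "accepted")
```

In this toy run the critic's feedback is passed back to the proposer as text, which mirrors the idea of criticism steering the next proposal. In a real system both stubs would be replaced by language model calls, and the check would be run against a fitted probabilistic program.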