Estimating Causal Relationships in Complex Systems from Tabular Data Using Language Models

Zhenjiang Fan (Speaker)
Stanford University School of Medicine

Sunday, Aug 3: 4:05 PM - 4:25 PM
Topic-Contributed Paper Session
Music City Center
Large language models (LLMs) are increasingly applied in scientific research because of their advanced reasoning capabilities; multi-modal LLMs, for instance, can accept diverse data types as inputs, expanding their utility across domains. However, while traditional causal methods operate primarily on tabular data, existing language models are largely limited to inferring causal relationships from text. In this work, we leverage the reasoning capabilities of language models to infer and discover causal relationships directly from tabular data. The proposed framework builds on the Mamba state-space model (SSM) architecture, with added layers for classification tasks. To promote robustness and generalizability, we train on a diverse range of simulated data together with 10 curated real-world datasets. The framework is also designed to be extensible, so users can easily integrate their own data as well as additional scores and tests. Our results demonstrate that the proposed causal framework outperforms existing methods in accuracy.
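The abstract gives no implementation details, but a minimal sketch can illustrate one plausible reading of the architecture it describes: a Mamba backbone over the samples of a tabular variable pair, followed by an added classification head that outputs a causal label. Everything below is an assumption for illustration, not the authors' code: the `TabularCausalClassifier` name, the three-way label scheme (X→Y, Y→X, no edge), the hyperparameters, and the use of the open-source `mamba_ssm` package (which requires a CUDA GPU).

```python
# A minimal sketch, assuming the mamba_ssm package; not the talk's released code.
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # state-spaces/mamba; CUDA required


class TabularCausalClassifier(nn.Module):
    """Classify the causal relation between two columns of tabular data.

    Input: a (batch, n_samples, 2) tensor of joint observations of a
    candidate cause/effect pair. Hypothetical label scheme:
    0 = X -> Y, 1 = Y -> X, 2 = no direct edge.
    """

    def __init__(self, d_model: int = 64, n_layers: int = 4, n_classes: int = 3):
        super().__init__()
        # Project each 2-dimensional observation to the model width,
        # treating the sample axis as the sequence axis for the SSM.
        self.embed = nn.Linear(2, d_model)
        self.blocks = nn.ModuleList(
            Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
            for _ in range(n_layers)
        )
        self.norm = nn.LayerNorm(d_model)
        # The "added layers for classification" from the abstract.
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, pairs: torch.Tensor) -> torch.Tensor:
        h = self.embed(pairs)          # (B, n_samples, d_model)
        for block in self.blocks:
            h = h + block(h)           # residual Mamba blocks
        h = self.norm(h).mean(dim=1)   # pool over samples
        return self.head(h)            # (B, n_classes) logits


# Usage: score 8 candidate pairs, each with 512 joint observations.
model = TabularCausalClassifier().cuda()
logits = model(torch.randn(8, 512, 2).cuda())
print(logits.argmax(dim=-1))  # predicted causal label per pair
```

Feeding raw samples through the sequence dimension is only one possible encoding; the extensibility the abstract mentions (user-supplied data, scores, and tests) would presumably enter as additional input features or auxiliary heads.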

Keywords

Large language model

Causal inference

Causal discovery

Tabular data

Complex causal systems