Estimating Causal Relationships in Complex Systems from Tabular Data Using Language Models
Sunday, Aug 3: 4:05 PM - 4:25 PM
Topic-Contributed Paper Session
Music City Center
Large language models (LLMs) are increasingly applied in scientific research because of their advanced reasoning capabilities; multi-modal LLMs, for instance, can process diverse data types as inputs, broadening their utility across domains. However, while traditional causality methods focus primarily on tabular data, existing language models are largely limited to inferring causal relationships from text. In this work, we leverage the reasoning capabilities of language models to infer and discover causal relationships directly from tabular data. The proposed framework builds on the Mamba (state space model) architecture with added layers for classification tasks. To ensure robustness and generalizability, we train on a diverse range of simulated data together with 10 curated real-world datasets. The framework is also designed to be extensible, so users can easily integrate their own data as well as additional scores and tests. Our results demonstrate that the proposed causal framework outperforms existing methods in accuracy.
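The abstract describes a sequence-model backbone (Mamba) with added classification layers that reads tabular data directly. A minimal sketch of that idea, under stated assumptions: each pair of tabular variables is serialized as a sequence of two-dimensional tokens and a classification head predicts the causal direction. The abstract does not specify the label scheme or the encoder details, so a GRU stands in for the Mamba block here to keep the example self-contained; the three-class labeling (X→Y, Y→X, no edge) is a hypothetical choice for illustration.

```python
# Illustrative sketch only: a sequence encoder over serialized tabular pairs
# with a classification head. A GRU stands in for the Mamba (state space
# model) block named in the abstract; the 3-class label scheme is assumed.
import torch
import torch.nn as nn


class PairwiseCausalClassifier(nn.Module):
    """Classify the causal relation between two tabular variables.

    Each (X, Y) sample pair is serialized as a length-n sequence of
    2-dimensional tokens, encoded by a sequence model, and mapped by a
    linear head to one of three hypothetical classes:
    X causes Y, Y causes X, or no edge.
    """

    def __init__(self, hidden_dim: int = 64, num_classes: int = 3):
        super().__init__()
        self.encoder = nn.GRU(input_size=2, hidden_size=hidden_dim,
                              batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, pairs: torch.Tensor) -> torch.Tensor:
        # pairs: (batch, n_samples, 2) -- each token is one (x_i, y_i) row
        _, h_n = self.encoder(pairs)
        return self.head(h_n[-1])  # (batch, num_classes) logits


# Usage: 8 synthetic datasets, each with 100 paired observations.
model = PairwiseCausalClassifier()
logits = model(torch.randn(8, 100, 2))
print(logits.shape)  # torch.Size([8, 3])
```

Serializing rows as tokens is what lets a sequence architecture consume tabular data at all; swapping the GRU for a Mamba layer changes the encoder but not the overall classification setup.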
Large language model
Causal inference
Causal discovery
Tabular data
Complex causal systems