Tree-Transformers: Improving Tabular Deep Learning by Integrating Random Forests and Transformers
Wednesday, Aug 6: 8:30 AM - 10:20 AM
Topic-Contributed Paper Session
Music City Center
Deep Learning (DL) models excel in domains with unstructured data, such as text and images, but underperform tree-based ensembles like Random Forests (RFs) on tabular data. Recent studies attribute this gap to three key limitations of DL models: (1) an inability to adapt to sparsity, (2) an excessive bias toward smooth solutions, and (3) a reliance on rotationally invariant representations, which are poorly suited to real-world tabular data. To address these challenges, we propose Tree-Transformers (TTs), a novel architecture that integrates RFs with transformers. A TT first grows a random forest and extracts node-based features from each tree; a transformer is then trained on these representations. To enhance computational efficiency, we employ a mixture-of-experts model that dynamically routes each test example to the most relevant tree-transformer at inference time. Our experiments demonstrate that TTs effectively mitigate the inductive biases of DL models and achieve state-of-the-art performance on real-world tabular benchmarks.
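A minimal sketch of the pipeline the abstract describes, assuming scikit-learn for the forest and PyTorch for the transformer. The class name TreeTransformer, the choice of leaf indices as the node-based features, and all hyperparameters are illustrative assumptions rather than the authors' implementation; the mixture-of-experts routing step is omitted for brevity.

```python
import torch
import torch.nn as nn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

class TreeTransformer(nn.Module):
    """Embed each tree's leaf assignment as a token; attend across trees."""

    def __init__(self, n_nodes_per_tree, d_model=64, n_heads=4, n_layers=2, n_classes=2):
        super().__init__()
        # One embedding table per tree, sized by that tree's node count,
        # so every leaf index the tree can emit has a learnable vector.
        self.leaf_embeddings = nn.ModuleList(
            nn.Embedding(n_nodes, d_model) for n_nodes in n_nodes_per_tree
        )
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, leaf_ids):
        # leaf_ids: (batch, n_trees) long tensor, one leaf index per tree.
        tokens = torch.stack(
            [emb(leaf_ids[:, t]) for t, emb in enumerate(self.leaf_embeddings)], dim=1
        )  # (batch, n_trees, d_model)
        pooled = self.encoder(tokens).mean(dim=1)  # pool across tree tokens
        return self.head(pooled)

# Synthetic tabular data stands in for a real benchmark.
X, y = make_classification(n_samples=512, n_features=20, random_state=0)

# Step 1: grow the random forest.
forest = RandomForestClassifier(n_estimators=16, max_depth=6, random_state=0).fit(X, y)

# Step 2: extract node-based features -- here, the leaf each tree routes a sample to.
leaf_ids = torch.as_tensor(forest.apply(X), dtype=torch.long)  # (n_samples, n_trees)

# Step 3: train the transformer on these per-tree representations.
model = TreeTransformer(n_nodes_per_tree=[est.tree_.node_count for est in forest.estimators_])
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
labels = torch.as_tensor(y, dtype=torch.long)
for _ in range(5):  # a few illustrative training steps
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(leaf_ids), labels)
    loss.backward()
    optimizer.step()
```

In this sketch each tree contributes one token per example, so attention operates across trees; a gating network over the same leaf features could then route examples among several such models, which is one plausible reading of the mixture-of-experts step.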