Automating Codebase Translation from SAS to Python with LLMs
Monday, Aug 4: 3:35 PM - 3:50 PM
1400
Contributed Papers
Music City Center
Translating code from SAS to Python remains challenging for organizations migrating their codebases. Classical rules-based methods, such as those built on Abstract Syntax Trees, rely on handcrafted rules that are time-consuming to write and inflexible. Unsupervised learning approaches have shown improvements but require massive parallel data for training, which is unavailable for SAS and Python. Large Language Models (LLMs) overcome these barriers through parametric knowledge retrieval and offer more promising results, despite diverse quality issues (syntax and semantic errors). This presentation explores various strategies for automating SAS-to-Python translation on complex codebases. We discuss managing context window limitations, handling nested dependencies, incorporating rules-based approaches, and reducing LLM laziness on tedious code. We also detail specific challenges when adapting SAS to Python, such as sentinel values, vectorized operations, and macros. This presentation highlights practical approaches for migrating proprietary software to open-source languages more quickly, reducing resource burden on organizations while preserving critical business logic.
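As a brief illustration of the sentinel-value challenge named above, the sketch below contrasts a literal translation of a SAS comparison with one that preserves SAS semantics. It relies on the documented SAS behavior that a missing numeric value (.) sorts below every number, whereas a Python NaN compares as False against everything. The helper name sas_less_than and the sample values are hypothetical, chosen only for illustration, and are not taken from the presentation.

```python
import math


def sas_less_than(x: float, threshold: float) -> bool:
    """Mimic SAS `x < threshold` for a non-missing threshold (hypothetical helper)."""
    if math.isnan(x):
        # In SAS, a missing numeric (.) is smaller than any number,
        # so `x < threshold` is true when x is missing.
        return True
    return x < threshold


values = [5.0, 12.0, math.nan]  # math.nan stands in for a SAS missing value

# Literal translation: NaN < 10 evaluates to False in Python.
naive = [v < 10 for v in values]

# Translation that keeps the SAS treatment of missing values.
sas_like = [sas_less_than(v, 10) for v in values]

print(naive)
print(sas_like)
```

The two lists differ only for the missing value, which is the kind of silent semantic drift a translation pipeline would need to detect or guard against.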
Large Language Models (LLMs)
Code Translation
Federal Statistics
Natural Language Processing
Main Sponsor
Section on Statistical Consulting