
Integrating code testing into the data science workflow to enhance trustworthiness

Presented During: Trustworthy Data Science

Rohan Alexander Speaker

Sunday, Aug 4: 4:05 PM - 5:30 PM
Invited Paper Session

Oregon Convention Center

Code now plays a central role in much of statistical analysis, especially in data science. But few data scientists or statisticians have foundational software engineering skills, and complications specific to data science mean that some of these skills are not directly transferable in any case. We build one way of integrating testing into a data science workflow, with a particular focus on statistical modeling, and then show how Large Language Models (LLMs) can be used to automate aspects of this code testing suite. Our workflow and approach enhance the trustworthiness of conclusions drawn from data.
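
As a rough illustration of what testing in a statistical modeling workflow can look like, the sketch below pairs a few checks on the data with a check that a fitted model recovers the parameters used to simulate that data. It is not the workflow presented in the talk, which the abstract does not detail; the function names (simulate_data, check_data, fit_line), the linear model, and the tolerances are all assumptions chosen for the example.

```python
import random


def simulate_data(n, intercept, slope, noise_sd, seed):
    """Simulate (x, y) pairs from a known linear model (hypothetical helper)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [intercept + slope * x + rng.gauss(0, noise_sd) for x in xs]
    return xs, ys


def check_data(xs, ys):
    """Data tests: fail early if the inputs violate basic expectations."""
    assert len(xs) == len(ys), "x and y must have the same length"
    assert len(xs) >= 2, "need at least two observations"
    assert all(0 <= x <= 10 for x in xs), "x outside the expected range"


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def test_model_recovers_known_parameters():
    """Model test: the fit should recover the parameters used to simulate."""
    xs, ys = simulate_data(n=2000, intercept=1.0, slope=2.0, noise_sd=0.5, seed=42)
    check_data(xs, ys)
    intercept, slope = fit_line(xs, ys)
    # Tolerances are illustrative; they sit well outside the sampling error here.
    assert abs(intercept - 1.0) < 0.1
    assert abs(slope - 2.0) < 0.05


test_model_recovers_known_parameters()
```

Testing against simulated data with known parameters is one common choice because the expected answer is fixed in advance, so a failing test points to the code rather than to the data.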