Empowering Data Science through Code Modernization: Bridging the Gap between Innovation and Efficiency
Conference: Symposium on Data Science and Statistics (SDSS) 2024
06/05/2024: 10:35 AM - 11:00 AM EDT
Refereed
Robust data systems are crucial for data science and statistics. As data complexity and volume increase, code modernization is essential for maintaining efficient, scalable, and sustainable data infrastructure, particularly for agencies reliant on data-driven decision-making. This involves updating and optimizing the codebase supporting data operations and processes. Traditional methods may struggle with large datasets, necessitating modern techniques like distributed computing. Additionally, the emergence of new data sources and formats demands adaptable coding practices for efficient processing. Technological innovation requires agile and responsive data infrastructure. Modernization allows for the incorporation of the latest tools, fostering code that is easier to understand and scale. This agility is crucial for organizations to adapt quickly to changing requirements and trends.
Our case study with the Economic Research Service (ERS) within the USDA illustrates code modernization in practice. ERS's data products, which are critical for agricultural policy, are built on outdated code. We advocate a modular programming approach that divides data processing into distinct functions, each of which can be developed and maintained independently. This approach, complemented by best practices such as unit testing and version control, enhances the code's robustness and maintainability. By rewriting the code as an R package and implementing modern practices, we significantly improved the maintainability and adaptability of the data processing code. Packaging the code in this way simplifies updates and maintenance and helps keep documentation accurate and current. Code modernization within data infrastructure is imperative for organizations to remain efficient and responsive in a data-driven world. The ERS case demonstrates the benefits of adopting a modular approach and best practices, which are essential for leveraging the full potential of data assets and meeting evolving demands.
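The modular approach described above can be illustrated with a minimal sketch. It is not the ERS code, which the abstract says was rewritten as an R package. Python is used here purely for illustration, and the function names, the `region` and `value` fields, and the sample data are all hypothetical. Each processing step is a small function that can be tested on its own, which is where unit testing fits in.

```python
from collections import defaultdict


def clean_records(records):
    """Drop rows with a missing value and normalize the remaining fields.

    Hypothetical cleaning step: region names are trimmed and title-cased,
    and values are coerced to float.
    """
    cleaned = []
    for row in records:
        if row.get("value") is None:
            continue  # skip rows with no usable value
        cleaned.append(
            {"region": row["region"].strip().title(), "value": float(row["value"])}
        )
    return cleaned


def total_by_region(records):
    """Sum the cleaned values for each region (hypothetical aggregation step)."""
    totals = defaultdict(float)
    for row in records:
        totals[row["region"]] += row["value"]
    return dict(totals)


def run_pipeline(records):
    """Compose the independent steps into one processing pipeline."""
    return total_by_region(clean_records(records))


# Unit tests exercise each step separately, so a change to one step
# can be verified without running the whole pipeline.
def test_clean_records_drops_missing_values():
    raw = [{"region": "south", "value": None}, {"region": "west", "value": "3"}]
    assert clean_records(raw) == [{"region": "West", "value": 3.0}]


def test_total_by_region_sums_values():
    rows = [{"region": "West", "value": 1.0}, {"region": "West", "value": 2.0}]
    assert total_by_region(rows) == {"West": 3.0}


if __name__ == "__main__":
    test_clean_records_drops_missing_values()
    test_total_by_region_sums_values()
    sample = [
        {"region": " midwest ", "value": "10.5"},
        {"region": "Midwest", "value": 2},
        {"region": "south", "value": None},
    ]
    print(run_pipeline(sample))
```

Because each step has a single responsibility, a new data source or format would mainly affect the cleaning function, while the aggregation step and its tests stay unchanged.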
Code Modernization
Data Infrastructure
Presenting Author
Roy McKenzie, Coleridge Initiative
First Author
Roy McKenzie, Coleridge Initiative
CoAuthor(s)
Nathan Barrett
Ahu Yildirmaz
Tracks
Practice and Applications
Symposium on Data Science and Statistics (SDSS) 2024