12. Performance Testing and Comparative Benchmarking for Creating a Self-Sustaining Ecosystem for data.table
Conference: Women in Statistics and Data Science 2024
10/16/2024: 4:00 PM - 5:00 PM EDT
Speed
The data.table package in R is a powerful tool for data analysis, combining efficient C code with user-friendly R syntax. To ensure its long-term sustainability, the NSF POSE program has funded a project from 2023 to 2025 to build a self-sustaining ecosystem around data.table.
In this presentation, we will discuss the importance of performance testing in the development of data.table and present a general approach that can be applied to other R packages. By creating performance tests based on historical regressions, we can track the package's execution time and memory usage across versions, ensuring that code changes and new releases do not degrade its performance. We will demonstrate the use of the atime package to benchmark execution time and memory usage, giving developers confidence that efficiency and reliability are maintained. This approach not only benefits data.table but also serves as a model for other R package developers to enhance the performance and popularity of their own projects.
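As a rough illustration of the kind of benchmark described above, the following sketch uses atime::atime to measure time and memory for two ways of computing a grouped sum as the data size N grows. The data layout, the column names g and x, and the choice of expressions are assumptions made for this example and are not taken from the presentation. Comparing two versions of the same package against a historical regression follows the same idea, but needs version-specific details that are not given here.

```r
# Minimal sketch, assuming the data.table and atime packages are installed.
library(data.table)

result <- atime::atime(
  N = 10^seq(1, 5),            # data sizes to try, smallest first
  setup = {
    # Hypothetical test data: N rows spread over roughly N/10 groups.
    dt <- data.table(
      g = sample(N %/% 10 + 1, N, replace = TRUE),
      x = rnorm(N)
    )
    df <- as.data.frame(dt)
  },
  # Each named argument is an expression to time at every N.
  data.table = dt[, .(s = sum(x)), by = g],
  base = aggregate(x ~ g, data = df, FUN = sum),
  seconds.limit = 0.01         # stop increasing N for an expression once it is slower than this
)

# One row per (N, expression) with timing and memory columns.
print(result$measurements[, .(N, expr.name, median, kilobytes)])
```

Plotting the result (for example with plot(result)) shows how time and memory scale with N for each expression, which is the kind of view a performance test would inspect for slowdowns.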
Presenting Author
Doris Amoakohene
First Author
Doris Amoakohene
Target Audience
Beginner
Tracks
Community