12. Performance Testing and Comparative Benchmarking for Creating a Self-Sustaining Ecosystem for data.table

Conference: Women in Statistics and Data Science 2024
10/16/2024: 4:00 PM - 5:00 PM EDT
Speed 

Description

The data.table package in R is a powerful tool for data analysis, combining efficient C code with user-friendly R syntax. To ensure its long-term sustainability, the NSF POSE program has funded a project from 2023 to 2025 to build a self-sustaining ecosystem around data.table.

In this presentation, we will discuss the importance of performance testing in the development of data.table and present a general approach that can be applied to other R packages. By creating performance tests based on historical regressions, we can measure the package's efficiency over time and memory usage, ensuring that code and version releases do not impact its performance. We will demonstrate the use of the atime package to benchmark execution time and memory usage, providing developers with confidence in maintaining efficient performance and reliability. This approach not only benefits data.table but also serves as a model for other R package developers to enhance the performance and popularity of their own projects.

Presenting Author

Doris Amoakohene

First Author

Doris Amoakohene

Target Audience

Beginner

Tracks

Community
Women in Statistics and Data Science 2024