“Big-ish” Data in R: Efficient tools for large in-memory datasets
Abstract Number:
1854
Submission Type:
Topic-Contributed Paper Session
Participants:
Kelly Bodwin (1), Michael Chirico (2), Toby Hocking (3), Nic Crane (4), Tyson Barrett (5)
Institutions:
(1) California Polytechnic State University, N/A, (2) Google, N/A, (3) Northern Arizona University, N/A, (4) Voltron Data, N/A, (5) Utah State University, N/A
Chair:
Discussant:
Session Organizer:
Speaker(s):
Session Description:
Between the small datasets of classical statistical analysis and the massive databases of distributed systems lies "big-ish" data: datasets that can be read directly into R on a personal computer, but that are large enough to make common data operations slow. This session highlights recent work in developing and testing R tools designed to speed up analysis of such large in-memory datasets, including {arrow}, {data.table}, and {vroom}. We will share insights into the design, development, and maintenance of these tools, as well as examples of their use in real-world applications.
Sponsors:
1. Section on Statistical Computing
2. Section for Statistical Programmers and Analysts
3. No Additional Sponsor
Theme:
Statistics and Data Science: Informing Policy and Countering Misinformation
Yes
Applied
Yes
Estimated Audience Size
Medium (80-150)
I have read and understand that JSM participants must abide by the Participant Guidelines.
Yes
I understand and have communicated to my proposed speakers that JSM participants must register and pay the appropriate registration fee by June 1, 2024. The registration fee is nonrefundable.
I understand