“Big-ish” Data in R: Efficient tools for large in-memory datasets

Abstract Number:

1854 

Submission Type:

Topic-Contributed Paper Session 

Participants:

Kelly Bodwin (1), Michael Chirico (2), Toby Hocking (3), Nic Crane (4), Tyson Barrett (5)

Institutions:

(1) California Polytechnic State University, (2) Google, (3) Northern Arizona University, (4) Voltron Data, (5) Utah State University

Chair:

Kelly Bodwin  
California Polytechnic State University

Discussant:

Michael Chirico  
Google

Session Organizer:

Kelly Bodwin  
California Polytechnic State University

Speaker(s):

Toby Hocking  
Northern Arizona University
Nic Crane  
Voltron Data
Tyson Barrett  
Utah State University

Session Description:

Between the small datasets of classical statistical analysis and the massive databases of distributed systems lies "big-ish" data: datasets that can be read directly into R on a personal computer, but that are large enough to make common data operations slow. This session highlights recent work in developing and testing R tools, such as {arrow}, {data.table}, and {vroom}, that are designed to speed up analysis of large in-memory datasets. We will share insights into the design, development, and maintenance of these tools, as well as examples of their use in real-world applications.

Sponsors:

Section on Statistical Computing (1)
Section for Statistical Programmers and Analysts (2)
No Additional Sponsor (3)

Theme: Statistics and Data Science: Informing Policy and Countering Misinformation

Yes

Applied

Yes

Estimated Audience Size

Medium (80-150)

I have read and understand that JSM participants must abide by the Participant Guidelines.

Yes

I understand and have communicated to my proposed speakers that JSM participants must register and pay the appropriate registration fee by June 1, 2024. The registration fee is nonrefundable.

I understand