“Big-ish” Data in R: Efficient tools for large in-memory datasets

Kelly Bodwin, Chair
California Polytechnic State University
 
Michael Chirico, Discussant
Google
 
Kelly Bodwin, Organizer
California Polytechnic State University
 
Tuesday, Aug 6: 2:00 PM - 3:50 PM
1854 
Topic-Contributed Paper Session 
Between the small datasets of classical statistical analysis and the massive databases of distributed systems lies "big-ish" data: datasets that can be read directly into R on a personal computer but are large enough to make common data operations slow. This session highlights recent work in developing and testing R tools designed to speed up analysis of these large in-memory datasets, including {arrow}, {data.table}, and {vroom}. We will share insights into the design, development, and maintenance of these tools, as well as examples of their use in real-world applications.

Applied: Yes

Main Sponsor

Section on Statistical Computing

Co Sponsors

Section for Statistical Programmers and Analysts

Presentations

Speaker: Toby Hocking, Northern Arizona University

Speaker: Nic Crane, Voltron Data

Speaker: Tyson Barrett, Utah State University