Efficient tools for your tidy workflow

Tyson Barrett (Speaker)
Utah State University
 
Tuesday, Aug 6: 2:45 PM - 3:05 PM
Topic-Contributed Paper Session 
Oregon Convention Center 
Preparing "tidy" data is the process of cleaning, reshaping, and formatting data into a rectangular layout with observations in rows and variables in columns. This format is often ideal for data analytics and statistical analysis. In R, the Tidyverse provides a defined set of methods for tidying data, along with a grammar for communicating those methods. However, large data (data that have millions of rows but need to be worked on in-memory) are common and can require other tools built for that scale. In this talk, I will highlight a tidy workflow that uses the data.table R package together with the grammar established by the Tidyverse. I will show how this package tidies data efficiently, concisely, and quickly, including grouped operations, aggregations, and pivoting on data with 10 million rows and 50 columns. This introduction will provide attendees with resources to start using these tools on their own large data and will highlight the benefits of incorporating data.table into their workflow.
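
As a rough illustration of the three operations named in the abstract (grouped operations, aggregations, and pivoting), the following is a minimal sketch using core data.table functions. The abstract does not name the specific functions or helper packages covered in the talk, so this is an assumption about what such a workflow could look like, not the speaker's own code. The toy table and its column names (`id`, `group`, `wave`, `score`) are hypothetical and far smaller than the 10-million-row data described.

```r
library(data.table)

# Hypothetical toy data: 4 ids, 2 groups, 3 waves per id
dt <- data.table(
  id    = rep(1:4, each = 3),
  group = rep(c("a", "b"), each = 6),
  wave  = rep(1:3, times = 4),
  score = as.numeric(1:12)
)

# Aggregation: mean score and row count per group
agg <- dt[, .(mean_score = mean(score), n = .N), by = group]

# Grouped operation that keeps every row:
# add a score centered within each id, modifying dt by reference
dt[, score_centered := score - mean(score), by = id]

# Pivot wider: one row per id, one column per wave
wide <- dcast(dt, id + group ~ wave, value.var = "score")

# Pivot longer: back to one row per id-wave combination
long <- melt(
  wide,
  id.vars       = c("id", "group"),
  variable.name = "wave",
  value.name    = "score"
)
```

The `:=` operator updates the table in place rather than copying it, which is one reason data.table is often chosen for in-memory work on large tables.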