Storing, Importing, Managing, and Analyzing Large Data Locally with R
Kelly Bodwin
Instructor
California Polytechnic State University
Monday, Aug 4: 8:30 AM - 5:00 PM
CE_14
Professional Development Course/CE
Music City Center
Room: CC-109
It is increasingly common in academic and professional settings to encounter datasets large enough to exceed the capabilities of standard data processing tools, yet small enough to be stored on local computers. Recent articles even claim that "the era of big data is over" and that data analysts and researchers should "think small, develop locally, ship joyfully." Such "medium" datasets are instrumental in measuring, tracking, and recording a wide array of phenomena across disciplines such as human behavior, animal studies, geology, economics, and astronomy. In this workshop, we will present modern techniques for handling large local data in R using a tidy data pipeline, covering every stage from data storage and importing to cleaning, analysis, and exporting of data and results. Specifically, we will teach a combination of tools from the data.table, arrow, and duckdb packages, with a focus on Parquet files for storage and transfer. By the end of the workshop, participants will understand how to integrate these tools to establish a legible, reproducible, and high-performance workflow.