Effortless Access to Household Pulse Survey Datasets with {hpsr}: An R Data Package

Christopher Clark Co-Author
Florida International University
 
Krupa Patel Co-Author
Florida International University
 
Rachel Clarke Co-Author
Florida International University
 
Nana Aisha Garba Co-Author
Florida International University
 
Prasad Bhoite First Author and Presenting Author
 
Wednesday, Aug 7: 10:30 AM - 12:20 PM
2616 
Contributed Posters 
Oregon Convention Center 
This abstract highlights the value of Household Pulse Survey data for comprehensively evaluating the impact of COVID-19 on the U.S. population. The project creates an R package, {hpsr}, using tools such as usethis, Rtools, and RStudio. The workflow downloads the survey's CSV files from the Census Bureau website, converts them to Parquet to reduce storage, and stores the Parquet files on GitHub. Converting CSV data to Parquet reduces file size without compromising data integrity. The Parquet files are then transformed into .rda datasets, R's native data format, and the entire process is managed in RStudio for a streamlined, reproducible workflow. The result is an open-source package on GitHub that bundles 63 transformed datasets, making the survey data more accessible to researchers and the wider data science community. The optimized pipeline improves storage and retrieval efficiency, and the GitHub repository encourages collaboration. The discussion emphasizes the role of this open-source contribution in advancing data science, highlighting the gains from CSV-to-Parquet conversion, the adoption of .rda datasets, and community engagement through GitHub.
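A minimal sketch of the CSV-to-Parquet-to-.rda conversion steps, written in R because {hpsr} is an R package. It assumes the {arrow} package is installed. It uses a small synthetic data frame in place of a real Household Pulse Survey file, and the column and file names are hypothetical. Base R's save() stands in for the package-building step. Inside a package project, usethis::use_data() writes an object to the data/ directory as an .rda file in a similar way.

```r
# Sketch only: assumes {arrow} is installed; data and file names are hypothetical.
library(arrow)

# Synthetic stand-in for one downloaded Household Pulse Survey CSV file.
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = sprintf("R%04d", 1:1000), week = 1L, value = (1:1000) * 0.5),
  csv_path,
  row.names = FALSE
)

# Step 1: read the CSV with arrow.
pulse <- read_csv_arrow(csv_path)

# Step 2: write a Parquet copy (columnar and compressed; for large files this
# is typically much smaller than the CSV).
parquet_path <- tempfile(fileext = ".parquet")
write_parquet(pulse, parquet_path)

# Step 3: read the Parquet file back as a plain data frame.
pulse_week01 <- as.data.frame(read_parquet(parquet_path))

# Integrity check: the round trip through Parquet should not change the data.
stopifnot(isTRUE(all.equal(as.data.frame(pulse), pulse_week01)))

# Step 4: save as an .rda file, R's native format.
# Inside a package project, usethis::use_data(pulse_week01) would instead
# write data/pulse_week01.rda (bzip2-compressed by default).
rda_path <- file.path(tempdir(), "pulse_week01.rda")
save(pulse_week01, file = rda_path, compress = "bzip2")

# Compare the on-disk sizes of the three formats.
sizes <- file.size(c(csv_path, parquet_path, rda_path))
names(sizes) <- c("csv", "parquet", "rda")
print(sizes)
```

The integrity check inside the sketch mirrors the claim that Parquet conversion does not compromise the data. The size comparison is printed rather than asserted, because the relative sizes depend on the data.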

Keywords

R Package Development

Household Pulse Survey

COVID-19 Impact

Open-Source Contribution

Main Sponsor

Section for Statistical Programmers and Analysts