Contributed Poster Presentations: Section for Statistical Programmers and Analysts
Ryan Peterson
Chair
University of Colorado - Anschutz Medical Campus
Wednesday, Aug 7: 10:30 AM - 12:20 PM
6071
Contributed Posters
Oregon Convention Center
Room: CC-Hall CD
Main Sponsor
Section for Statistical Programmers and Analysts
Presentations
This abstract describes how Household Pulse Survey data can be used to evaluate the impact of COVID-19 on the U.S. population. The project builds an R package using tools such as usethis, Rtools, and RStudio. CSV files are downloaded from the Census website, converted to Parquet to reduce file size without compromising data integrity, and stored on GitHub. The Parquet files are then transformed into .rda datasets, R's native format, with every step managed in RStudio for a streamlined, reproducible workflow. The resulting open-source package, available on GitHub for the data science community, bundles 63 transformed datasets, making the survey data more accessible and efficient for researchers to use. The optimized pipeline improves storage and retrieval efficiency, and the GitHub repository promotes collaboration. The discussion emphasizes the role of this open-source contribution in advancing data science, highlighting the gains from CSV-to-Parquet conversion, the adoption of .rda datasets, and community engagement through GitHub.
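The CSV-to-Parquet-to-.rda pipeline described in the abstract might look roughly like the sketch below. The poster's own code is not reproduced here, so every function name, file path, and column in this example is hypothetical, and the use of the arrow package for Parquet I/O is an assumption (the abstract names only usethis, Rtools, and RStudio).

```r
# A minimal sketch of a CSV -> Parquet -> .rda pipeline.
# All helper names below are hypothetical, not taken from the poster.

# Step 1: read a CSV and write it as Parquet.
# Assumes the 'arrow' package is installed (an assumption, not stated in the abstract).
csv_to_parquet <- function(csv_path, parquet_path) {
  if (!requireNamespace("arrow", quietly = TRUE)) {
    stop("The 'arrow' package is required for Parquet conversion.")
  }
  df <- utils::read.csv(csv_path, stringsAsFactors = FALSE)
  arrow::write_parquet(df, parquet_path)
  invisible(parquet_path)
}

# Step 2: save a data frame as an .rda file under a chosen object name.
# Uses base R only; "xz" compression is one of save()'s built-in options.
save_as_rda <- function(df, rda_path, object_name) {
  env <- new.env()
  assign(object_name, df, envir = env)
  save(list = object_name, envir = env, file = rda_path, compress = "xz")
  invisible(rda_path)
}

# Step 3: read a Parquet file back and store it as .rda.
parquet_to_rda <- function(parquet_path, rda_path, object_name) {
  if (!requireNamespace("arrow", quietly = TRUE)) {
    stop("The 'arrow' package is required to read Parquet files.")
  }
  df <- as.data.frame(arrow::read_parquet(parquet_path))
  save_as_rda(df, rda_path, object_name)
}

# Toy demonstration on made-up data (NOT real Household Pulse Survey variables).
if (requireNamespace("arrow", quietly = TRUE)) {
  tmp <- tempdir()
  csv_file     <- file.path(tmp, "toy_week.csv")
  parquet_file <- file.path(tmp, "toy_week.parquet")
  rda_file     <- file.path(tmp, "toy_week.rda")

  toy <- data.frame(week = c(1L, 2L), respondents = c(10L, 20L))
  utils::write.csv(toy, csv_file, row.names = FALSE)

  csv_to_parquet(csv_file, parquet_file)
  parquet_to_rda(parquet_file, rda_file, "toy_week")

  load(rda_file)          # restores an object named 'toy_week'
  print(file.size(csv_file))
  print(file.size(parquet_file))
}
```

Inside an actual package project, `usethis::use_data()` is the usual way to write an object to `data/<name>.rda`; the base-R `save()` call above stands in for it so the sketch runs outside a package. Size savings from Parquet depend on the data, and on a toy file this small the Parquet output can be larger than the CSV.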
Keywords
R Package Development
Household Pulse Survey
COVID-19 Impact
Open Source Contribution
Abstracts