Reproducible Research Bootcamp

Abstract Number:

1499 

Submission Type:

Professional Development Course/CE  

Participants:

Aaron Williams (1)

Institutions:

(1) N/A, N/A

Primary Instructor:

Aaron Williams  
N/A

Description:

A reproducible research workflow should generate the same results from the same inputs every time. Unfortunately, software changes, key documentation is skipped, and that hard drive from graduate school disappeared during that last move. Reproducible research should be a minimum expectation of computational science, but too many researchers lack the tools to embrace a fully reproducible workflow. This full-day course aims to equip researchers with fundamental tools for reproducible research. The course will introduce Quarto, Git and GitHub, coding best practices, and environment management with renv through hands-on exercises and clear resources. By the end, attendees will be equipped to weather constantly changing software versions, documentation will be too fun to skip, and even a missing hard drive won't ruin years of work. The course focuses on R, but the content is broadly applicable.

Instructor Background:

Aaron R. Williams is a senior data scientist at the Urban Institute, where he works on data imputation methods and on using modern data privacy techniques to safely expand access to data for research. He also leads the Urban Institute R Users Group and consults across the institute on projects that use statistical computing. Williams is an adjunct professor in the McCourt School of Public Policy at Georgetown University, where he teaches two original classes focused on data science for public policy. Williams is an instructor for the ASA Council of Chapters' traveling course series.

Course Outline:

Intro - Importance and properties of reproducible research: We will motivate the importance of reproducible research with examples where reproducible tools were adopted to good effect and examples where the absence of these tools created major errors and headaches for authors.
Part 1 - Literate Statistical Programming with Quarto: We will introduce the literate statistical programming tool Quarto. We will create websites and PDFs, cover output and execution options, and work with Zotero to easily manage citations.
Part 2 - Version Control with Git and GitHub: We will work through a hands-on introduction to Git with the command line and GitHub. We will use Quarto and GitHub Pages to host a free website online.
Part 3 - Coding Best Practices: We will work through examples of writing tests and checks with R to ensure the quality, stability, and reproducibility of results.
Part 4 - Environment management with renv: We will adopt the tool renv to manage software and package versions over time.

Learning Outcomes:

The objective of this course is to give attendees a robust toolkit for creating accurate, clear, and reproducible research. Attendees should be able to create a research project where the environment is managed with renv, the code is version controlled with Git, the code is tested and stored in a beautiful notebook with Quarto, and the final product is hosted for free online on GitHub Pages.

Sponsors:

No Additional Sponsor 3
No Additional Sponsor 1

Do you need additional equipment for your course?

No

Length of Course (pick 1)

Full Day Course