Lessons Learned from 1,000 Data Science Projects

Speaker: Shannon Ellis
UC San Diego
 
Thursday, Aug 8: 9:25 AM - 9:50 AM
Invited Paper Session 
Oregon Convention Center 
The process of Building Better Data Analyses often begins in the classroom, and, with growing enrollments, it increasingly happens at scale. Since 2018, we have taught COGS 108 Data Science in Practice at UC San Diego every term to 400+ students at a time. This large-enrollment, project-based course aims to teach the critical skills needed to pursue a technical, data-focused career. Throughout the course, students complete a term-long group data science project on a topic of their choosing. Groups carry out the entire data science process: formulating a question; finding, cleaning, and analyzing data; answering their question of interest; and, finally, communicating their process and findings in both a detailed, technical data science report and a short oral presentation. Having advised 4,000 students through more than 1,000 projects, we summarize the key lessons we have learned about how to teach students to use and analyze data at scale. Our findings highlight the importance of clear instruction, project scaffolding, regular checkpoints, detailed and project-specific feedback, and careful consideration of the technical stack used.