Trustworthy data science

Abstract Number:

1028 

Submission Type:

Invited Paper Session 

Participants:

Rohan Alexander (1), Nicholas Horton (2), Tiffany Timbers (3), Roger Peng (4)

Institutions:

(1) N/A, (2) Amherst College, (3) University of British Columbia, (4) Johns Hopkins Bloomberg SPH

Chair:

Nicholas Horton  
Amherst College

Session Organizer:

Rohan Alexander  
N/A

Speaker(s):

Rohan Alexander  
N/A
Tiffany Timbers  
University of British Columbia
Roger Peng  
Johns Hopkins Bloomberg SPH

Session Description:

Description:
As data science matures and is increasingly relied on for decisions that affect people, there is a need to ensure that it can be trusted. This means establishing a principled, tested, and reproducible end-to-end workflow that treats quantitative measures both as ends in themselves and as a foundation for exploring questions. This panel explores what makes data science trustworthy from a variety of perspectives.

Focus:
We focus on different aspects of trustworthy data science, including what makes a data analysis more or less trustworthy, the current pressures to perform trustworthy data analysis, and how we can make data analysis more trustworthy.

Content (including tentative presentation titles):
* Peng: What aspects of a data analysis make it more or less trustworthy?
* Timbers: The current state of the societal, moral, reputational, and institutional pressures to perform data analysis in a trustworthy manner.
* Alexander: Automated code and data testing in data science.

Timeliness:
Data science is now ubiquitous in academia and industry, but for a long time it was poorly defined. Over the past couple of years, agreement has begun to emerge. Trustworthy data science builds on this shared understanding to add rigour to data science. We have long known what rigour looks like in mathematical and statistical theory: theorems are accompanied by proofs. There is now emerging agreement on what rigour looks like in data science: claims are accompanied by verified, tested, and reproducible code and data. The result is conclusions drawn from data that can be trusted.

Appeal:
For some time, we have known that there is an issue with the credibility of findings in data science. For instance, the reproducibility crisis, which was identified early in psychology (Open Science Collaboration 2015) but has since been seen in many other disciplines, brought to light issues such as p-value "hacking", researcher degrees of freedom, the file-drawer problem, and even data and results fabrication. Steps are being put in place to address these, but a broader approach is needed to ensure that these separate fixes add up to a coherent whole.

Sponsors:

Canadian Statistical Sciences Institute 2
Section on Statistical Learning and Data Science 3
SSC (Statistical Society of Canada) 1

Theme: Statistics and Data Science: Informing Policy and Countering Misinformation

Yes

Applied

Yes

Estimated Audience Size

Large (150-275)

I have read and understand that JSM participants must abide by the Participant Guidelines.

Yes

I understand and have communicated to my proposed speakers that JSM participants must register and pay the appropriate registration fee by June 1, 2024. The registration fee is nonrefundable.

I understand