Trustworthy Data Science

Alyssa Columbus Chair
Johns Hopkins University
 
Rohan Alexander Organizer
 
Sunday, Aug 4: 4:00 PM - 5:50 PM
1028 
Invited Paper Session 
Oregon Convention Center 
Room: CC-253 

Applied

Yes

Main Sponsor

SSC (Statistical Society of Canada)

Co Sponsors

Canadian Statistical Sciences Institute
Caucus for Women in Statistics
Section on Statistical Learning and Data Science

Presentations

Integrating code testing into the data science workflow to enhance trustworthiness

Code now plays a central role in much of statistical analysis, especially in data science. But few data scientists or statisticians have foundational software engineering skills, and there are complications in data science that mean some of these skills are not directly transferable in any case. We build one way of integrating testing into a data science workflow, especially focused on statistical modeling, and then show how Large Language Models (LLMs) can be used to automate aspects of this code testing suite. Our workflow and approach enhance the trustworthiness of conclusions from data.

Speaker

Rohan Alexander

Presenting the Analytic Process: Beyond Reproducibility

A significant trend in data analysis over the past 20 years has been the effort to promote computational transparency and reproducibility. These efforts have had many benefits, including the wide dissemination of code and datasets that can be used for both verification and extension with new analyses. However, a question remains as to whether computational reproducibility is a useful indicator of the trustworthiness of a data analysis. While reproducible analyses can be checked more easily for problems or errors, a heavy burden is placed on others to invest the time and resources needed to execute the analytic code. We argue that reproducibility, while useful as a minimum standard for trustworthiness, is not sufficient and that other formats for presenting and distributing data analyses should be considered. We borrow ideas from systems engineering and demonstrate some of these techniques through case studies.

Speaker

Roger Peng, University of Texas, Austin

Societal Pressures for Trustworthy Data Analysis

Data analysis is increasingly being performed on a wider scale than ever before. The widening of scale includes the number of people doing data analysis, as well as the application of analysis to many different problems and fields. A challenge of this phenomenon is that many people now performing data analysis may not have formal "data" training. Additionally, the applications of data analysis now routinely have wide and important impacts on human lives (e.g., loan qualification and health treatment access). That such analyses be trustworthy seems important, but also challenging to ensure. This talk explores what societal pressures (moral, reputational and institutional) currently exist to promote trustworthy data analyses and whether they are sufficient to pressure data analysts to perform analysis that is "trustworthy enough". The aim of surveying this landscape is to understand the current state of affairs and identify where there may not be enough societal pressures to induce analysts into making their analyses trustworthy. Identifying the gaps will be an important starting point for the data analyst community to improve community standards and norms around trust.

Speaker

Tiffany Timbers, University of British Columbia

Unmasking the Ivory Tower: Bias, Privilege, and Objectivity in Canadian Physical Sciences

For decades, Canadian universities have proclaimed themselves to be vestiges of acceptance where all can be successful. Specifically, in the physical sciences, objectivity is revered: it is upheld as the great equalizer that leads us to innovation and as foundational to discovery. However, objectivity has perhaps prevented open commentary about the human aspects of science. Bias is inherent in us all, and privilege has shaped who we deem worthy to hold the title of scientist. This talk will briefly review:
1) historical discrimination and the ways systemic biases impact access to and success in academia
2) the barriers to success that are present in higher education, challenging us to consider their impact on current and future scientists
3) considerations for ethical sociodemographic data collection and complementary qualitative data  

Speaker

Evelyn Asiedu, Thompson Rivers University