Sunday, Aug 4: 4:00 PM - 5:50 PM
1028
Invited Paper Session
Oregon Convention Center
Room: CC-253
Applied: Yes
Main Sponsor
SSC (Statistical Society of Canada)
Co Sponsors
Canadian Statistical Sciences Institute
Caucus for Women in Statistics
Section on Statistical Learning and Data Science
Presentations
Code now plays a central role in much of statistical analysis, especially in data science. But few data scientists or statisticians have foundational software engineering skills, and there are complications in data science that mean some of these skills are not directly transferable in any case. We present one way of integrating testing into a data science workflow, focused especially on statistical modeling, and then show how Large Language Models (LLMs) can be used to automate aspects of this code testing suite. Our workflow and approach enhance the trustworthiness of conclusions drawn from data.
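The abstract does not specify the workflow's details, but to make the idea concrete, the following is a minimal sketch, assuming Python with numpy and pytest, of the kind of property-style tests such a suite might contain; the function fit_ols, the simulated data, and the tolerances are illustrative assumptions, not the authors' code.

```python
# Sketch: property-style tests for a fitted statistical model (illustrative only).
# Run with: pytest test_model.py
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares fit via numpy's least-squares solver."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def test_recovers_known_coefficients():
    # Simulate data with known coefficients; the fit should recover them
    # to within a tolerance appropriate for the noise level and sample size.
    rng = np.random.default_rng(2024)
    X = np.column_stack([np.ones(500), rng.normal(size=500)])
    beta_true = np.array([1.0, 2.5])
    y = X @ beta_true + rng.normal(scale=0.1, size=500)
    assert np.allclose(fit_ols(X, y), beta_true, atol=0.05)


def test_residuals_are_centered():
    # With an intercept column, OLS residuals must average to (near) zero;
    # a violation would indicate a bug in the fitting code.
    rng = np.random.default_rng(7)
    X = np.column_stack([np.ones(200), rng.normal(size=200)])
    y = 3.0 - 1.2 * X[:, 1] + rng.normal(size=200)
    residuals = y - X @ fit_ols(X, y)
    assert abs(residuals.mean()) < 1e-8
```

Tests like these check statistical properties of the fitting code on simulated data with a known answer, which is the sort of task an LLM could plausibly be prompted to draft and a human analyst then verify.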
A significant trend in data analysis over the past 20 years has been the effort to promote computational transparency and reproducibility. These efforts have had many benefits, including the wide dissemination of code and datasets that can be used both for verification and for extension with new analyses. However, a question remains as to whether computational reproducibility is a useful indicator of the trustworthiness of a data analysis. While reproducible analyses can be checked more easily for problems or errors, a heavy burden is placed on others, who must commit comparable time and resources to execute the analytic code. We argue that reproducibility, while useful as a minimum standard for trustworthiness, is not sufficient, and that other formats for presenting and distributing data analyses should be considered. We borrow ideas from systems engineering and demonstrate some of these techniques through case studies.
Data analysis is being performed at a wider scale than ever before. The widening of scale includes the number of people doing data analysis, as well as the application of analysis to many different problems and fields. A challenge of this phenomenon is that many people now performing data analysis may not have formal "data" training. Additionally, the applications of data analysis now routinely have wide and important impacts on human lives (e.g., loan qualification and access to health treatments). That such analyses be trustworthy seems important, but also challenging to ensure. This talk explores what societal pressures (moral, reputational, and institutional) currently exist to promote trustworthy data analyses and whether they are sufficient to push data analysts to perform analysis that is "trustworthy enough". The aim of surveying this landscape is to understand the current state of affairs and identify where there may not be enough societal pressure to induce analysts to make their analyses trustworthy. Identifying the gaps will be an important starting point for the data analyst community to improve community standards and norms around trust.
For decades, Canadian universities have proclaimed themselves bastions of acceptance where all can be successful. In the physical sciences specifically, objectivity is revered: upheld as the great equalizer that leads us to innovation and is foundational to discovery. However, objectivity has perhaps prevented open commentary about the human aspects of science. Bias is inherent in us all, and privilege has shaped who we deem worthy to hold the title of scientist. This talk will briefly review:
1) historical discrimination and the ways systemic biases impact access to and success in academia
2) the barriers to success present in higher education, challenging us to consider their impact on current and future scientists
3) considerations for ethical sociodemographic data collection and complementary qualitative data