Tuesday, Aug 6: 8:30 AM - 10:20 AM
01419
Invited Paper Session
Oregon Convention Center
Room: CC-C123
Applied
Yes
Main Sponsor
Statistical Partnerships Among Academe, Industry, and Government Committee
Co Sponsors
Committee on Applied Statisticians
Presentations
Statistics Without Borders (SWB) is a global not-for-profit organization that leads pro bono projects in partnership with international not-for-profit and non-governmental organizations. To date, SWB has completed over 250 projects, with recent projects supporting organizations in Afghanistan, Canada, Haiti, India, Nigeria, the United States, and Zimbabwe. One challenge of these projects is to build trust with our client organizations and the communities we aim to help. We'll walk through two pro bono projects that SWB has completed, emphasizing how we built trust with these clients and communities. We'll also cover systems that SWB has put into place to sustainably build trust for every project. Our goal is that you will come away with specific ideas of how you can build trust in the context of international development.
There is perhaps no bigger issue facing our field right now than misinformation, and the advent of tools like ChatGPT has increased this risk. A central reason is bias in large language models, which can lead to misleading or incorrect information that disproportionately impacts certain communities. NORC is developing a model of online information to better understand how to detect and mitigate bias in such models. The data focus specifically on COVID vaccine misinformation, a topic the study team chose because of the strong historical record of misinformation across social media platforms and its connections to health equity. NORC collected more than 10 terabytes of data from Twitter and Instagram spanning 2020 to 2023. The study team hand-coded a training sample, building upon several open-source misinformation indexes, and then trained and deployed the model. This presentation will share the model developed, lessons learned from the process, and insights into bias in LLM development.
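The hand-coded-sample-to-model workflow described above can be sketched in miniature. This is a hypothetical illustration only, using a generic TF-IDF plus logistic regression pipeline and invented toy labels; it is not NORC's actual model, coding scheme, or data.

```python
# Hypothetical sketch: train a text classifier from a small hand-coded sample.
# The texts, labels, and model choice are illustrative, not NORC's.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy hand-coded training sample: 1 = coded as misinformation, 0 = not
texts = [
    "vaccine contains microchips for tracking",
    "the vaccine alters your DNA permanently",
    "clinical trials showed the vaccine reduces severe illness",
    "health officials recommend vaccination for eligible adults",
]
labels = [1, 1, 0, 0]

# TF-IDF features feeding a logistic regression stand in for the trained model
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Score a new post with the deployed model
pred = model.predict(["the vaccine alters DNA"])[0]
print(pred)
```

In practice the hand-coded sample would be far larger, the coding would follow the open-source misinformation indexes mentioned above, and the classifier would be a more capable model.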
The quality of output from generative AI models is limited by the quality of the data used to train them. Training data that are inaccurate, outdated, or incomplete can lead to poor output or hallucinations, where the model confidently asserts that a falsehood is real. We discuss challenges in the estimation of generative AI models that can cause misinformation, including inheriting biases present in the training data and producing outputs that are plausible but fundamentally incorrect or nonsensical. We also discuss mitigation strategies, such as curating training data, meticulous algorithm design, and continuous monitoring to minimize biases. Additionally, we present an illustrative example of establishing mechanisms for rigorous model evaluation and quality control.
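One simple form such an evaluation mechanism can take is scoring model answers against a curated reference set. The sketch below is a minimal, hypothetical harness; the questions, the stand-in model, and the exact-match metric are all illustrative assumptions, not the evaluation described in the abstract.

```python
# Minimal quality-control sketch: score generated answers against a
# curated reference set. All names and data here are illustrative.
reference = {
    "What year did the WHO declare COVID-19 a pandemic?": "2020",
    "What is the capital of France?": "Paris",
}

def fake_model(question):
    # Stand-in for a generative model; one answer is deliberately wrong
    canned = {
        "What year did the WHO declare COVID-19 a pandemic?": "2019",
        "What is the capital of France?": "Paris",
    }
    return canned[question]

def exact_match_accuracy(model, refs):
    # Fraction of reference questions the model answers exactly correctly
    correct = sum(model(q) == answer for q, answer in refs.items())
    return correct / len(refs)

acc = exact_match_accuracy(fake_model, reference)
print(acc)  # 0.5
```

Real evaluation pipelines would use softer matching, human review, and bias-specific probes, but the structure (curated references, an automated scorer, continuous monitoring of the score) is the same.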
In a six-year Phase III non-inferiority trial, we evaluated dietary interventions for neutropenic patients: a liberalized diet versus a standard one. Our primary goal was to compare the incidence of major infections. Two interim analyses (IAs) were prespecified to guide the continuation or termination of the study. The first IA, in 2021, allowed the study to continue with some reservations because infection rates closely approached the prespecified non-inferiority margin of a 10% difference. Although the p-value from the IA supported continuing patient enrollment based on the primary endpoint, we exercised caution: our statistical team proactively proposed a second IA earlier than originally planned. The results of this added IA in spring 2023 indicated that the infection rate exceeded the threshold, leading to an immediate halt in patient enrollment and trial termination in June 2023. This presentation underscores the vital role of statistical leadership in multidisciplinary research involving multiple stakeholders, showcasing how statisticians prioritized patient care, made data-driven decisions, provided clear recommendations, and influenced clinical trial outcomes.
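The non-inferiority logic above (comparing the infection-rate difference against a 10% margin) can be illustrated with a simple Wald-type check. The counts below are invented for illustration; they are not the trial's data, and the trial's actual analysis may have used a different test and alpha-spending rules for the interim looks.

```python
# Illustrative non-inferiority check against a 10% absolute margin.
# Counts are hypothetical; this is not the trial's actual analysis.
from math import sqrt

def noninferiority_upper_bound(x_new, n_new, x_std, n_std, z=1.96):
    """Upper limit of a Wald confidence interval for the risk difference
    p_new - p_std (liberalized minus standard diet)."""
    p_new, p_std = x_new / n_new, x_std / n_std
    se = sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
    return (p_new - p_std) + z * se

MARGIN = 0.10  # prespecified non-inferiority margin (10% absolute difference)

# Hypothetical interim counts: 30/150 infections on the liberalized diet
# versus 22/150 on the standard diet
ub = noninferiority_upper_bound(30, 150, 22, 150)

# Non-inferiority is shown only if the upper bound stays below the margin
print(round(ub, 3), ub < MARGIN)
```

With these invented counts the upper bound exceeds the 10% margin, so non-inferiority would not be established, mirroring the kind of signal that triggered the early termination described above.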
Over the last 15+ years, the University of North Carolina Department of Biostatistics and Merck's Department of Biostatistics and Research Decision Sciences (BARDS) have collaborated closely on a number of important statistical research projects. These projects have produced significant methodological contributions to the statistical field, with applications to real-world problems in pharmaceutical R&D. They cover a wide scope of research topics: from frequentist to Bayesian approaches, from trial-level design to aggregate and meta-analyses, and from traditional statistical modeling to machine learning methodologies. The areas of pharmaceutical application include, to name a few, safety signaling and evaluation, efficacy evaluation of innovative medicines (e.g., lipid-lowering therapies), and post-marketing assessment of rare but serious safety events for the rotavirus vaccine. Case studies will be highlighted during the presentation. This collaboration won the 2023 SPAIG Award.