Innovations at the Interface of Statistics, AI, and Real-World Evidence

Amita Manatunga Chair
Emory University
 
Anru Zhang Discussant
Duke University
 
Amita Manatunga Organizer
Emory University
 
Monday, Aug 4: 2:00 PM - 3:50 PM
0424 
Invited Paper Session 
Music City Center 
Room: CC-104A 

Applied

Yes

Main Sponsor

Committee of Presidents of Statistical Societies

Co Sponsors

ENAR

Presentations

Neural networks for spatially correlated data

Traditionally, geospatial analysis has relied on statistical models that explicitly model spatial
correlations in the data. Recently, machine learning algorithms such as neural networks and random
forests have been increasingly used in geospatial analysis. However, most machine learning algorithms do
not possess the functionality to directly encode spatial correlations. There is limited understanding of
the consequences of ignoring spatial correlations in machine learning algorithms applied to geospatial
data, despite this practice becoming increasingly common. We show empirically and theoretically that
ignoring spatial correlations reduces the accuracy of machine learning algorithms for geospatial data.
We then propose well-principled machine learning algorithms for geospatial data that explicitly model
the spatial correlation as in traditional geostatistics. The basic principle is guided by how ordinary least
squares (OLS) extends to generalized least squares (GLS) for linear models to explicitly account for data
covariance. We demonstrate how the same extension can be made for random forests and neural
networks, presenting the RF-GLS and NN-GLS algorithms. We provide extensive theoretical and
empirical support for the methods and show how they fare better than naïve or brute-force
approaches to using machine learning algorithms for spatially correlated data. We present the software
packages RandomForestsGLS and geospaNN, which implement these methods.
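
As a hedged illustration of the OLS-to-GLS principle mentioned in this abstract, the Python sketch below fits a linear model to simulated spatially correlated data in two ways: ordinary least squares, and generalized least squares obtained by whitening the data with the Cholesky factor of the covariance matrix. It is not RF-GLS or NN-GLS, and it does not use the RandomForestsGLS or geospaNN packages. The exponential covariance, its parameter values (sigma2, phi, nugget), and all function names are assumptions made for illustration only.

```python
import numpy as np


def exponential_cov(coords, sigma2=1.0, phi=3.0, nugget=0.1):
    """Exponential spatial covariance plus a nugget (illustrative assumption)."""
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    return sigma2 * np.exp(-phi * dists) + nugget * np.eye(len(coords))


def ols(X, y):
    """Ordinary least squares: minimizes ||y - X b||^2."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def gls(X, y, Sigma):
    """Generalized least squares: minimizes (y - X b)' Sigma^{-1} (y - X b).

    With Sigma = L L', this is OLS on the whitened data L^{-1} X and L^{-1} y.
    """
    L = np.linalg.cholesky(Sigma)
    X_white = np.linalg.solve(L, X)
    y_white = np.linalg.solve(L, y)
    return np.linalg.lstsq(X_white, y_white, rcond=None)[0]


def simulate(n=200, seed=0):
    """Simulate y = X beta + spatially correlated noise at random locations."""
    rng = np.random.default_rng(seed)
    coords = rng.uniform(size=(n, 2))
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    Sigma = exponential_cov(coords)
    beta = np.array([1.0, 2.0])  # hypothetical true coefficients
    noise = np.linalg.cholesky(Sigma) @ rng.normal(size=n)
    return X, X @ beta + noise, Sigma, beta


if __name__ == "__main__":
    X, y, Sigma, beta = simulate()
    print("true beta:", beta)
    print("OLS estimate:", ols(X, y))
    print("GLS estimate:", gls(X, y, Sigma))
```

In this sketch the covariance matrix is treated as known. In practice its parameters would have to be estimated, and the abstract's methods apply the same idea to nonlinear learners rather than to a linear model.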

Keywords

Neural networks

Geospatial data

Machine learning

Random forests

Gaussian processes

Spatial statistics 

Speaker

Abhirup Datta, Johns Hopkins University

Integrative Approaches to Treatment Evaluation Leveraging Randomized Clinical Trials and Real-World Data

The 21st Century Cures Act, enacted in 2016, empowers the FDA to accelerate the development of new treatments by utilizing real-world data (RWD) and evidence. As a result, parallel randomized clinical trials (RCTs) and RWD are becoming increasingly available for evaluating treatment outcomes. Integrating heterogeneous data sources presents a unique opportunity to address clinical questions that cannot be answered by any single data source alone. This talk will explore various objectives and methodologies for conducting integrative analyses of data from RCTs and RWD. By combining the strengths of both RCTs and RWD, researchers can improve the generalizability of RCT findings using the broader representativeness of RWD, increase the efficiency and statistical power of treatment effect evaluations by incorporating comparable RWD, and assess long-term safety and efficacy by utilizing extended real-world follow-up data. Specifically, we will discuss newly developed strategies to mitigate biases and optimize treatment evaluation in hybrid clinical trials with external controls, including approaches such as test-then-pool, selective borrowing, and conformal prediction. 
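
To make the test-then-pool idea named in this abstract concrete, here is a minimal Python sketch under strong simplifying assumptions: a continuous outcome, a two-sided z-test comparing the mean of the trial controls with the mean of the external controls, and a difference-in-means treatment effect. The function name, the 0.05 threshold, and the toy numbers are hypothetical. The sketch does not represent the speaker's methods, and it does not cover selective borrowing or conformal prediction.

```python
import math
from statistics import NormalDist, mean, variance


def estimate_with_test_then_pool(trial_treated, trial_control,
                                 external_control, alpha=0.05):
    """Pool external controls only if they look comparable to trial controls.

    Assumes both control samples have at least two observations and
    positive sample variance (otherwise the z statistic is undefined).
    """
    # Step 1: test whether the two control groups have the same mean.
    diff = mean(trial_control) - mean(external_control)
    se = math.sqrt(variance(trial_control) / len(trial_control)
                   + variance(external_control) / len(external_control))
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Step 2: pool the external controls only if the test does not reject.
    pooled = p_value >= alpha
    controls = list(trial_control)
    if pooled:
        controls += list(external_control)

    # Step 3: estimate the treatment effect as a difference in means.
    effect = mean(trial_treated) - mean(controls)
    return {"effect": effect, "pooled": pooled, "p_value": p_value}


if __name__ == "__main__":
    treated = [2.0, 2.1, 1.9, 2.2]          # toy data, not real results
    control = [1.0, 1.1, 0.9, 1.0]
    similar_external = [1.0, 1.05, 0.95, 1.0]
    shifted_external = [3.0, 3.1, 2.9, 3.0]
    print(estimate_with_test_then_pool(treated, control, similar_external))
    print(estimate_with_test_then_pool(treated, control, shifted_external))
```

The all-or-nothing pooling decision is what distinguishes this simple rule from approaches that borrow only a comparable subset of the external controls.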

Keywords

Real-world data

Real-world evidence

Hybrid trial designs 

Speaker

Shu Yang, North Carolina State University, Department of Statistics

Advancing Evidence for Opioid Use Disorder Treatments through Real-World Data and Novel Statistical Methods

The opioid epidemic remains a major public health crisis. Although evidence-based treatments for opioid use disorder (OUD) exist, most people with OUD do not receive treatment. Pragmatic trial designs have therefore been proposed to evaluate interventions designed to increase OUD treatment within entire clinics or health systems by leveraging electronic health records (EHR) and other real-world data sources. In this talk, we present case studies that illustrate key challenges of using real-world data for evaluating intervention effects, including post-randomization selection bias that arises because the intervention affects diagnosis of OUD in the EHR, and observational outcome assessment processes, in which follow-up times from EHRs are irregularly spaced and may be intervention or outcome dependent. We clarify which estimands are being targeted in these settings, present simulation studies to evaluate the performance of methods addressing these challenges, and highlight novel statistical methods that have been developed and are being implemented in these case studies to provide robust evidence on intervention effects to improve outcomes for people with OUD.

Keywords

pragmatic trials

opioid use disorder

real-world data

electronic health records 

Speaker

Jennifer Bobb, Kaiser Permanente Washington