Latest Techniques in Risk Prediction Modeling

Chair: Qingyan Xiang, Vanderbilt University Medical Center

Thursday, Aug 7: 8:30 AM - 10:20 AM
Session 4211: Contributed Papers
Music City Center, Room CC-202C
Presenters in this session will demonstrate novel ways of developing and implementing risk prediction, drawing on a wide range of statistical and machine learning methods.

Main Sponsor

Biometrics Section

Presentations

Analysis of “Learn-As-you-GO” (LAGO) in Stepped Wedge Designs with Random Facility Effects

In implementation science, studies often demand substantial investments of time, money, and personnel, yet may still fail to detect significant treatment effects. This raises a critical challenge: how to allocate resources so that a study achieves statistically significant results while minimizing its cost. This work introduces the Learn-As-you-GO (LAGO) design in the context of the widely used yet complex stepped wedge design (SWD). The LAGO approach adapts the intervention in later stages of a study based on data collected in earlier stages, aiming to enhance cost-efficiency. However, this adaptation creates dependencies across stages, clinics, and patients, which challenge classical statistical methods that rely on the assumption of independent and identically distributed data. We rigorously demonstrate that classical statistical properties, such as consistency and asymptotic normality, are preserved when analyzing LAGO-generated data. Simulations demonstrate LAGO's advantages over fixed-intervention designs in balancing effects and costs. These findings highlight LAGO's potential to advance implementation science by enabling more impactful and resource-efficient studies.
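
To make the adaptation step concrete, here is a minimal sketch of the "learn, then re-optimize" idea behind LAGO. Everything in it is hypothetical: the component doses, per-unit costs, target success probability, and the crude linear-probability fit are illustration-only assumptions, not the estimator or optimization studied in the paper.

```python
# Minimal sketch of the "learn, then re-optimize" idea behind LAGO
# (illustration only; not the estimator or optimization studied in the paper).
import numpy as np
from itertools import product

rng = np.random.default_rng(0)

# Hypothetical stage-1 data: two intervention components x1, x2 and a binary outcome.
n1 = 500
X1 = rng.uniform(0, 1, size=(n1, 2))
true_logit = -1.0 + 1.2 * X1[:, 0] + 0.8 * X1[:, 1]
y1 = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

# Crude stage-1 dose-response fit (a linear-probability least-squares fit,
# chosen only to keep the sketch short).
beta_hat = np.linalg.lstsq(np.column_stack([np.ones(n1), X1]), y1, rcond=None)[0]

# Stage-2 adaptation: pick the cheapest component package whose predicted
# success probability meets a (hypothetical) target.
unit_cost = np.array([3.0, 1.0])   # assumed per-unit component costs
target = 0.55                      # assumed target success probability
best = None
grid = np.linspace(0, 1, 21)
for x in product(grid, grid):
    pred = beta_hat[0] + beta_hat[1] * x[0] + beta_hat[2] * x[1]
    cost = float(unit_cost @ np.array(x))
    if pred >= target and (best is None or cost < best[0]):
        best = (cost, x)

if best is not None:
    print("stage-2 package (component doses):", best[1], "cost:", round(best[0], 2))
```

In an actual LAGO analysis the stage-1 model, cost structure, and optimization are specified by the design; the point of the sketch is only that later-stage intervention packages are chosen from earlier-stage estimates, which is what induces the cross-stage dependence discussed above.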

Keywords

Adaptive Design

Intervention Adaptation

Stepped Wedge Design

Cost-Efficiency

Asymptotic Normality

Consistency 

Co-Author(s)

Judith Lok, Boston University
Donna Spiegelman, Yale School of Public Health
Xin Zhou, Yale University

First Author

Jingyu Cui, Yale School of Public Health

Presenting Author

Jingyu Cui, Yale School of Public Health

Clarifying the Role of the Mantel-Haenszel Risk Difference Estimator in Randomized Clinical Trials

The Mantel-Haenszel (MH) risk difference estimator is widely used for binary outcomes in randomized clinical trials. This estimator computes a weighted average of stratum-specific risk differences and traditionally requires the stringent assumption of homogeneous risk differences across strata. In our study, we relax this assumption and demonstrate that the MH risk difference estimator consistently estimates the average treatment effect. Furthermore, we rigorously study its properties under two asymptotic frameworks: one characterized by a small number of large strata and the other by a large number of small strata. Additionally, we propose a unified robust variance estimator that improves on the popular Greenland and Sato variance estimators, and we prove that it is applicable across both asymptotic scenarios. Our findings are validated through simulations and real data applications.
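
For reference, the estimator under discussion can be written in standard notation (the notation is assumed here, not taken from the abstract). With strata $k = 1, \dots, K$, arm sizes $n_{1k}$ and $n_{0k}$ with $n_k = n_{1k} + n_{0k}$, and observed event proportions $\hat p_{1k}$ and $\hat p_{0k}$,

$$
\hat\delta_{\mathrm{MH}} \;=\; \frac{\sum_{k=1}^{K} w_k\,(\hat p_{1k} - \hat p_{0k})}{\sum_{k=1}^{K} w_k},
\qquad w_k = \frac{n_{1k}\, n_{0k}}{n_k}.
$$

The abstract's point is that this weighted average remains a consistent estimator of the average treatment effect even without assuming that the stratum-specific differences $\hat p_{1k} - \hat p_{0k}$ target a common quantity.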

Keywords

Average treatment effect

Covariate adjustment

Robust variance estimation

Stratified 2 × 2 table

Difference of proportion 

Co-Author(s)

Yuhan Qian, University of Washington
Jaehwan Yi, Pennsylvania State University
Jinqiu Wang
Yu Du, Eli Lilly and Company
Yanyao Yi, Eli Lilly and Company
Ting Ye, University of Washington

First Author

Xiaoyu Qiu

Presenting Author

Xiaoyu Qiu

Fast and robust invariant generalized linear models

Statistical integration of diverse data sources is an essential step in building generalizable prediction tools, especially in precision health. The invariant features model is a new paradigm for multi-source data integration that posits that a small number of covariates affect the outcome identically across all possible environments. Existing methods for estimating invariant effects either incur immense computational costs or offer good statistical performance only under strict assumptions. In this work, we provide a general framework for estimation under the invariant features model that is computationally efficient and statistically flexible. We also provide a robust extension of our proposed method to protect against possibly corrupted or misspecified data sources. We demonstrate the excellent properties of our method via simulations, and use it to build a transferable kidney disease prediction model using electronic health records from the All of Us research program.
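
As a toy illustration of what an invariant feature looks like (this is not the authors' estimator, and all data-generating choices below are made up for the example), the sketch fits a separate logistic regression in each environment and compares how stable the coefficients are across environments.

```python
# Toy illustration of an invariant feature across environments
# (not the authors' estimator; all data-generating choices are made up).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
coef_by_env = []
for env in range(4):
    n = 2000
    X = rng.normal(size=(n, 2))
    # x0 has the same effect in every environment (invariant);
    # x1 has an environment-specific effect.
    beta_env = np.array([1.0, rng.uniform(-1.0, 1.0)])
    p = 1 / (1 + np.exp(-(X @ beta_env)))
    y = rng.binomial(1, p)
    fit = LogisticRegression(C=1e6).fit(X, y)   # essentially unpenalized
    coef_by_env.append(fit.coef_.ravel())

coef_by_env = np.array(coef_by_env)
print("per-environment coefficients:\n", coef_by_env.round(2))
print("spread across environments:", coef_by_env.std(axis=0).round(2))
# Small spread for x0, large spread for x1: the pattern an invariance-based
# integration procedure exploits when pooling sources.
```

An invariance-based integration procedure would pool information across sources for the stable coefficient while leaving the unstable one environment-specific.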

Keywords

data integration

generalizability

precision health

electronic health records 

Co-Author(s)

Ndey Isatou Jobe, Harvard T.H. Chan School of Public Health
Rui Duan

First Author

Parker Knight

Presenting Author

Parker Knight

Leveraging External Information from a Different Outcome Model with the Current Study

Leveraging external information from related studies can improve prediction accuracy when the internal data are insufficient. However, conventional methods only consider incorporating information from external data with the same type of outcome. In this paper, we develop an integration framework for settings where the external and internal data are related but may involve different types of outcomes. The proposed framework utilizes the generic structure of certain models to bridge the different outcomes and introduces statistical distance information to characterize the heterogeneity across different populations and outcomes. Illustrative examples discussed in this paper include the integration of continuous outcome data with binary outcome data and the integration of discrete survival outcome data with continuous survival outcome data. We evaluate the performance of the proposed method through comprehensive numerical simulations. We apply the proposed framework to multiple analyses of an acute kidney injury (AKI) study in populations who received immune checkpoint inhibitor (ICI) treatments.
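
The abstract does not spell out which statistical distance is used, but the keyword list points to Kullback-Leibler information. As a reminder (standard definition, not taken from the paper), the KL information between an internal-data density $f$ and an external-data density $g$ is

$$
\mathrm{KL}(f \,\|\, g) \;=\; \int f(x)\,\log\frac{f(x)}{g(x)}\,dx ,
$$

and a distance of this kind can be used to discount external information as the two populations or outcome models diverge.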

Keywords

data integration

outcome heterogeneity

population heterogeneity

Kullback-Leibler information

Co-Author

Kevin (Zhi) He, University of Michigan

First Author

Di Wang, University of Michigan

Presenting Author

Di Wang, University of Michigan

Multi-Layer Backward Joint Model for Dynamic Prediction with Multivariate Longitudinal Data of Mixed Type

Dynamic prediction of time-to-event outcomes using longitudinal data is highly useful in clinical research and practice. A common strategy is the joint modeling of longitudinal and time-to-event data, and the shared random effect model has been widely studied for this purpose. However, it can be computationally challenging when applied to problems with a large number of longitudinal predictor variables, particularly when mixed continuous and categorical variables are involved. To address these limitations, we introduce a novel multi-layer backward joint model (MBJM). The model consists of multiple data layers cohesively integrated through a series of conditional distributions involving the longitudinal and time-to-event data, with the time to the clinical event as the conditioning variable. The model can be estimated with standard statistical software, with rapid and robust computation regardless of the dimension of the longitudinal predictors. We provide both theoretical and empirical results showing that the MBJM outperforms a static prediction model that does not fully account for the longitudinal nature of the predictors.
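
One way to read the "backward" structure described above, with notation assumed here rather than taken from the paper: writing $T$ for the event time and $Y_1, \dots, Y_p$ for the longitudinal variables (of mixed type), the layers condition on the event time,

$$
f(T, Y_1, \dots, Y_p) \;=\; f(T)\, f(Y_1 \mid T)\, f(Y_2 \mid Y_1, T) \cdots f(Y_p \mid Y_1, \dots, Y_{p-1}, T),
$$

so that a dynamic prediction at landmark time $s$, given the observed history $\mathcal{H}(s)$, follows from Bayes' theorem:

$$
P\{T > t \mid T > s,\ \mathcal{H}(s)\}
\;=\;
\frac{\int_t^{\infty} f\{\mathcal{H}(s) \mid T = u\}\, f(u)\, du}
     {\int_s^{\infty} f\{\mathcal{H}(s) \mid T = u\}\, f(u)\, du}.
$$

Because each conditional layer can be fit as an ordinary regression of a longitudinal variable on the event time and previous layers, this factorization is compatible with standard software, which is the computational advantage highlighted in the abstract.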

Keywords

Categorical data

Dynamic prediction

Multi-Layer Backward Joint model

Multivariate longitudinal data

Survival analysis 

Co-Author(s)

Zhe Yin, MD Anderson
Liang Li, University of Texas MD Anderson Cancer Center

First Author

Wenhao Li, Edwards Lifesciences

Presenting Author

Wenhao Li, Edwards Lifesciences

Predicting cognitive impairment using novel functional features of spatial proximity and circularity

The digital clock drawing test (dCDT) screens for cognitive impairment using a digital pen to track movements as participants draw a clock from memory. While many studies rely on summary statistics of dCDT features to predict cognitive outcomes, these approaches often involve subjective decisions such as feature selection and imputation. In this study, we introduce novel dCDT features, expressed as mathematical functions, to capture more granular aspects of the test. We compare the performance of these functions against traditional summary features, assessing their ability to offer deeper insights into cognition. These features account for the circularity of the clock, spatial proximity of drawing points, and pressure applied to the paper. When combined with established time-based features, functional features related to spatial proximity and circularity demonstrated predictive power comparable to commonly used features. Our findings highlight the potential of integrating functional features to detect subtle motions and behaviors in digital cognitive assessments, offering new tools that may enhance diagnostic accuracy and support early detection strategies. 
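
As a concrete, purely hypothetical example of what a functional circularity feature might look like, the sketch below computes the radial deviation of pen positions from a least-squares circle fit, indexed by drawing progress; the paper's actual feature definitions may differ.

```python
# Hypothetical sketch of a "circularity" functional feature: the radial deviation
# of pen positions from a best-fit circle, indexed by drawing progress.
# Illustrates the general idea only, not the paper's exact feature definitions.
import numpy as np

def circularity_function(x, y):
    """Per-point radial deviation from the least-squares (Kasa) circle fit."""
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    b = x**2 + y**2
    cx, cy, d = np.linalg.lstsq(A, b, rcond=None)[0]
    r = np.sqrt(d + cx**2 + cy**2)
    radial = np.sqrt((x - cx) ** 2 + (y - cy) ** 2)
    return radial - r          # deviation at each sampled pen position

# Simulated pen trace: a slightly elliptical "clock face" with jitter.
rng = np.random.default_rng(2)
t = np.linspace(0, 2 * np.pi, 200)
x = 10 * np.cos(t) + rng.normal(scale=0.2, size=t.size)
y = 9 * np.sin(t) + rng.normal(scale=0.2, size=t.size)

dev = circularity_function(x, y)
progress = np.linspace(0, 1, dev.size)   # drawing progress on [0, 1]
print("mean |deviation| from best-fit circle:", np.abs(dev).mean().round(3))
# The pair (progress, dev) is a functional covariate that can enter a
# functional-data or machine-learning model alongside time-based features.
```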

Keywords

dementia

digital clock drawing test

functional data analysis

predictive modeling

machine learning 

Co-Author(s)

Cody Karjadi, Boston University
Yorghos Tripodis, Boston University
Vijaya Kolachalama, Boston University
Kathryn Lunetta, Boston University School of Public Health
Serkalem Demissie, Boston University
Chunyu Liu, Boston University
Rhoda Au, Boston University
Shariq Mohammed, Boston University

First Author

Adlin Pinheiro

Presenting Author

Adlin Pinheiro

Targeted learning via probabilistic subpopulation matching

To obtain more accurate predictions for a target study, transferring knowledge from similar source studies has proved useful. However, in many real-world biomedical applications, the populations in different studies, e.g., clinical sites, can be heterogeneous, which makes it challenging to borrow information appropriately for the target study. When study-level matching is used to identify similar source studies, all samples from source studies that differ substantially from the target study are dropped, which can lead to substantial information loss. We consider a general setting in which all studies are sampled from a super-population composed of distinct subpopulations, and propose a novel framework of targeted learning via subpopulation matching. We first fit a finite mixture model jointly across all studies to obtain subject-level probabilistic subpopulation assignments, and then transfer knowledge from source studies to the target study within each identified subpopulation. By measuring similarities between subpopulations, our method effectively decomposes between-study heterogeneity and allows knowledge transfer from all source studies without dropping any samples.
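
A minimal sketch of the two-step idea, with every modeling choice below (two subpopulations, a Gaussian mixture on the covariates, a linear outcome model, and the particular weighting rule) assumed for illustration rather than taken from the paper:

```python
# Minimal sketch of probabilistic subpopulation matching (illustration only):
# fit a mixture across all studies, then weight source subjects by how much
# they resemble the target study's subpopulation mix.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)

# Hypothetical studies: the target draws mostly from subpopulation 0,
# the source is a mix of subpopulations 0 and 1.
def make_study(n, mix):
    z = rng.binomial(1, mix, size=n)                       # latent subpopulation
    X = rng.normal(loc=np.where(z, 3.0, 0.0)[:, None], size=(n, 2))
    y = 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)
    return X, y

X_tgt, y_tgt = make_study(100, mix=0.1)
X_src, y_src = make_study(2000, mix=0.6)

# Step 1: joint mixture fit over the pooled covariates.
gm = GaussianMixture(n_components=2, random_state=0).fit(np.vstack([X_tgt, X_src]))

# Step 2: subject-wise subpopulation probabilities.
p_tgt = gm.predict_proba(X_tgt)
p_src = gm.predict_proba(X_src)

# Step 3: weight each source subject by the target's overall subpopulation mix,
# so no source sample is dropped outright, only down-weighted.
target_mix = p_tgt.mean(axis=0)
w_src = p_src @ target_mix

fit = LinearRegression().fit(
    np.vstack([X_tgt, X_src]),
    np.concatenate([y_tgt, y_src]),
    sample_weight=np.concatenate([np.ones(len(y_tgt)), w_src]),
)
print("coefficients with subpopulation-weighted borrowing:", fit.coef_.round(2))
```

The property the sketch illustrates is the one emphasized in the abstract: source subjects are down-weighted according to how probable their subpopulation is in the target study, rather than being discarded wholesale at the study level.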

Keywords

Finite mixture model

Generalized linear regression

Subpopulation structure

Transfer learning 

Co-Author(s)

Jie Hu, University of Pennsylvania
Naimin Jing, Merck & Co.
Yang Ning, Cornell University
Cheng Yong Tang, Temple University
Runze Li, Penn State University
Yong Chen, University of Pennsylvania, Perelman School of Medicine

First Author

Xiaokang Liu, University of Missouri

Presenting Author

Xiaokang Liu, University of Missouri