Thursday, Aug 7: 8:30 AM - 10:20 AM
4211
Contributed Papers
Music City Center
Room: CC-202C
Presenters in this session will demonstrate novel ways of handling and implementing risk prediction, drawing on a wide range of statistical methods and learning approaches.
Main Sponsor
Biometrics Section
Presentations
In implementation science, studies often demand substantial investments of time, money, and personnel but may still fail to detect significant treatment effects. This raises a critical challenge: how to allocate resources efficiently to achieve statistically significant results while minimizing study costs. This work introduces the Learn-As-you-GO (LAGO) design in the context of the widely used yet complex stepped wedge design (SWD). The LAGO approach adapts interventions in later stages of a study based on data collected from earlier stages, aiming to enhance cost-efficiency. However, this adaptation creates dependencies across stages, clinics, and patients, which challenge classical statistical methods relying on the assumption of independent and identically distributed data. We rigorously demonstrate that classical statistical properties, such as consistency and asymptotic normality, are preserved when analyzing LAGO-generated data. Simulations show LAGO's advantages over fixed-intervention designs in balancing intervention effects and costs. These findings highlight LAGO's potential to advance implementation science by enabling more impactful and resource-efficient studies.
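To make the adaptive mechanism concrete, the sketch below simulates a two-stage stepped wedge setting in which the intervention dose for later-crossing clinics is re-chosen from stage-1 data. The logistic outcome model, the dose-proportional cost assumption, the target effect, and all function names are hypothetical illustrations, not the LAGO estimator presented in the talk.

# A minimal, hypothetical sketch of a learn-as-you-go (LAGO) update inside a
# stepped wedge design: stage-1 data are used to re-choose the intervention
# "dose" that later-crossing clinics receive in stage 2. The outcome model,
# cost assumption, and target effect are illustrative only.
import numpy as np
from scipy.special import expit
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def simulate_clinic(dose, n_patients=50, beta_dose=0.8, base_logit=-1.0):
    """Binary outcomes for one clinic under a given intervention dose."""
    p = expit(base_logit + beta_dose * dose)
    return rng.binomial(1, p, size=n_patients)

# Stage 1: half of the clinics cross over at an initial (guessed) dose.
initial_dose = 0.5
stage1 = [(initial_dose, simulate_clinic(initial_dose)) for _ in range(10)] \
       + [(0.0, simulate_clinic(0.0)) for _ in range(10)]   # control periods

X = np.array([[d] for d, ys in stage1 for _ in ys])
y = np.array([yi for _, ys in stage1 for yi in ys])
beta_hat = LogisticRegression().fit(X, y).coef_[0, 0]

# LAGO step: pick the cheapest dose whose estimated effect reaches a target
# log-odds increase (cost assumed proportional to dose).
target_effect = 0.6
candidate_doses = np.linspace(0.1, 1.0, 10)
feasible = candidate_doses[beta_hat * candidate_doses >= target_effect]
stage2_dose = feasible.min() if feasible.size else candidate_doses.max()

# Stage 2: remaining clinics cross over at the updated dose.
stage2 = [(stage2_dose, simulate_clinic(stage2_dose)) for _ in range(10)]
print(f"estimated dose effect {beta_hat:.2f}, stage-2 dose {stage2_dose:.2f}")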
Keywords
Adaptive Design
Intervention Adaptation
Stepped Wedge Design
Cost-Efficiency
Asymptotic Normality
Consistency
The Mantel-Haenszel (MH) risk difference estimator is widely used for binary outcomes in randomized clinical trials. This estimator computes a weighted average of stratum-specific risk differences and traditionally requires the stringent assumption of a homogeneous risk difference across strata. In our study, we relax this assumption and demonstrate that the MH risk difference estimator consistently estimates the average treatment effect. Furthermore, we rigorously study its properties under two asymptotic frameworks: one characterized by a small number of large strata and the other by a large number of small strata. Additionally, we propose a unified robust variance estimator that improves on the popular Greenland and Sato variance estimators, and we prove that it is applicable across both asymptotic scenarios. Our findings are validated through simulations and real data applications.
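For reference, the MH risk difference point estimate is a weighted average of stratum-specific risk differences with weights n1k·n0k/Nk; a minimal implementation is sketched below (the unified robust variance estimator proposed in the talk is not reproduced).

# Mantel-Haenszel risk difference across K strata of 2x2 tables.
# Inputs are per-stratum event counts and sample sizes for the treatment (1)
# and control (0) arms. This is only the standard point estimate; the robust
# variance estimator described in the abstract is not shown here.
import numpy as np

def mh_risk_difference(x1, n1, x0, n0):
    x1, n1, x0, n0 = map(np.asarray, (x1, n1, x0, n0))
    N = n1 + n0                                  # stratum sizes
    w = n1 * n0 / N                              # MH weights
    rd = x1 / n1 - x0 / n0                       # stratum risk differences
    return np.sum(w * rd) / np.sum(w)            # weighted average

# Example with three strata (counts are illustrative, not from the paper).
print(mh_risk_difference(x1=[12, 8, 20], n1=[50, 40, 80],
                         x0=[6, 5, 10],  n0=[48, 42, 85]))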
Keywords
Average treatment effect
Covariate adjustment
Robust variance estimation
Stratified 2 × 2 table
Difference of proportion
Statistical integration of diverse data sources is an essential step in the building of generalizable prediction tools, especially in precision health. The invariant features model is a new paradigm for multi-source data integration which posits that a small number of covariates affect the outcome identically across all possible environments. Existing methods for estimating invariant effects suffer from immense computational costs or only offer good statistical performance under strict assumptions. In this work, we provide a general framework for estimation under the invariant features model that is computationally efficient and statistically flexible. We also provide a robust extension of our proposed method to protect against possibly corrupted or misspecified data sources. We demonstrate the excellent properties of our method via simulations, and use it to build a transferable kidney disease prediction model using electronic health records from the All of Us research program.
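As a stylized illustration of the estimand (not the authors' computationally efficient procedure), the sketch below fits a stacked least-squares model in which covariates designated as invariant share one coefficient vector across environments while the remaining covariates receive environment-specific coefficients.

# Stylized invariant-features fit: covariates in `inv_idx` are assumed to act
# identically in every environment (shared coefficients), while the remaining
# covariates get environment-specific coefficients. Plain stacked least
# squares; purely illustrative, not the estimation procedure in the talk.
import numpy as np

def fit_invariant(envs, inv_idx):
    """envs: list of (X, y) pairs, one per environment/source."""
    blocks, ys = [], []
    n_env = len(envs)
    p = envs[0][0].shape[1]
    other_idx = [j for j in range(p) if j not in inv_idx]
    for e, (X, y) in enumerate(envs):
        shared = X[:, inv_idx]                       # invariant block
        specific = np.zeros((X.shape[0], n_env * len(other_idx)))
        cols = slice(e * len(other_idx), (e + 1) * len(other_idx))
        specific[:, cols] = X[:, other_idx]          # environment-specific block
        blocks.append(np.hstack([shared, specific]))
        ys.append(y)
    beta, *_ = np.linalg.lstsq(np.vstack(blocks), np.concatenate(ys), rcond=None)
    return beta[:len(inv_idx)]                       # shared (invariant) effects

# Toy example with two environments and one invariant covariate (index 0).
rng = np.random.default_rng(1)
X1, X2 = rng.normal(size=(100, 3)), rng.normal(size=(120, 3))
y1 = 2.0 * X1[:, 0] + 0.5 * X1[:, 1] + rng.normal(size=100)
y2 = 2.0 * X2[:, 0] - 1.0 * X2[:, 1] + rng.normal(size=120)
print(fit_invariant([(X1, y1), (X2, y2)], inv_idx=[0]))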
Keywords
data integration
generalizability
precision health
electronic health records
Leveraging external information from related studies can improve prediction accuracy when the internal data are insufficient. However, conventional methods only consider incorporating information from external data with the same type of outcome. In this paper, we develop an integration framework for settings where the external and internal data are relevant but may involve different types of outcomes. The proposed framework utilizes the generic structure of certain models to bridge the different outcomes and introduces statistical distance information to characterize the heterogeneity across different populations and outcomes. Illustrative examples discussed in this paper include the integration of continuous outcome data with binary outcome data and the integration of discrete survival outcome data with continuous survival outcome data. We evaluate the performance of the proposed method through comprehensive numerical simulations. We apply the proposed framework to multiple analyses of the acute kidney injury (AKI) study in populations who received immune checkpoint inhibitor (ICI) treatments.
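One simple way to operationalize the statistical distance idea, shown below purely as an assumption-laden illustration rather than the proposed bridging framework, is to fit Gaussian approximations to the internal and external covariate distributions, compute their Kullback-Leibler divergence, and down-weight the external source accordingly.

# Illustrative KL-based down-weighting of an external data source: fit
# Gaussian approximations to the internal and external covariates, compute
# their KL divergence, and shrink the external sample weight accordingly.
# The Gaussian approximation and the exp(-lambda*KL) weight are assumptions
# for illustration; they are not the framework proposed in the talk.
import numpy as np

def gaussian_kl(X_ext, X_int):
    """KL( N(ext) || N(int) ) for multivariate Gaussian fits."""
    m0, m1 = X_ext.mean(0), X_int.mean(0)
    S0, S1 = np.cov(X_ext, rowvar=False), np.cov(X_int, rowvar=False)
    d = X_ext.shape[1]
    S1_inv = np.linalg.inv(S1)
    term = np.trace(S1_inv @ S0) + (m1 - m0) @ S1_inv @ (m1 - m0)
    return 0.5 * (term - d + np.log(np.linalg.det(S1) / np.linalg.det(S0)))

def external_weight(X_ext, X_int, lam=1.0):
    return float(np.exp(-lam * gaussian_kl(X_ext, X_int)))

# Toy usage: a shifted external population receives weight < 1.
rng = np.random.default_rng(2)
X_int = rng.normal(size=(200, 2))
X_ext = rng.normal(loc=0.5, size=(300, 2))
print(external_weight(X_ext, X_int))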
Keywords
data integration
outcome heterogeneity
population heterogeneity
Kullback-Leibler information
Co-Author
Kevin (Zhi) He, University of Michigan
First Author
Di Wang, University of Michigan
Presenting Author
Di Wang, University of Michigan
Dynamic prediction of time-to-event outcomes using longitudinal data is highly useful in clinical research and practice. A common strategy is the joint modeling of longitudinal and time-to-event data. The shared random effect model has been widely studied for this purpose. However, it can be computationally challenging when applied to problems with a large number of longitudinal predictor variables, particularly when mixed types of continuous and categorical variables are involved. Addressing these limitations, we introduce a novel multi-layer backward joint model (MBJM). The model structure consists of multiple data layers cohesively integrated through a series of conditional distributions that involve longitudinal and time-to-event data, where the time to the clinical event is the conditioning variable. This model can be estimated with standard statistical software, with rapid and robust computation regardless of the dimension of the longitudinal predictor variables. We provide both theoretical and empirical results to show that the MBJM outperforms a static prediction model that does not fully account for the longitudinal nature of the prediction.
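One way to read the "backward" structure described above is the schematic factorization below, in which the event time T conditions each longitudinal layer and dynamic prediction follows from inverting the factorization by Bayes' rule; this display is an illustrative reading of the abstract, not necessarily the exact model presented in the talk.

% Schematic only: an illustrative reading of a backward joint model with M
% longitudinal layers Y^{(1)},...,Y^{(M)} and event time T; \mathcal{Y}(t)
% denotes the longitudinal history observed up to time t.
\[
  f\bigl(T, Y^{(1)}, \dots, Y^{(M)}\bigr)
    = f(T)\prod_{m=1}^{M} f\bigl(Y^{(m)} \mid Y^{(1)}, \dots, Y^{(m-1)}, T\bigr),
\]
\[
  \Pr\bigl(T > t + s \mid T > t, \mathcal{Y}(t)\bigr)
    = \frac{\int_{t+s}^{\infty} f(u)\, f\bigl(\mathcal{Y}(t) \mid T = u\bigr)\, du}
           {\int_{t}^{\infty} f(u)\, f\bigl(\mathcal{Y}(t) \mid T = u\bigr)\, du}.
\]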
Keywords
Categorical data
Dynamic prediction
Multi-Layer Backward Joint model
Multivariate longitudinal data
Survival analysis
Co-Author(s)
Zhe Yin, MD Anderson
Liang Li, University of Texas MD Anderson Cancer Center
First Author
Wenhao Li, Edwards Lifesciences
Presenting Author
Wenhao Li, Edwards Lifesciences
The digital clock drawing test (dCDT) screens for cognitive impairment using a digital pen to track movements as participants draw a clock from memory. While many studies rely on summary statistics of dCDT features to predict cognitive outcomes, these approaches often involve subjective decisions such as feature selection and imputation. In this study, we introduce novel dCDT features, expressed as mathematical functions, to capture more granular aspects of the test. We compare the performance of these functions against traditional summary features, assessing their ability to offer deeper insights into cognition. These features account for the circularity of the clock, spatial proximity of drawing points, and pressure applied to the paper. When combined with established time-based features, functional features related to spatial proximity and circularity demonstrated predictive power comparable to commonly used features. Our findings highlight the potential of integrating functional features to detect subtle motions and behaviors in digital cognitive assessments, offering new tools that may enhance diagnostic accuracy and support early detection strategies.
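As a hypothetical example of the kind of functional feature described above, the pen trajectory can be expressed in polar coordinates about the drawing's centroid, with the radius treated as a function of angle; the feature definition and inputs below are illustrative assumptions, not the study's actual features.

# Hypothetical circularity functional for a digitally recorded clock drawing:
# express pen points in polar coordinates about the drawing's centroid and
# measure how much the radius varies with angle. The inputs and the specific
# functional are illustrative assumptions, not the study's features.
import numpy as np

def circularity_functional(x, y, n_bins=36):
    """Return (bin angles, mean radius per angular bin, coefficient of variation)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    cx, cy = x.mean(), y.mean()                      # centroid of the drawing
    r = np.hypot(x - cx, y - cy)                     # radius of each pen point
    theta = np.arctan2(y - cy, x - cx)               # angle of each pen point
    bins = np.linspace(-np.pi, np.pi, n_bins + 1)
    idx = np.digitize(theta, bins) - 1
    r_curve = np.array([r[idx == b].mean() if np.any(idx == b) else np.nan
                        for b in range(n_bins)])
    cv = np.nanstd(r_curve) / np.nanmean(r_curve)    # 0 for a perfect circle
    return bins[:-1], r_curve, cv

# Toy usage: a slightly elliptical "clock face" yields a nonzero deviation.
t = np.linspace(0, 2 * np.pi, 400)
_, _, cv = circularity_functional(np.cos(t), 0.8 * np.sin(t))
print(f"radius coefficient of variation: {cv:.3f}")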
Keywords
dementia
digital clock drawing test
functional data analysis
predictive modeling
machine learning
To obtain more accurate prediction results for a target study, transferring knowledge from similar source studies has proved useful. However, in many real-world biomedical applications, populations in different studies, e.g., clinical sites, can be heterogeneous, making it challenging to properly borrow information for the target study. When study-level matching is used to identify similar source studies, all samples from source studies that differ significantly from the target study are dropped at the study level, which can lead to substantial information loss. We consider a general situation where all studies are sampled from a super-population composed of distinct subpopulations, and we propose a novel framework of targeted learning via subpopulation matching. We first fit a finite mixture model jointly across all studies to obtain subject-wise probabilistic subpopulation information, and then transfer knowledge from source studies to the target study within each identified subpopulation. By measuring similarities between subpopulations, our method effectively decomposes between-study heterogeneity and allows knowledge transfer from all source studies without dropping any samples.
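A minimal sketch of the subpopulation-matching idea under simplifying assumptions is given below: a Gaussian mixture fitted to the pooled covariates supplies subject-wise subpopulation probabilities, and source subjects are weighted by how closely their subpopulation profile matches the target study's. The mixture form, weighting rule, and function names are illustrative, not the proposed estimator.

# Minimal sketch of subpopulation-matched borrowing: fit a finite (Gaussian)
# mixture on the pooled covariates of all studies, obtain each subject's
# posterior subpopulation probabilities, and up-weight source subjects whose
# subpopulation profile resembles the target study's. Simplifying assumptions
# throughout; this is not the estimator proposed in the talk.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.linear_model import LinearRegression

def subpop_matched_fit(target, sources, n_subpops=2, seed=0):
    """target, sources: (X, y) pairs; returns a regression fit on weighted data."""
    X_t, y_t = target
    X_all = np.vstack([X_t] + [X for X, _ in sources])
    gm = GaussianMixture(n_components=n_subpops, random_state=seed).fit(X_all)
    target_profile = gm.predict_proba(X_t).mean(axis=0)   # target's subpop mix
    X_parts, y_parts, w_parts = [X_t], [y_t], [np.ones(len(y_t))]
    for X_s, y_s in sources:
        post = gm.predict_proba(X_s)                       # subject-wise probabilities
        w = post @ target_profile                          # similarity weights
        X_parts.append(X_s); y_parts.append(y_s); w_parts.append(w)
    X, y, w = map(np.concatenate, (X_parts, y_parts, w_parts))
    return LinearRegression().fit(X, y, sample_weight=w)

# Toy usage with one target study and two heterogeneous source studies.
rng = np.random.default_rng(3)
X_t = rng.normal(size=(80, 2));  y_t = X_t @ [1.0, -0.5] + rng.normal(size=80)
X_1 = rng.normal(size=(200, 2)); y_1 = X_1 @ [1.0, -0.5] + rng.normal(size=200)
X_2 = rng.normal(loc=3, size=(150, 2)); y_2 = X_2 @ [2.0, 0.5] + rng.normal(size=150)
print(subpop_matched_fit((X_t, y_t), [(X_1, y_1), (X_2, y_2)]).coef_)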
Keywords
Finite mixture model
Generalized linear regression
Subpopulation structure
Transfer learning