Survival Analysis: Non-parametric, Machine Learning, and Measurement Error Latest Research

Milind Phadnis, Chair
University of Kansas Medical Center
 
Wednesday, Aug 6: 2:00 PM - 3:50 PM
4200 
Contributed Papers 
Music City Center 
Room: CC-104A 

Main Sponsor

Biometrics Section

Presentations

A Bayesian Varying-effect Scalar-on-function Survival Model with Measurement Error

Scalar-on-function regression (SoFR) has become a commonly used method for modeling the relationship between scalar outcomes and functional predictors, such as physical activity (PA) patterns from wearable devices. Recent extensions of SoFR to survival analysis also correct for measurement error in functional covariates. However, these approaches often do not simultaneously address the heterogeneity of the effect across individuals. More specifically, they assume that the effect of the functional predictor is constant across individuals, which may obscure variation driven by subject-specific characteristics. We develop a semi-parametric Bayesian SoFR accelerated failure time (AFT) model that corrects for measurement error within functional covariates and includes an instrumental variable that allows nonlinear relationships with the functional covariate. We also introduce a varying functional coefficient that depends on a scalar covariate through a flexible Gaussian process single-index structure. For inference, we compare traditional Markov chain Monte Carlo sampling with the integrated nested Laplace approximation (INLA) to highlight trade-offs between computational efficiency and flexibility.
 

Keywords

Bayesian

Scalar-on-function

Physical activity

Measurement error

Instrumental variables

Varying effect 

Co-Author

Roger S Zoh, Indiana University

First Author

Joseph Yang

Presenting Author

Joseph Yang

A joint model of survival outcome and covariates subject to measurement errors and detection limits

When biomarkers are measured longitudinally with measurement errors and detection limits, associating them with a survival outcome requires taking both into account. We propose a joint model that combines linear mixed-effects models with a Cox proportional hazards model. Parameters are estimated with a Monte Carlo expectation-maximization method with Newton-Raphson steps. The model is compared with the ideal model, which assumes the measurement errors are known; the naïve model, which ignores the measurement errors; and the two-step approach. Simulations show that, when the censoring proportion of the markers is high, the proposed model has the lowest bias in estimating the coefficient of the left-censored biomarkers compared with the latter two methods, and that it has lower standard errors for that parameter than the two-step approach across censoring proportions. These three methods were applied to real-world data from a longitudinal study of oncogenic HPV infections among HIV-positive women to find associations with plasma HIV viral load and CD4+ cell count. The topic will be of interest to researchers working at the intersection of statistics and biomedical sciences.
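As a rough illustration of why a detection limit changes the likelihood, the sketch below evaluates a normal log-likelihood in which values below the limit contribute a cumulative probability instead of a density. This is a generic left-censoring device and an assumption for illustration only; the abstract does not state how the authors handle detection limits, and the function names and data are hypothetical.

```python
import math

def normal_logpdf(x, mu, sigma):
    """Log density of a normal(mu, sigma) variable at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal(mu, sigma) variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def censored_loglik(values, below_limit, limit, mu, sigma):
    """Log-likelihood with left-censoring at a detection limit.

    Observed values contribute the density; values flagged as below the
    limit contribute P(X <= limit), since only that fact is known.
    """
    total = 0.0
    for value, censored in zip(values, below_limit):
        if censored:
            total += math.log(normal_cdf(limit, mu, sigma))
        else:
            total += normal_logpdf(value, mu, sigma)
    return total

# Hypothetical log10 viral loads; the second entry is below a limit of 1.7.
values = [2.4, 1.7, 3.1]
below_limit = [False, True, False]
print(censored_loglik(values, below_limit, limit=1.7, mu=2.5, sigma=0.8))
```

A joint model would add random effects and a hazard component on top of a term like this, which is beyond the scope of the sketch.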

Keywords

longitudinal data

measurement error

detection limit

Cox proportional hazards model

joint model

Monte Carlo Expectation-Maximization 

First Author

Xianhong Xie

Presenting Author

Xianhong Xie

A versatile and powerful framework to identify patient subgroups using dense random survival forests

Precision oncology aims to prescribe the optimal cancer treatment to the right patients, maximizing therapeutic benefits. However, identifying patient subgroups that may benefit more from experimental cancer treatments based on randomized clinical trials presents a significant analytical challenge. To address this, we introduce a novel unsupervised machine learning approach utilizing very dense random survival forests (up to 100,000 trees). This method is robust, interpretable, and effectively identifies responsive subgroups. Extensive simulations confirm its ability to detect heterogeneous patient responses and distinguish between datasets with and without heterogeneity, while maintaining a stringent Type I error rate of 1%. We further validate its performance using Phase III randomized clinical trial datasets, demonstrating significant patient heterogeneity in treatment response based on baseline characteristics. 

Keywords

subgroup analysis

survival analysis

machine learning

Phase III clinical trial

random forest

unsupervised learning 

Co-Author(s)

Qing Liu, Amgen Inc.
Xun Jiang, Amgen
Amy Xia, Amgen
Peng Wei, University of Texas, MD Anderson Cancer Center
Brian Hobbs, University of Texas

First Author

Xingyu Li, MD Anderson

Presenting Author

Xingyu Li, MD Anderson

Nonparametric Efficient Estimation of Dynamic Treatment Regimes with Competing Risks

Osteoporotic fractures, treated via bisphosphonates (BPs), present a major public health burden. However, prolonged BP use may increase risk of uncommon severe outcomes that are difficult to study due to rarity, late onset, and potential death before observation. BP use is often paused for several years to mitigate such risks, but long-term effects and ideal duration of breaks are unknown. Optimal treatment thus requires modeling risks under dynamic regimes, balancing BP use and holiday durations. Methods such as inverse probability weighting and marginal structural models are used to study causal effects of time-varying exposures with semi-competing risks but often rely on strong assumptions and may lack efficiency, which is crucial for rare outcomes. In this work, we construct nonparametric efficient estimators to assess cumulative incidence of such events under various treatment regimes, accounting for mortality as a competing event. We also analyze the asymptotic behavior of our estimators via empirical process theory. Our method allows us to leverage patient health records and claims data to provide straightforward inferential methods for rare outcomes. 

Keywords

dynamic treatment regimes

competing risks

causal inference

observational data analysis

nonparametric methods

efficient estimation 

Co-Author

Jared Huling, University of Minnesota

First Author

Nitya Shah, University of Minnesota

Presenting Author

Nitya Shah, University of Minnesota

Perturbation-Based Efficient Resampling Method for Variance Estimation in Survival Data Analysis

Accurate and easy-to-implement variance estimation is essential but often challenging in complex statistical settings. Traditional approaches, such as the nonparametric bootstrap, are computationally intensive as they typically require repeatedly solving estimating equations. Moreover, bootstrapped samples almost always contain ties, introducing additional complications. These challenges are amplified when plug-in estimators are involved: how variability propagates from one stage to the next is less well studied, and failing to account for it properly can lead to underestimation of the variance. To address these issues, we examine resampling and perturbation methods that retain the nonparametric flexibility of the traditional bootstrap while significantly reducing computational burden by eliminating the need for repeated equation solving. Through extensive simulation, we show that the perturbation method offers an efficient and reliable alternative for variance estimation in survival data, substantially lowering computational costs compared to the bootstrap while maintaining accuracy, making it a powerful tool for statistical inference in complex models.
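The general random-weighting idea can be sketched as follows: instead of resampling subjects with replacement, each subject receives an independent positive weight with mean 1 and variance 1, and the estimator is recomputed under those weights. The example uses a weighted Kaplan-Meier estimate because it has a closed form; it is a minimal sketch under that assumption, not the authors' procedure, and the function names and data are hypothetical.

```python
import math
import random

def weighted_km(times, events, weights, t):
    """Weighted Kaplan-Meier estimate of S(t)."""
    surv = 1.0
    # Distinct observed event times up to t, in increasing order.
    event_times = sorted({ti for ti, ei in zip(times, events) if ei and ti <= t})
    for s in event_times:
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= s)
        died = sum(w for ti, ei, w in zip(times, events, weights) if ei and ti == s)
        surv *= 1.0 - died / at_risk
    return surv

def perturbation_se(times, events, t, n_rep=500, seed=0):
    """Standard error of S(t) from Exp(1)-perturbed replicates."""
    rng = random.Random(seed)
    n = len(times)
    reps = []
    for _ in range(n_rep):
        # Continuous weights keep every subject in the sample, so no
        # resampled data set with artificial ties is ever created.
        weights = [rng.expovariate(1.0) for _ in range(n)]
        reps.append(weighted_km(times, events, weights, t))
    mean = sum(reps) / n_rep
    variance = sum((r - mean) ** 2 for r in reps) / (n_rep - 1)
    return math.sqrt(variance)

# Hypothetical follow-up times and event indicators (1 = event, 0 = censored).
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 0, 1, 1, 0, 1, 0, 1]
print(weighted_km(times, events, [1.0] * len(times), t=4))
print(perturbation_se(times, events, t=4))
```

For estimators defined by estimating equations, the computational savings described in the abstract would come from avoiding a fresh solve per replicate, which this closed-form example does not need to demonstrate.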

Keywords

Perturbation

Variance Estimation

Resampling

Survival Data

Bootstrap 

Co-Author(s)

Sy Han Chiou
Chuan-Fa Tang

First Author

Weixi Zhu

Presenting Author

Weixi Zhu

Pseudo-Observations for Bivariate Survival Data

The pseudo-observations approach has been gaining popularity as a method to estimate covariate effects on censored survival data. It is used regularly to estimate covariate effects on quantities such as survival probabilities, restricted mean life, and cumulative incidence. In this work, we propose to generalize the pseudo-observations approach to situations where a bivariate failure-time variable is observed, subject to right censoring. The idea is to first estimate the joint survival function of both failure times and then use it to define the relevant pseudo-observations. Once the pseudo-observations are calculated, they are used as the response in a generalized linear model. We consider two common nonparametric estimators of the joint survival function: the estimator of Lin and Ying (1993) and the Dabrowska estimator (1988). For both estimators, we show that our bivariate pseudo-observations approach produces regression estimates that are consistent and asymptotically normal. Our proposed method enables estimation of covariate effects on the joint survival probability at a fixed number of bivariate time points. We demonstrate the method using simulations and real-world data. 
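For orientation, the sketch below computes standard univariate jackknife pseudo-observations for the survival probability at a fixed time, using the Kaplan-Meier estimator. It illustrates only the familiar one-dimensional construction that the abstract generalizes; in the bivariate setting, the Kaplan-Meier step would be replaced by an estimator of the joint survival function, which is not implemented here. Function names and data are hypothetical.

```python
def km_survival(times, events, t):
    """Kaplan-Meier estimate of S(t) from right-censored data."""
    surv = 1.0
    for s in sorted({ti for ti, ei in zip(times, events) if ei and ti <= t}):
        at_risk = sum(1 for ti in times if ti >= s)
        died = sum(1 for ti, ei in zip(times, events) if ei and ti == s)
        surv *= 1.0 - died / at_risk
    return surv

def pseudo_observations(times, events, t):
    """Jackknife pseudo-observations for S(t).

    The i-th value is n * theta_hat - (n - 1) * theta_hat_without_i,
    where theta_hat is the Kaplan-Meier estimate of S(t).
    """
    n = len(times)
    full = km_survival(times, events, t)
    pseudo = []
    for i in range(n):
        loo_times = times[:i] + times[i + 1:]
        loo_events = events[:i] + events[i + 1:]
        leave_one_out = km_survival(loo_times, loo_events, t)
        pseudo.append(n * full - (n - 1) * leave_one_out)
    return pseudo

# Hypothetical data (1 = event, 0 = censored).
times = [2.0, 5.0, 1.0, 7.0, 3.0, 8.0]
events = [1, 0, 1, 1, 0, 1]
print(pseudo_observations(times, events, t=4.0))
```

The resulting values would then serve as the response in a generalized linear model fitted with generalized estimating equations. Without censoring, each pseudo-observation reduces to the indicator that the subject survived past t, which is a convenient sanity check.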

Keywords

Censoring

Generalized estimating equations

Generalized linear models

Multivariate survival analysis

Co-Author(s)

Micha Mandel, The Hebrew University
Rebecca Betensky, NYU College of Global Public Health

First Author

Yael Travis-Lumer

Presenting Author

Yael Travis-Lumer

When distributed learning meets heterogeneity: choice of methods for federated survival analysis

Survival analysis is widely utilized for analyzing risk factors and predicting disease progression. In multi-center studies, it is beneficial to integrate data from multiple sites to enhance the power of analyses, such as Cox proportional hazards regression. Federated learning algorithms have been employed to integrate multi-site clinical data, especially when individual patient data (IPD) cannot be shared across sites. While heterogeneity can significantly impact the development of federated algorithms, the performance of commonly used federated learning methods under various heterogeneity scenarios has not been thoroughly evaluated. In this paper, we compare three distributed learning algorithms: the meta-estimator, the One-shot Distributed Algorithm for Cox regression (ODAC), and the heterogeneous version of ODAC (ODACH). These comparisons are conducted through both a simulation study and a real-world application within a research network. We offer recommendations for their use in survival analysis practice. 
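To make the simplest of the three approaches concrete, the sketch below pools site-level Cox log hazard ratio estimates, assuming the meta-estimator takes the usual fixed-effect inverse-variance weighted form. Each site shares only its estimate and standard error, never individual patient data. Cochran's Q is included as a basic heterogeneity summary. The function name and the numbers are hypothetical, and ODAC and ODACH are not implemented here.

```python
import math

def pool_site_estimates(estimates, std_errors):
    """Inverse-variance weighted pooling of per-site estimates.

    Returns the pooled estimate, its standard error, and Cochran's Q,
    which grows when site estimates disagree more than their standard
    errors would suggest.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    cochran_q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, estimates))
    return pooled, pooled_se, cochran_q

# Hypothetical log hazard ratios and standard errors from three sites.
site_estimates = [0.50, 0.30, 0.42]
site_std_errors = [0.10, 0.20, 0.15]
pooled, pooled_se, cochran_q = pool_site_estimates(site_estimates, site_std_errors)
print(pooled, pooled_se, cochran_q)
```

Under a common-effect assumption, Q is approximately chi-squared with (number of sites - 1) degrees of freedom, so a large value signals the kind of between-site heterogeneity the abstract is concerned with.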

Keywords

Survival analysis

Federated learning

Cox regression

Heterogeneity

Privacy-preserving 

Co-Author(s)

Yudong Wang, University of Pennsylvania, Perelman School of Medicine
Yong Chen, University of Pennsylvania, Perelman School of Medicine

First Author

Chongliang Luo, Washington University in St Louis

Presenting Author

Chongliang Luo, Washington University in St Louis