Monday, Aug 4: 2:00 PM - 3:50 PM
4073
Contributed Papers
Music City Center
Room: CC-212
This session will highlight the latest advancements in personalized precision medicine research as applied to EHR data sources, SMART data, or multi-omics.
Main Sponsor
Biometrics Section
Presentations
The cyclical and heterogeneous nature of many substance use disorders highlights the need to adapt the type and/or the dose of treatment to accommodate the specific and changing needs of individuals. The Adaptive Treatment for Alcohol and Cocaine Dependence study (ENGAGE) is a sequential multiple assignment randomized trial (SMART) that aimed to construct dynamic treatment regimes (DTRs) to improve patients' engagement in therapy. However, the high rate of noncompliance and the lack of analytic tools to account for it have impeded researchers from using the data to construct individually tailored DTRs. We overcome this issue by defining our target parameter as the mean outcome under different DTRs for given potential compliance strata, and we propose a marginal structural model with principal stratification to estimate this quantity. We model the latent principal strata using a Bayesian semiparametric approach. An important feature of our work is that we consider partial rather than binary compliance strata, which are more relevant in longitudinal studies. We assess the performance of our method through simulation and application to the ENGAGE study.
Keywords
Dynamic treatment regime
Non-parametric Bayes
Partial compliance
Principal stratification
Marginal structural models
A central objective of precision medicine is learning optimal dynamic treatment regimes (DTRs) from data. Classification-based methods, like outcome weighted learning (OWL) for single-stage and backward OWL (BOWL) for multi-stage problems, leverage machine learning to directly learn optimal DTRs. However, these methods lack a natural way to quantify uncertainty and only use the data from patients whose actual treatment paths align with the optimal decision rule. In this paper, we extend Bayesian OWL – a Bayesian reformulation of OWL – to the multi-stage setting. We call this method backward Bayesian outcome weighted learning (BBOWL). Like BOWL, our method directly learns an optimal DTR via backward induction, and unlike existing methods, our approach propagates uncertainty backward through the DTR-learning process and provides uncertainty quantification of individualized treatment recommendations. Furthermore, our approach leverages the full information contained in the observed data. We present theoretical guarantees of BBOWL and verify its performance via both simulation studies and case study data.
Keywords
Precision medicine
dynamic treatment regimes
Bayesian statistics
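As background for the abstract above, the single-stage OWL idea can be illustrated with a small sketch. This is not BBOWL or the authors' code: it assumes simulated data, 50/50 randomized treatment, a hypothetical class of threshold rules, and a grid search in place of the weighted classification step. It shows the inverse-probability-weighted value that OWL-type methods maximize, in which only patients whose observed treatment matches the rule contribute.

```python
import random

def ipw_value(rule, data, propensity=0.5):
    """IPW estimate of the mean outcome if everyone followed `rule`.

    `data` holds (covariate, treatment in {-1, 1}, outcome) triples.
    Only patients whose observed treatment agrees with the rule contribute.
    """
    total = 0.0
    for x, a, r in data:
        if rule(x) == a:
            total += r / propensity
    return total / len(data)

def simulate(n, seed=0):
    """Hypothetical randomized trial: treatment 1 helps only when x > 0."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        a = rng.choice([-1, 1])
        r = 1.0 + a * x + rng.gauss(0, 0.1)
        data.append((x, a, r))
    return data

data = simulate(2000)

# Candidate rules: "treat with 1 if x > c", for c on a coarse grid.
thresholds = [i / 10 for i in range(-10, 11)]
best_c = max(
    thresholds,
    key=lambda c: ipw_value(lambda x: 1 if x > c else -1, data),
)
print("estimated best threshold:", best_c)
```

In this simulation the true optimal threshold is 0, so the selected threshold should land near it. The estimate is a single point with no uncertainty attached, which is the gap the abstract describes.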
Leveraging real-world electronic health records (EHR) for precision medicine requires robust modeling of patient heterogeneity and treatment effects while mitigating biases inherent in observational data. We introduce a novel framework for learning rich, contextualized, and debiased EHR embeddings that enable individualized counterfactual outcome prediction and precise estimation of individualized treatment effects (ITE). Our approach integrates adversarial debiasing and negative control strategies to correct for confounding while preserving patient-specific contextual information. We demonstrate its utility in optimizing the use of GLP-1 receptor agonists (GLP-1RAs), identifying patients who would benefit but are currently untreated, and detecting those receiving treatment despite being suboptimal candidates for heart failure and mental health outcomes. This method provides a robust foundation for precision medicine, ensuring treatment decisions are data-driven, patient-specific, and causally robust.
Keywords
Counterfactual Outcome Prediction
Negative control outcomes
Precision Medicine
Real-World Evidence
electronic health records
Individualized Treatment Effect (ITE)
During chronic disease treatments (e.g., cancer, sepsis, diabetes), patients often receive repeated treatments. Our aim is to learn the best sequence of treatments, also called a dynamic treatment regime (DTR), using already available patient data. When only two treatment options are available, DTR learning reduces to sequential weighted binary classification. In general, when the number of treatments is greater than two, DTR learning reduces to a sequence of weighted multi-class classification problems. In this paper, we characterize a class of smooth surrogate losses for these multi-class classification problems, and show that our surrogate losses are Fisher consistent for an arbitrary number of treatment options per stage. We show that the proposed surrogate losses enjoy some interesting properties, such as Fisher consistency among the class of linear policies as well. However, because the surrogates are non-convex, DTR learning becomes a non-convex but smooth optimization problem. We develop an appropriate algorithm for solving this non-convex optimization problem, and provide guarantees on convergence to the global optimum under some curvature-type conditions.
Keywords
Dynamic Treatment Regimes
Sequential Decision-Making and Policy Learning
Surrogate Losses
Non-Convex Optimization
Fisher Consistency
Weighted Multi-class Classification
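The reduction to weighted multi-class classification described in the abstract above can be sketched for a single stage. The softmax smoothing below is a generic stand-in chosen for illustration, not the surrogate class characterized in the paper. The uniform randomization over treatments, the toy data, and the `make_scores` policy are likewise assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def weighted_01_value(policy_scores, data, n_treatments):
    """IPW value: an outcome counts only if the argmax treatment was received."""
    total = 0.0
    for x, a, r in data:
        scores = policy_scores(x)
        chosen = max(range(n_treatments), key=lambda k: scores[k])
        if chosen == a:
            total += r * n_treatments  # 1 / propensity under uniform randomization
    return total / len(data)

def smoothed_value(policy_scores, data, n_treatments):
    """Smooth surrogate: the 0-1 indicator is replaced by a softmax probability."""
    total = 0.0
    for x, a, r in data:
        total += r * n_treatments * softmax(policy_scores(x))[a]
    return total / len(data)

# Toy data: (covariate, observed treatment in {0, 1, 2}, outcome).
toy = [(0.0, 0, 1.0), (1.0, 2, 2.0), (2.0, 1, 3.0)]

def make_scores(scale):
    """Hypothetical policy: treatment 0 near x=0, 2 near x=1, 1 near x=2."""
    return lambda x: [-scale * abs(x), -scale * abs(x - 2), -scale * abs(x - 1)]

exact = weighted_01_value(make_scores(50.0), toy, 3)
smooth = smoothed_value(make_scores(50.0), toy, 3)
print(exact, smooth)
```

The smoothed objective is differentiable in the scores but not concave in them, so maximizing it is a smooth non-convex problem of the kind the abstract describes. As the score scale grows, it approaches the exact weighted 0-1 value.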
Developing tools for estimating heterogeneous treatment effects (HTE) has been an area of active research in recent years. While these tools have proven to be useful in many contexts, a concern when deploying such methods is the degree to which incorporating HTE into a prediction model provides an advantage over methods which do not allow for treatment effect variation. To address this, we propose a procedure which evaluates the extent to which an HTE model provides a predictive advantage by targeting the gain in predictive performance from using a flexible predictive model incorporating HTE versus a similar alternative model that is constrained to not allow variation in treatment effect. By drawing upon recent work on nested cross-validation techniques for prediction error inference, we generate confidence intervals for this measure of gain in predictive performance, which allows one to calculate the level at which one is confident of a substantial HTE-modeling gain in prediction, a quantity which we refer to as the h-value. Our procedure is generic and can be used to assess the benefit of modeling HTE for any method that incorporates treatment effect variation.
Keywords
interaction
model comparison
precision medicine
resampling
Estimating time-varying treatment effects is essential for guiding clinical decisions, particularly in chronic disease management. However, applying existing causal inference methods to observational data, such as electronic health records (EHR), is challenging due to irregular patient visit patterns. A common approach uses multiple imputation to fill in missing data before applying causal methods, but this increases modeling complexity and may be inefficient. We propose a sequential analysis using a Bayesian additive regression trees (BART) model that directly accommodates irregular visit patterns, allowing the visit mechanism to depend on unobserved data. Our method also handles treatment heterogeneity, enabling more accurate effect estimation for individualized treatment decisions. Through simulation studies, we show that our approach significantly improves estimation compared to standard two-step practices relying on multiple imputation. We illustrate its use with EHR data from a juvenile idiopathic arthritis study.
Keywords
Time-varying treatment effects
Irregular longitudinal data
Multiple imputation
Bayesian additive regression trees
Joint modeling of longitudinal data and survival data has gained great attention over the last few decades. We study joint analysis of skewed longitudinal data and discrete failure time data, and conduct grouped variable selection in this framework. A joint model is proposed with a shared frailty to characterize the dependence between the two types of responses, where the longitudinal response is modeled with a log-normal mixed-effects submodel and the survival time is modeled with a complementary log-log submodel. Penalized likelihood-based approaches are developed to simultaneously select significant covariates and estimate their effects on the two types of responses. A Monte Carlo EM (MCEM) method is used for the implementation. Our simulation study shows that these methods perform well in both variable selection and parameter estimation. A real-life data application to the LIFE study is provided as an illustration.
Keywords
joint modeling
Skewed longitudinal data
discrete failure time data
grouped variable selection
Monte Carlo EM
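For readers unfamiliar with the complementary log-log submodel mentioned in the abstract above, the link can be sketched as follows. This shows only the standard cloglog relationship between a linear predictor and a discrete-time hazard. The linear predictor values are hypothetical. In a shared-frailty model the predictor would also include the frailty term, which is not modeled here.

```python
import math

def cloglog_hazard(eta):
    """Discrete-time hazard under the complementary log-log link.

    The link is log(-log(1 - h)) = eta, so h = 1 - exp(-exp(eta)).
    """
    return 1.0 - math.exp(-math.exp(eta))

def survival_curve(etas):
    """Survival probability after each interval, given per-interval predictors."""
    surv = 1.0
    curve = []
    for eta in etas:
        surv *= 1.0 - cloglog_hazard(eta)  # survive this interval given survival so far
        curve.append(surv)
    return curve

# Hypothetical linear predictors for three consecutive intervals.
etas = [-2.0, -1.5, -1.0]
print([round(s, 4) for s in survival_curve(etas)])
```

Under this link, surviving an interval has probability exp(-exp(eta)), so the survival curve is a running product of those terms.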