A Joint Modeling Approach for Radiogenomic Data Integration to Enhance Clinical Outcome Prediction

Tiantian Zeng Co-Author
Merck & Co., Inc.
 
Md Selim Co-Author
University of Kentucky
 
Jie Zhang Co-Author
University of Kentucky
 
Arnold Stromberg Co-Author
University of Kentucky
 
Jin Chen Co-Author
University of Alabama
 
Chi Wang Speaker
University of Kentucky
 
Thursday, Aug 7: 8:35 AM - 8:55 AM
Topic-Contributed Paper Session 
Music City Center 
Radiogenomics, an emerging field that integrates radiological imaging, genomic, and clinical data, has the potential to improve the accuracy of multi-modal models for predicting patient outcomes. A key challenge is selecting a manageable number of informative features from a very large candidate set, a task complicated by intrinsic group structures among the features (e.g., biological pathways) and by the limited availability of datasets that contain both genomic and imaging data. To address these challenges, we propose a joint modeling approach that integrates imaging and genomic data to improve the prediction of clinical outcomes. Specifically, we jointly consider two models: Model 1 regresses imaging features on genomic features, and Model 2 regresses a patient's clinical outcome (either continuous or time-to-event) on genomic features. A sparse group lasso penalty is used to select informative features while accounting for intrinsic group structures. To encourage the selection of shared features, each penalty term in one model is weighted according to the coefficient estimates of the other model, so that features selected by one model are more likely to be selected by the other. This weighting mechanism lets the two models share information and strengthens feature selection. We propose an accelerated generalized coordinate descent algorithm to obtain the parameter estimates. The joint model allows the two models to be fitted on separate datasets, and the dataset for Model 2 need not contain imaging data. This flexibility enables the use of large-scale genomic datasets even when corresponding imaging data are unavailable, thereby increasing statistical power. Simulation studies indicate that our method outperforms existing methods in the literature, and we demonstrate its application through a real data analysis.
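The cross-model weighting idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and several pieces are assumptions: both models use squared-error loss (so only a continuous outcome is covered; a time-to-event outcome would need, e.g., a Cox partial likelihood), the weight takes a hypothetical form lam1 / (1 + gamma * s_j), where s_j is the magnitude of feature j's coefficient(s) in the other model, the two models are fitted by simple alternation, and a plain FISTA accelerated proximal-gradient solver is swapped in for the accelerated generalized coordinate descent algorithm. The function names prox_sparse_group, fit_weighted_sgl, and cross_model_weights, the toy data, and the tuning values are all illustrative. Model 1 and Model 2 are fitted on different datasets, mirroring the setting where the outcome cohort has no imaging data.

```python
import numpy as np


def prox_sparse_group(x, groups, l1_weights, group_weights, step):
    """Prox of step * (sum_j l1_weights[j] |b_j| + sum_g group_weights[g] ||b_g||_2).

    Element-wise soft-thresholding followed by group-wise shrinkage; this
    composition is exact for the (weighted) sparse group lasso penalty.
    Assumes `groups` is a list of index arrays that partition the features.
    """
    # Lasso part: weighted element-wise soft-thresholding.
    s = np.sign(x) * np.maximum(np.abs(x) - step * l1_weights, 0.0)
    out = np.zeros_like(s)
    for g, idx in enumerate(groups):
        norm = np.linalg.norm(s[idx])
        thresh = step * group_weights[g]
        if norm > thresh:
            # Group-lasso part: shrink the whole group toward zero.
            out[idx] = (1.0 - thresh / norm) * s[idx]
    return out


def fit_weighted_sgl(X, y, groups, l1_weights, group_weights, n_iter=500):
    """Minimize (1/2n)||y - X b||^2 + weighted SGL penalty with FISTA (not the authors' solver)."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    beta, z, t = np.zeros(p), np.zeros(p), 1.0
    for _ in range(n_iter):
        grad = X.T @ (X @ z - y) / n
        beta_new = prox_sparse_group(z - step * grad, groups, l1_weights, group_weights, step)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = beta_new + ((t - 1.0) / t_new) * (beta_new - beta)  # Nesterov momentum
        beta, t = beta_new, t_new
    return beta


def cross_model_weights(coef_other, groups, lam1, lam2, gamma=1.0):
    """HYPOTHETICAL weighting: lower the penalty on features the other model selected."""
    c = np.asarray(coef_other, dtype=float)
    # Per-feature signal in the other model (row norm if it has several responses).
    strength = np.linalg.norm(c, axis=1) if c.ndim == 2 else np.abs(c)
    l1_w = lam1 / (1.0 + gamma * strength)
    grp_w = np.array([lam2 * np.sqrt(len(idx)) / (1.0 + gamma * np.linalg.norm(strength[idx]))
                      for idx in groups])
    return l1_w, grp_w


# Toy data: 12 genomic features in 3 groups (e.g., pathways); features 0, 1, 4 matter.
rng = np.random.default_rng(0)
p = 12
groups = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]
beta_true = np.zeros(p)
beta_true[[0, 1, 4]] = [1.5, -1.0, 0.8]

# Dataset 1 (with imaging): two imaging features regressed on genomic features.
X1 = rng.standard_normal((200, p))
Y_img = X1 @ np.column_stack([beta_true, 0.5 * beta_true]) + 0.5 * rng.standard_normal((200, 2))
# Dataset 2 (no imaging needed): a larger cohort with a continuous outcome.
X2 = rng.standard_normal((500, p))
y2 = X2 @ beta_true + 0.5 * rng.standard_normal(500)

lam1, lam2 = 0.05, 0.05
B1, b2 = np.zeros((p, 2)), np.zeros(p)
for _ in range(3):  # alternate: re-weight each model's penalty using the other's fit
    w1, g1 = cross_model_weights(b2, groups, lam1, lam2)
    B1 = np.column_stack([fit_weighted_sgl(X1, Y_img[:, k], groups, w1, g1) for k in range(2)])
    w2, g2 = cross_model_weights(B1, groups, lam1, lam2)
    b2 = fit_weighted_sgl(X2, y2, groups, w2, g2)

print(np.round(b2, 3))
```

The proximal step applies element-wise soft-thresholding and then group-wise shrinkage; for this penalty the composition is the exact proximal operator, which is what allows features to be dropped both individually and as whole groups. Lowering the penalty on features that the other model retains is one simple way to "increase the selection chance of features selected by the other model"; the weight function used in the actual method may differ.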