The importance of hospitalization information for mortality prediction

Conference: International Conference on Health Policy Statistics 2023
01/11/2023: 11:45 AM - 12:00 PM MST
Contributed 

Description

Despite the rapid advancement of machine learning algorithms, pessimism has recently been expressed about their ability to accurately distinguish patients who will die from those who will not over the long run. Common methods for improving prediction in machine learning rely heavily on the flexibility of complex model architectures to forcibly connect input predictors and outcomes, leaving little room for exploiting realistic predictor-outcome relationships. To avoid making machine learning entirely a black box, we investigate whether incorporating realistic assumptions into the training of machine learning models can enhance mortality prediction.

While death in the long run is difficult to predict, we hypothesized that for recently hospitalized individuals, imminent death is much more predictable. We further hypothesized that if the patient survives the acute time period, the risk largely wanes. We developed a statistical modeling procedure that uses short-term risk information to enhance prediction of an outcome event with a longer time horizon. The general approach is to decompose the time horizon of the prediction into intervals, exploiting the short-term predictive power of acute risk factors such as current or recent hospitalization rather than modeling the longer-term outcome directly. We investigate the efficacy of this "predictive-power banking approach" using logistic regression and Extreme Gradient Boosting (XGB). For example, if our goal is to estimate 6-month mortality, a simple implementation of our approach might first estimate 1-month mortality and then conditional 6-month mortality given survival to 1 month. Unconditional 6-month mortality is then determined using the law of total probability. By decomposing the follow-up time scale into components and estimating a separate model on each component, we allow the prediction to benefit from predictors with time-varying effects, whether the underlying algorithm is logistic regression, XGB, or any other predictive algorithm. The methodology generalizes to any number of break-points.
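The combination step can be illustrated with a minimal sketch. It assumes the two fitted models (of any type) have already produced a short-term risk and a conditional longer-term risk for a patient; the function names and the numeric values are hypothetical and are not taken from the study. With one break-point, the law of total probability gives P(death by 6 months) = P(death by 1 month) + P(survive 1 month) x P(death by 6 months | survive 1 month).

```python
from math import prod


def combine_two_step(p_short, p_long_given_survival):
    """Unconditional long-horizon risk via the law of total probability.

    p_short: predicted probability of death within the short window
        (e.g. 1 month).
    p_long_given_survival: predicted probability of death by the long
        horizon (e.g. 6 months), given survival past the short window.
    """
    return p_short + (1.0 - p_short) * p_long_given_survival


def combine_intervals(conditional_risks):
    """Same idea for any number of break-points.

    conditional_risks[k] is the predicted probability of death in
    interval k, given survival to the start of interval k. Overall
    survival is the product of the interval-specific survival terms.
    """
    survival = prod(1.0 - h for h in conditional_risks)
    return 1.0 - survival


# Hypothetical predictions for one patient (illustrative numbers only).
p_1m = 0.10            # short-term model: death within 1 month
p_6m_given_1m = 0.05   # conditional model: death by 6 months | alive at 1 month
p_6m = combine_two_step(p_1m, p_6m_given_1m)
```

With a single break-point the two functions agree: `combine_intervals([p_1m, p_6m_given_1m])` returns the same value as `combine_two_step(p_1m, p_6m_given_1m)`.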

The performance of our approach was evaluated on a Medicare health insurance claims dataset for a cohort of patients diagnosed with chronic obstructive pulmonary disease (COPD), chronic kidney disease (CKD), or congestive heart failure (CHF). The dataset allowed us to measure hospitalization and mortality over time as well as patient-level baseline characteristics. Consistent with an elevated risk period of 30 days following surgery, we allowed the effect of predictors to change after 30 days of follow-up. We used area under the ROC curve (AUC) and Youden's Index to evaluate prediction accuracy for both our two-step approach and the direct approach. The effects in the estimated models were compared across components to test our hypothesis that predictors containing information about a recent hospitalization were predictive, but primarily over the short term. Multifold cross-validation was used to demonstrate the consistency of the results. Our general finding is that providing structure to AI or machine learning (ML) algorithms may enhance their overall predictive accuracy, as it allows them to focus on the aspects of the prediction that they are best at learning.
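For readers unfamiliar with the two accuracy measures, the following is a small self-contained sketch of how they can be computed from predicted risks and observed outcomes. It is not the authors' evaluation code: the helper names and the toy data are hypothetical, and a real analysis would more likely use an established statistics library.

```python
def auc(y_true, scores):
    """AUC as the probability that a randomly chosen death receives a
    higher predicted risk than a randomly chosen survivor (ties count 1/2)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_index(y_true, scores):
    """Maximum of sensitivity + specificity - 1 over observed thresholds,
    classifying a patient as high risk when score >= threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = 0.0
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        best = max(best, tp / n_pos + tn / n_neg - 1.0)
    return best


# Toy data (illustrative only): 1 = died, 0 = survived.
y = [0, 0, 1, 0, 1, 1]
s = [0.10, 0.30, 0.35, 0.60, 0.70, 0.90]
toy_auc = auc(y, s)
toy_youden = youden_index(y, s)
```

Applying both functions to the combined two-step predictions and to the direct-model predictions on the same held-out fold would give the kind of side-by-side comparison described above.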

Mortality prediction is a challenging yet important topic for the evaluation of health policies. Accurate and reliable estimation of risk for all patients will help align clinical decision making and healthcare resources with patient need. This project therefore fits the theme of ICHPS well: our work improves upon mortality prediction approaches used in statistics and ML through a structural approach informed by medical knowledge. Our results provide novel insights into interpreting health data to inform health policies.

Keywords

Predictive modeling of mortality

Time-varying predictor coefficients over follow-up

Multi-part modeling

Continuation-ratio logistic regression

Machine learning

Medicare data 

Presenting Author

Bo Qin

First Author

Bo Qin

CoAuthor(s)

Curtis Petersen, Dartmouth College
Jonathan Skinner, Dartmouth College
James O'Malley, Dartmouth College, Geisel School of Medicine