
025 - Combining structured EHR data and clinical notes information with deep learning pipeline for predicting 30-day readmission in COPD patients

Conference: International Conference on Health Policy Statistics 2023

01/10/2023: 7:30 PM - 8:30 PM MST
Posters

Description

Chronic obstructive pulmonary disease (COPD) affects hundreds of millions of people worldwide. Current evidence suggests that conventional models (logistic regression with structured data) for predicting COPD readmissions have moderate performance. Increased adoption of electronic health record (EHR) systems by health care providers in recent years has resulted in the accumulation of valuable information that can be used to guide health care policy. EHRs are electronic records of patients' health information, including structured data stored in tabular form, such as laboratory test results and demographics, and unstructured data in the form of clinical notes and reports. Clinical research investigators commonly analyze structured EHR data to gain the insight necessary to inform medical professionals and guide public health policymakers. However, a wealth of potentially useful information about patients' clinical history, stored in the form of free-text clinical notes, remains underutilized. Our objective is to use natural language processing (NLP) to harness the additional information contained in clinical notes to improve prediction of COPD 30-day readmission. Our sample included 1670 patients at least 40 years old with an inpatient visit at our institution from 2010 to 2019 for any reason and a diagnosis of COPD. Patients' age, gender, race, primary payer, length of stay, discharge disposition, and comorbidities were the covariates that formed the structured data, while physicians' discharge notes were processed with NLP. A logistic regression model and a neural network model for classification produced ROC AUCs of 58% and 59%, respectively. A Bidirectional Encoder Representations from Transformers (BERT) NLP model was trained on discharge notes using a ratio of 50:30:20 for the train, validation, and test datasets. The BERT model resulted in an AUC of 59%.
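The 50:30:20 partition and the ROC AUC metric mentioned above can be illustrated with a minimal sketch. It uses only the Python standard library. The function names `split_50_30_20` and `roc_auc`, the random shuffle, and the seed are assumptions made for illustration. The abstract does not say how the split was drawn (for example, whether it was stratified or grouped by patient) or which software computed the AUC.

```python
import random


def split_50_30_20(items, seed=0):
    """Shuffle items and split them 50:30:20 into train/validation/test.

    Hypothetical helper: a plain random split, which is an assumption.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    # Integer arithmetic avoids floating-point rounding in the cut points.
    n_train = n * 50 // 100
    n_val = n * 30 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder, about 20%
    return train, val, test


def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Sizes produced for a sample of 1670 records.
train_ids, val_ids, test_ids = split_50_30_20(range(1670))
print(len(train_ids), len(val_ids), len(test_ids))
```

Under this pairwise definition, an AUC of 50% corresponds to chance-level ranking of readmitted versus non-readmitted patients, which gives a reference point for the reported values.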
Finally, a neural network for the structured data and a BERT model for the unstructured data were nested within a multimodal neural network. The outputs of the encoding layers of the two sub-models were concatenated before being forwarded to the final layer of the model for classification. The multimodal model produced an AUC of 63%. Our results suggest that additional improvement in prediction accuracy can likely be gained by utilizing patients' health information stored in both structured and unstructured forms.
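The concatenation step described above can be sketched as a single forward pass. This is a toy illustration in plain Python, not the authors' implementation. The layer sizes, the ReLU activation, the sigmoid output, the parameter names in `params`, and the use of a fixed vector `note_embedding` as a stand-in for the BERT encoder output are all assumptions. A real pipeline would typically be built and trained in a deep learning framework.

```python
import math


def dense(x, weights, bias):
    """Affine layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fused_forward(structured, note_embedding, params):
    """Concatenate the two encodings, then apply one classification layer.

    structured     : numeric covariates (hypothetical encoding of age, etc.)
    note_embedding : stand-in for the BERT encoding of a discharge note
    params         : hypothetical weights; in practice these are learned
    """
    # Encoder for the structured branch (assumed: one dense layer + ReLU).
    h_struct = relu(dense(structured, params["w_struct"], params["b_struct"]))
    # Concatenate the structured encoding with the text encoding.
    fused = h_struct + list(note_embedding)
    # Final layer: a single logit mapped to a readmission probability.
    logit = dense(fused, [params["w_out"]], [params["b_out"]])[0]
    return fused, sigmoid(logit)


# Toy parameters chosen only so the arithmetic is easy to follow.
toy_params = {
    "w_struct": [[1.0, 0.0], [0.0, -1.0]],
    "b_struct": [0.0, 0.0],
    "w_out": [0.0, 0.0, 0.0, 0.0, 0.0],
    "b_out": 0.0,
}
fused, prob = fused_forward([1.0, 2.0], [0.5, -0.5, 0.25], toy_params)
print(fused, prob)
```

Because the final layer sees both encodings in one vector, its weights can draw on signal from either modality. That is the motivation for concatenating before classification rather than averaging the predictions of two separate models.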

Keywords

EHR data

Natural language processing

BERT

Deep learning

COPD

readmission

Presenting Author

Ioannis Malagaris

First Author

Ioannis Malagaris

CoAuthor(s)

Efstathia Polychronopoulou, University of Texas Medical Branch
Yong-Fang Kuo, University of Texas Medical Branch
Duarte Duarte, University of Texas Medical Branch

Target Audience

Beginner

Tracks

Knowledge
