CS029 - Contributed: Methods and applications in risk prediction 

Conference: International Conference on Health Policy Statistics 2023
01/11/2023: 10:30 AM - 12:15 PM MST
Contributed 

Chair

Akriti Mishra, MSKCC

Presentations

Risks of advancing to metastatic breast cancer including and excluding competing risks: interpretation as a policy versus clinical prognosis measure

Background: To overcome cancer registries' lack of recurrence data, a modeling method was developed to estimate recurrence-free survival both excluding (net survival) and including (crude survival) competing risks. This study aims to report recurrence-free survival for breast cancer (BC) patients diagnosed with early-stage disease using both methods and to discuss the suitability of each method/measure as a health policy versus clinical decision measure. Methods: Both methods are based on an illness–death process coupled with a mixture cure model for relative survival. The risk of recurrence is inferred from the estimated survival among the non-cured fraction, survival for patients initially diagnosed with metastatic breast cancer, and published data on survival after recurrence. Overall survival from US life tables is used to estimate recurrence-free survival under competing risks. We apply the method to relative survival by stage at diagnosis for women diagnosed with breast cancer between 2000 and 2018 in the Surveillance, Epidemiology, and End Results (SEER) registries. We compare 5-year recurrence-free estimates in the absence and presence of competing risks by age and stage at diagnosis. Results: Recurrence-free survival obtained under the competing-risk framework (crude) and assuming women can only die of their cancer (net) were similar for younger women, especially at 5 years from diagnosis. However, including the risk of dying of other causes produces lower estimates of recurrence-free survival among older women and at longer times since diagnosis. For example, 5-year recurrence-free survival for women diagnosed with stage II breast cancer at ages 15-59, 60-74, and 75-84 was, respectively, 87% (net) vs. 86% (crude) (15-59), 90% (net) vs. 86% (crude) (60-74), and 87% (net) vs. 71% (crude) (75-84). The corresponding 10-year recurrence-free survival estimates were 83% (net) vs. 81% (crude) (15-59), 84% (net) vs. 76% (crude) (60-74), and 76% (net) vs. 45% (crude) (75-84).
In the net framework, the conditional probabilities of being recurrence-free in the next 5 years given being alive 5 years from diagnosis did not vary much by age or stage. Including the risks of dying of other causes, the estimated percentages of women recurrence-free are very similar to the net estimates for women diagnosed at younger ages (15-59 years) and differ considerably for older women. Discussion: These recurrence-free survival measures are especially useful for researchers, policy makers, clinicians, and patients. However, by including versus excluding the risks of dying of competing causes, they answer different questions. Net recurrence-free survival is most appropriate for representing trends, comparisons between different groups of cancer patients, and the impact of cancer biology, as changes in competing causes of death could obscure comparisons. As such, net survival is best suited to answer questions related to health policy, research, and biology. On the other hand, competing-risk survival better describes an individual's chance of survival because it accounts for both the chance of dying from cancer and the chance of dying from competing causes, and it is most valuable in predictive tools, clinical decision making, and precision medicine. For example, older patients with coexisting comorbidity may have a higher probability of dying from competing causes than of dying from their cancer; in fact, the chance of dying from competing causes may preclude any benefit of cancer treatment. While competing-risk measures may be extremely valuable for a physician, they may paint a more pessimistic prognostic picture for a cancer patient by including their risk of dying of other causes. More research is needed on the communication of survival measures to cancer patients.
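The net-versus-crude gap can be illustrated with a back-of-the-envelope calculation (a hypothetical sketch, not the authors' mixture cure estimation): under an independence assumption, crude recurrence-free survival is approximately the product of net recurrence-free survival and other-cause survival, so the gap widens with age and follow-up time.

```python
# Hypothetical illustration of crude vs. net recurrence-free survival.
# Assumes independence of recurrence and other-cause mortality; all
# survival probabilities below are made up, not SEER estimates.

def crude_survival(net_surv, other_cause_surv):
    """Crude recurrence-free survival = net survival x other-cause survival."""
    return [n * o for n, o in zip(net_surv, other_cause_surv)]

# Net (cancer-only) recurrence-free survival, years 1-5 (illustrative)
net = [0.97, 0.94, 0.91, 0.89, 0.87]
# Other-cause survival for an older patient (illustrative life-table values)
other = [0.96, 0.92, 0.88, 0.85, 0.82]

crude = crude_survival(net, other)
print(round(crude[-1], 3))  # 0.87 * 0.82 ~= 0.713: crude falls well below net
```

The same product applied to a younger patient's near-1 other-cause survival leaves crude and net nearly identical, matching the pattern reported above.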

Presenting Author

Angela Mariotto, National Cancer Institute

First Author

Angela Mariotto, National Cancer Institute

An intersectional framework for assessing counterfactual fairness in risk prediction

Along with the increasing availability of health data has come the rise of data-driven models to guide health policy by predicting patient outcomes. Such risk prediction models are used to assess patients' likelihood of certain adverse occurrences and thereby inform interventions. These models have the potential to harness health data to benefit both patients and health care providers. However, risk prediction models also have the potential to entrench or exacerbate health inequities.

Our work proposes a set of statistical tools for assessing the fairness of risk prediction models in a manner relevant to health policy. To our knowledge, our work is the first to develop tools within the counterfactual fairness framework while accounting for multiple, intersecting protected characteristics. Risk prediction models are widely used to guide patient care, and policy decisions such as recent efforts to reduce hospital readmissions have resulted in even wider implementation of the models. Fairness assessment is thus a crucial component of the pipeline from health data to policy, as it helps ensure health data is used in ways that promote equity and center patient outcomes.

As risk prediction models have proliferated, so have techniques for identifying and correcting bias in the models. Broadly constituting the field of "algorithmic fairness", these techniques typically compare some measure of model performance across groups of a social characteristic like race or gender. Our work addresses two aspects that have been less well-explored in the algorithmic fairness literature. We unite these two aspects to offer a unique contribution that is of particular relevance to health policy.

First, most algorithmic fairness work focuses on a single characteristic along which discrimination may occur, for example assessing performance for men vs. women. This simplification fails to account for the fact that discrimination comes in many forms that interact in context-dependent ways. For example, during the COVID-19 pandemic, risk prediction models were used to guide decisions like prioritization of monoclonal antibody treatments. It is well known that older patients and those from racially minoritized groups experience greater risk from COVID-19. However, the effect of age on risk also differs across racial groups. Fairness assessments must therefore consider not just age and race separately, but also how these characteristics interact. The definitions we propose are among only a few fairness techniques to account for multiple, intersecting protected characteristics.
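The intersectional idea can be sketched by computing a group fairness metric over cross-classified subgroups rather than single characteristics. This is a minimal, hypothetical example (made-up records and threshold, and a simple false-positive-rate comparison rather than the authors' counterfactual definitions):

```python
# Hypothetical sketch: a fairness metric (false-positive rate) evaluated on
# intersections of protected characteristics, e.g. (race, age group), rather
# than on each characteristic marginally. Data are invented.
from collections import defaultdict

def fpr_by_group(records, threshold=0.5):
    """False-positive rate within each (race, age_group) intersection."""
    fp = defaultdict(int)
    neg = defaultdict(int)
    for r in records:
        key = (r["race"], r["age_group"])
        if r["outcome"] == 0:  # only true negatives can yield false positives
            neg[key] += 1
            if r["score"] >= threshold:
                fp[key] += 1
    return {k: fp[k] / neg[k] for k in neg if neg[k] > 0}

records = [
    {"race": "A", "age_group": "older",   "score": 0.7, "outcome": 0},
    {"race": "A", "age_group": "older",   "score": 0.3, "outcome": 0},
    {"race": "A", "age_group": "younger", "score": 0.2, "outcome": 0},
    {"race": "B", "age_group": "older",   "score": 0.6, "outcome": 0},
    {"race": "B", "age_group": "older",   "score": 0.8, "outcome": 1},
]
print(fpr_by_group(records))
```

Even when marginal rates by race or by age look similar, the cross-classified cells can diverge, which is the disparity an intersectional assessment is designed to surface.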

The second under-explored aspect of algorithmic fairness is the fact that in policy contexts, risk predictions are typically used to guide interventions. Recent algorithmic fairness work has demonstrated that when decisions are made on the basis of risk scores, unique types of unfairness can result. The authors of this recent work propose the counterfactual fairness framework to identify and mitigate such biases. However, the framework was designed for non-medical contexts and does not account for multiple, intersecting characteristics, as mentioned above.

We propose tools for intersectional, counterfactual fairness measurement designed with particular attention to clinical risk prediction models and health policy contexts. We demonstrate the use of our methods on a COVID-19 risk prediction model used by a major health system. Our fairness measures can be deployed by health systems to evaluate any risk model, giving our work potentially broad implications for the development and implementation of data-driven health policy. 

Presenting Author

Solvejg Wastvedt

First Author

Solvejg Wastvedt

CoAuthor(s)

Jared Huling, University of Minnesota
Julian Wolfson, University of Minnesota

Adjusting for verification and selection bias to improve clinical risk model validation

Clinical risk prediction tools are frequently developed from large studies in order to improve public health monitoring, doctor-patient decision-making, and clinical trial management. Posting such tools online facilitates rapid external validation across heterogeneous patient populations. While external model validation is extremely important, variation in validation performance even on seemingly similar patient populations can lead to confusion over the utility of the tool, thus discouraging its use. In Pfeiffer et al. (Statistics in Medicine, 2022), we formalize the concepts of reproducibility versus transportability of clinical risk tools, as well as differential selection and verification bias between the training data for a risk tool and the validation data used for performance evaluation. When individual-level information from both the training and validation data sets is available, we propose weighted versions of the validation metrics that adjust for differences in the risk factor distributions (selection bias) and in the probability of outcome verification (verification bias) between the training and validation data. We suggest that validation studies report both the weighted and unweighted performance measures to provide a comprehensive evaluation of risk tools. We illustrate the methods by developing and validating a model for predicting prostate cancer risk using data from two large North American prostate cancer prevention trials, the SELECT and PLCO trials.
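A minimal sketch of the weighting idea, using hypothetical selection-bias weights and an observed/expected (O/E) calibration ratio as the validation metric (the paper proposes weighted versions of several metrics; this example and its numbers are illustrative only):

```python
# Hypothetical sketch of a weighted validation metric: an inverse-probability-
# weighted observed/expected (O/E) ratio, where weights reweight the validation
# sample toward the training population's risk-factor distribution.
# Outcomes, risks, and weights are invented.

def weighted_oe(outcomes, predicted_risks, weights):
    """Weighted observed events divided by weighted expected events."""
    observed = sum(w * y for w, y in zip(weights, outcomes))
    expected = sum(w * p for w, p in zip(weights, predicted_risks))
    return observed / expected

y = [1, 0, 0, 1, 0]                 # observed outcomes in validation data
p = [0.6, 0.2, 0.1, 0.5, 0.3]       # model-predicted risks
w_unadj = [1.0] * 5                 # unweighted (standard) validation
w_adj = [0.5, 1.5, 1.5, 0.5, 1.0]   # hypothetical selection-bias weights

print(round(weighted_oe(y, p, w_unadj), 3))  # unweighted O/E
print(round(weighted_oe(y, p, w_adj), 3))    # weighted O/E
```

Reporting both numbers, as the abstract suggests, shows how much apparent miscalibration is attributable to population differences rather than to the model itself.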

Presenting Author

Donna Ankerst, Technical University of Munich

CoAuthor

Ruth Pfeiffer, NIH/NCI

Estimating Disease Heritability from Electronic Healthcare Records

Objective: Chronic diseases are a major driver of rising healthcare costs. Accurate predictions of disease risk are integral to effective disease prevention initiatives and patient treatment strategies. A family history of a chronic disease, which reflects both genetics and shared environments, often predicts disease risk, with predictive value determined by heritability, the proportion of variation in risk explained by inherited genetic factors. Electronic healthcare records (EHRs) are frequently used to study chronic diseases and, when linked to familial relationship information, could also be used to measure family disease histories for disease risk prediction. Our objective was to assess the validity of disease heritability estimates from EHRs that capture familial relationships and disease diagnoses.

Methods: A population-based investigation was conducted using healthcare records from Manitoba, Canada for 1970 to 2021. We constructed family relationships for up to four generations using health insurance registration information containing unique family and individual identifiers. Health histories for family members were created using diagnosis codes in hospital and physician visit records. Linear mixed-effects models were used to estimate heritability (h) for 130 chronic health conditions using the open-source Clinical Classifications Software (CCS), which defines clinically meaningful disease categories. Comparisons between EHR-derived estimates and genetically-derived estimates from published studies were used to assess the validity of the methodology.
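The variance-decomposition idea behind these estimates can be sketched as follows. This is a simplification: the study fits linear mixed-effects models to familial data to estimate the variance components, whereas here they are simply assumed:

```python
# Simplified illustration of the quantity being estimated: heritability as the
# share of outcome (liability) variance attributable to inherited genetic
# effects. A real analysis estimates these components from a linear mixed
# model over family relationships; the values below are invented.

def heritability(genetic_var, env_var):
    """h = genetic variance / (genetic variance + environmental variance)."""
    return genetic_var / (genetic_var + env_var)

# Hypothetical variance components for a condition like asthma
print(round(heritability(genetic_var=0.34, env_var=0.66), 2))  # 0.34
```

The mixed-model machinery exists to separate the genetic component from shared-environment effects, which is exactly where EHR-based family histories can conflate the two.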

Results: Health insurance registration data were used to construct relationships for 10,000 families that included 116,879 individuals. Median family size was 9 (interquartile range: 8). Median observation time was 39.6 years (interquartile range: 25.7). Males comprised half (51.0%) of family members. A total of 272,114 familial relationships were identified; slightly more than half (53%) were first-degree (i.e., parent–child) relationships. One-third (33.2%) of families comprised four generations; only 15.3% comprised two generations. Heritability estimates were consistent with published genetically-derived estimates for several conditions, including diabetes (EHR h = 0.29 vs. 0.22), anemia (EHR h = 0.21 vs. 0.20), and asthma (EHR h = 0.34 vs. 0.33). However, inconsistencies in heritability estimates were identified for pancreatic disorders, gastrointestinal conditions, some mental health conditions, and heart disease.

Conclusion: EHRs provide a promising and novel approach to explore heritability of selected health conditions in large and diverse populations, which is of value for producing disease risk predictions with high generalizability. Such risk predictions are essential for informing chronic disease prevention and treatment policies. However, inconsistencies between EHR-derived and genetically-derived estimates are indicative of the limitations of diagnoses recorded for administrative purposes. Future research will explore sex-specific heritability estimates, effects of change in disease diagnosis coding over time on heritability estimates, and utility of family health histories in risk prediction models for diseases with high heritability. 

Presenting Author

Lisa Lix, University of Manitoba

First Author

Lisa Lix, University of Manitoba

CoAuthor(s)

Amani Hamad, University of Manitoba
Lin Yan, University of Manitoba
Joseph Delaney, University of Manitoba
Elizabeth Wall-Wieler, University of Manitoba
Mohammad Jafari Jozani, University of Manitoba
Shantanu Banerji, CancerCare Manitoba
Olawale Fatai Ayilara, University of Manitoba
Pingzhao Hu, University of Manitoba

Using NER to Derive Health Policy Insights from Social Media Data

Introduction
Having up-to-date information about public opinion and marketing dynamics surrounding tobacco and vaping products is vital to public health policy. Not only do many marketing campaigns target young audiences, but misinformation about the health impacts of these products also spreads frequently in certain communities. One central task in monitoring marketing and public opinion dynamics around tobacco and vaping products is identifying specific brands and product flavors. Social media data offer the promise of near real-time monitoring, but, because the data are unstructured and largely text, locating mentions of brands and flavors is a methodologically onerous task. This presentation showcases NORC's efforts using natural language processing (NLP) to address this challenge.
Methods
Specifically, we use named-entity recognition (NER) to locate mentions of brands and flavors in Twitter posts. NER involves identifying specific words or character strings within a larger text that are instances of a type of entity. For example, if brand and flavor are types of entity, 'JUUL' and 'menthol' would be instances of brand and flavor, respectively. Many off-the-shelf tools exist for NER, but they typically do not identify brands and flavors, instead identifying entities like persons, organizations, times, dates, and currencies. Custom NER has proven to be a powerful tool for identifying brand and flavor mentions on other social media platforms, such as Instagram (Chew et al., 2022).
Our team used Azure Cognitive Service for Language to develop a custom NER model that identifies mentions of brands and flavors within Twitter posts about tobacco and vaping products. Doing so enabled us to leverage transfer learning by fine-tuning their pre-trained language model with vape-related Twitter data. That is, Azure has previously trained a language model, and we fine-tuned it on a set of tweets specifically pertaining to tobacco and vaping products. Although Azure Custom NER imposes certain methodological restrictions, this downside is outweighed by the ability to rapidly cycle through the custom NER model development process.
Data
Twitter is one of the most widely used social media platforms and is regularly used to monitor public health, for example to inform public health policy surrounding COVID-19. We collected Twitter data based on tobacco- and vaping-related search terms, then constructed a training sample based on the presence of in-text mentions of popular vape brands and flavors, which were informed by transaction data from 2014 through 2018. In total, the training sample comprised 2,311 brand mentions and 2,339 flavor mentions from 2,242 tweets. We used an 80/20 train-test split.
Results
Our model achieved high performance in detecting brand and flavor mentions. For brands, we observed an F1 score of 90.48% (precision of 90.39% and recall of 90.57%), and for flavors, we observed an F1 score of 90.27% (precision of 90.17% and recall of 90.36%). In addition to the notable performance, we realized substantial reductions in coding time, data labeling, model deployment and maintenance compared to the time historically spent on these tasks.
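The entity-level metrics reported above can be sketched as exact span matching. This hypothetical example uses made-up spans; the study itself relied on Azure Custom NER's built-in evaluation:

```python
# Hypothetical sketch of entity-level NER evaluation: precision, recall, and
# F1 computed from exact matches of (start, end, label) spans. Spans below
# are invented for illustration.

def ner_scores(gold_spans, predicted_spans):
    """Exact-match precision/recall/F1 over sets of (start, end, label) spans."""
    gold, pred = set(gold_spans), set(predicted_spans)
    tp = len(gold & pred)  # spans matching exactly in position and label
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if tp else 0.0
    return precision, recall, f1

gold = [(0, 4, "BRAND"), (10, 17, "FLAVOR"), (25, 29, "BRAND")]
pred = [(0, 4, "BRAND"), (10, 17, "FLAVOR"), (30, 34, "BRAND")]
p, r, f = ner_scores(gold, pred)
print(round(p, 2), round(r, 2), round(f, 2))  # 0.67 0.67 0.67
```

Exact-match scoring is strict: a prediction that overlaps a gold span but does not align exactly counts as both a false positive and a false negative.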
Conclusion
Our efforts illustrate that custom NER can upgrade the pipeline from health data to health policy. In addition to superior performance, cloud-based services that enable rapid model development, iteration, and deployment empower researchers to keep pace with the rapidly changing market and associated health concerns. Our presentation will conclude with a discussion of ongoing efforts to improve model performance and of practical insights into when custom NER and cloud-based computing solutions are effective tools.
References
Chew R, Wenger M, Guillory J, Nonnemaker J, Kim A. Identifying Electronic Nicotine Delivery System Brands and Flavors on Instagram: Natural Language Processing Analysis. J Med Internet Res. 2022 Jan 18;24(1):e30257. 

Presenting Author

Andrew Norris, NORC at The University of Chicago

First Author

Andrew Norris, NORC at The University of Chicago

CoAuthor(s)

Brandon Sepulvado
Yoonsang Kim, NORC at the University of Chicago
Ganna Kostygina, NORC at the University of Chicago
Sherry Emery

The importance of hospitalization information for mortality prediction

Despite the rapid advancement of machine learning algorithms, pessimism has recently been expressed about their ability to accurately distinguish patients who will die in the long run from those who will not. Common methods for improving model prediction in machine learning rely heavily on manipulating complex model architectures to forcibly connect input predictors and outcomes, leaving little space for exploiting realistic predictor-outcome relationships. To avoid making machine learning entirely a black box, we aim to investigate the possibility of incorporating realistic assumptions into the training of machine learning models to enhance mortality prediction.

While death in the long run is difficult to predict, we hypothesized that for recently hospitalized individuals, imminent death is much more predictable. We further hypothesized that if the patient survives the acute time period, the risk largely wanes. We formed a statistical modeling procedure that takes advantage of short-term risk information to enhance prediction of an outcome event with a longer time horizon. The general approach is to decompose the time horizon of the prediction into intervals that exploit the short-term predictive power of acute risk factors, such as current or recent hospitalization, rather than directly modeling the longer-term outcome. We investigate the efficacy of this "predictive-power banking" approach using logistic regression and Extreme Gradient Boosting (XGB). For example, if our goal is to estimate 6-month mortality, a simple implementation of our approach might first estimate 1-month mortality and conditional 6-month mortality given survival to 1 month. Unconditional 6-month mortality is then determined using the law of total probability. By decomposing the follow-up time scale into components and estimating separate models on each component, we allow the prediction to benefit from predictors with time-varying coefficients for logistic regression, XGB, or any other predictive algorithm. The methodology generalizes to an arbitrary number of break-points.
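The law-of-total-probability step can be made concrete with illustrative numbers (invented risks, not model output):

```python
# Illustration of the decomposition: unconditional 6-month mortality from
# 1-month mortality and conditional 6-month mortality given 1-month survival.
# The two input risks would come from separately fitted models (e.g. logistic
# regression or XGB on each interval); the values here are made up.

def six_month_mortality(p_death_1mo, p_death_6mo_given_survive_1mo):
    """Law of total probability: P(D6) = P(D1) + P(S1) * P(D6 | S1)."""
    return p_death_1mo + (1 - p_death_1mo) * p_death_6mo_given_survive_1mo

# Recently hospitalized patient: high short-term risk that wanes afterward
print(round(six_month_mortality(0.10, 0.05), 4))  # 0.10 + 0.90 * 0.05 = 0.145
```

Because each component model is fitted on its own interval, acute predictors like recent hospitalization can dominate the first term while baseline characteristics drive the second.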

The performance of our approach was evaluated on a Medicare health insurance claims dataset for a cohort of patients diagnosed with chronic obstructive pulmonary disease (COPD), chronic kidney disease (CKD), or congestive heart failure (CHF), which allowed us to measure hospitalization and mortality over time as well as patient-level baseline characteristics. Consistent with an elevated risk period of 30 days following surgery, we allowed the effect of predictors to change after 30 days of follow-up. We used the area under the ROC curve (AUC) and Youden's Index to evaluate prediction accuracy for both our two-step approach and the direct approach. The effects in the estimated models were compared across components to test our hypothesis that those containing information about a recent hospitalization were predictive primarily over the short term. Multifold cross-validation was used to demonstrate the consistency of the results. Our general finding is that providing structure to AI or machine learning (ML) algorithms may enhance their overall predictive accuracy, as it allows them to focus on the aspects of the prediction that they are best at learning.

Mortality prediction is a challenging yet important topic for the evaluation of health policies. Accurate and reliable estimation of risk for all patients will help align clinical decision making and healthcare resources with patient need. This project therefore fits the theme of ICHPS well, as our work improves upon mortality prediction approaches used in statistics and ML with a structural approach informed by medical knowledge. Our results provide novel insights into deciphering health data to inform health policies.

Presenting Author

Bo Qin

First Author

Bo Qin

CoAuthor(s)

Curtis Petersen, Dartmouth College
Jonathan Skinner, Dartmouth College
James O'Malley, Geisel School of Medicine at Dartmouth

Kullback-Leibler-Based Discrete Failure Time Models for Integration of Published Prediction Models with a New Time-to-Event Dataset

Existing literature on prediction for time-to-event data has primarily focused on risk factors from a single individual-level dataset. These analyses often suffer from rare event rates, small sample sizes, high dimensionality, and low signal-to-noise ratios. Incorporating published prediction models from large-scale studies is expected to improve the performance of prognosis prediction on internal individual-level time-to-event data. However, existing integration approaches typically assume that the underlying distributions of the external and internal data sources are similar, which is often invalid. To account for challenges including heterogeneity, data sharing, and privacy constraints, we propose a discrete failure time modeling procedure that utilizes a discrete-hazard-based Kullback-Leibler discriminatory information measure quantifying the discrepancy between the published models and the internal dataset. Simulations show the advantage of the proposed method compared with those based solely on the internal data or the published models. We apply the proposed method to improve prediction performance on a kidney transplant dataset from a local hospital by integrating this small-scale dataset with published survival models obtained from the national transplant registry.
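One simple reading of the discrepancy idea is a sum of Kullback-Leibler divergences between Bernoulli event probabilities across discrete intervals. This is a hedged sketch, with invented hazard values, not the authors' full procedure:

```python
# Hypothetical sketch of a discrete-hazard KL discrepancy: for each discrete
# time interval, compare the Bernoulli "event in this interval" distribution
# under internally estimated hazards vs. hazards implied by a published model.
# Hazard values are made up.
import math

def discrete_hazard_kl(internal_hazards, external_hazards):
    """Sum over intervals of KL(Bernoulli(h_int) || Bernoulli(h_ext))."""
    kl = 0.0
    for h_int, h_ext in zip(internal_hazards, external_hazards):
        kl += h_int * math.log(h_int / h_ext)
        kl += (1 - h_int) * math.log((1 - h_int) / (1 - h_ext))
    return kl

internal = [0.05, 0.08, 0.10]  # hazards estimated from the local dataset
external = [0.04, 0.09, 0.12]  # hazards implied by the published model
print(round(discrete_hazard_kl(internal, external), 4))
```

A divergence of zero indicates the published model matches the internal hazards exactly; larger values signal heterogeneity, which the proposed procedure uses to temper how much weight the external model receives.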

Presenting Author

Di Wang, University of Michigan

First Author

Di Wang, University of Michigan

CoAuthor(s)

Wen Ye, University of Michigan
Randall Sung, University of Michigan
Hui Jiang, University of Michigan
Jeremy Taylor, University of Michigan
Lisa Ly, Temple University
Kevin (Zhi) He, University of Michigan