CS004 - Invited: Improving Health Policy Decisions by Using Novel Record Linkage Methods

Conference: International Conference on Health Policy Statistics 2023
01/10/2023: 9:00 AM - 10:45 AM MST
Invited 

Description

To make informed health policy decisions, it is important to rely on complete and accurate data sources. Individuals' health information is commonly dispersed over multiple data sources. Integrating individual-level data across administrative datasets, registries and surveys is relatively straightforward with unique identifiers (e.g. social security numbers). However, privacy regulations often limit access to unique identifiers, so that only partially identifying information is available to link records that represent the same individual. In the absence of unique identifiers, record linkage methods can be used to link information on the same individual from multiple data sources. The linked dataset generally provides more accurate and complete information than any of the constituent data sources alone. For example, linked data sources can improve the accuracy of surveys by providing additional information to impute missing values or adjust for erroneous recording. Linked data sources also enable the estimation of relationships between variables that are each exclusive to one of the data sources. Linking health-related data is complex because the overlap between individuals in the different data sources is often limited and the linking variables are prone to errors. This session will present novel record linkage methods and their applications to routinely collected healthcare data, registries and survey data. These methods address linkage of multiple files, propagation of linkage errors in downstream analyses, validation of the linkage results and utilization of data features to improve linkage accuracy.

Keywords

Record Linkage

Error Propagation

Healthcare datasets

Entity Resolution 

Organizer

Roee Gutman, Brown University

Chair

Gauri Kamat, Brown University

Presentations

A Bayesian Multi-Layered Record Linkage Procedure to Analyze Functional Status of Medicare Patients with Traumatic Brain Injury

Understanding associations between injury severity and the potential for post-acute care recovery for patients with traumatic brain injury (TBI) is crucial to improving care. Estimating this association requires information on patients' injury severity, demographics, and healthcare utilization, which are dispersed across different datasets. Because of privacy regulations, unique identifiers are not available to link records across datasets. Record linkage methods identify records that represent the same entity across datasets in the absence of unique identifiers. With a large number of records, these methods are computationally intensive and may result in many false links. Blocking is a technique that reduces the number of possible links to be considered by allowing two records to be linked only if they agree on key variables. In healthcare applications, health providers constitute a blocking scheme for patients. Specifically, two records can represent the same patient only if they show care received from the same provider. In some cases, providers are uniquely defined within each dataset, but they cannot be uniquely identified across files. We propose a Bayesian record linkage procedure that simultaneously links health providers and records. This procedure improves the quality of links compared to current methods. We use this procedure to merge a trauma registry with Medicare claims to estimate the relationship between injury severity and TBI patients' recovery. After linkage, we did not find significant associations between injury severity and the propensity of TBI patients to be discharged home after admission to skilled nursing facilities. These findings highlight that, in a population of older adults with TBI, commonly used severity indices have limited ability to predict post-acute care outcomes.
Further research is needed to identify levels of functional impairment and cognitive deficit that are associated with successful discharge from skilled nursing facilities among older patients with TBI.
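
The blocking idea described in this abstract can be illustrated with a minimal Python sketch. It is not the Bayesian procedure proposed in the talk: it assumes, for simplicity, that provider identifiers are directly comparable across the two files, which the abstract notes is not always the case. The function name, field names and toy records are hypothetical.

```python
from collections import defaultdict

def candidate_pairs(file_a, file_b, block_key):
    """Return index pairs (i, j) whose records agree on block_key.

    Only these pairs would be passed on to a record linkage model;
    all other pairs are ruled out by the blocking scheme.
    """
    # Index the second file by the value of the blocking variable.
    blocks_b = defaultdict(list)
    for j, rec in enumerate(file_b):
        blocks_b[rec[block_key]].append(j)

    # Pair each record in the first file with records in the same block.
    pairs = []
    for i, rec in enumerate(file_a):
        for j in blocks_b.get(rec[block_key], []):
            pairs.append((i, j))
    return pairs

# Hypothetical toy records; "provider" plays the role of the blocking variable.
registry = [
    {"provider": "H1", "age": 81, "sex": "F"},
    {"provider": "H1", "age": 76, "sex": "M"},
    {"provider": "H2", "age": 90, "sex": "F"},
]
claims = [
    {"provider": "H1", "age": 81, "sex": "F"},
    {"provider": "H2", "age": 90, "sex": "F"},
    {"provider": "H3", "age": 70, "sex": "M"},
]

pairs = candidate_pairs(registry, claims, "provider")
print(pairs)  # 3 candidate pairs instead of all 3 x 3 = 9
```

In this toy example blocking reduces the comparison space from nine record pairs to three. When providers cannot be matched across files, the blocks themselves are uncertain, which is the situation the talk's simultaneous linkage of providers and records is meant to handle.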

Speaker

Roee Gutman, Brown University

A general framework for regression with mismatched data based on mixture modeling


Data sets obtained from linking multiple files are frequently affected by mismatch error, as a result of non-unique or noisy identifiers used during record linkage. Accounting for such mismatch error in downstream analysis performed on the linked file is critical to ensure valid statistical inference. In this talk, we present a generic framework to enable valid post-linkage inference in the challenging secondary analysis setting in which only the linked file is given. The proposed framework can flexibly incorporate additional information about the underlying record linkage process, and covers a wide selection of statistical models. Specifically, we propose a pseudo-likelihood approach that is based on two-component mixture models whose two components represent specific distributions conditional on a pair of records being a correct match or mismatch, respectively. We will illustrate the effectiveness of the proposed approach via a simulation study, and then present two applications of the approach to real-world data sets, demonstrating contingency table analysis and semiparametric regression using penalized splines.
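
As a rough illustration of the two-component mixture idea, and not necessarily the exact formulation used in the talk, one possible form is sketched below. All symbols are assumptions introduced here: $y_i$ and $x_i$ are the response and covariates of linked record $i$, $\alpha$ is the mismatch probability, $f$ is the model of interest for correct matches, and $g$ is a distribution for $y_i$ that does not depend on $x_i$, reflecting the assumption that a mismatched response is unrelated to the covariates it was paired with.

```latex
\[
p(y_i \mid x_i) \;=\; (1-\alpha)\, f(y_i \mid x_i;\, \theta) \;+\; \alpha\, g(y_i),
\qquad
\ell_{\mathrm{pseudo}}(\theta, \alpha) \;=\; \sum_{i=1}^{n} \log p(y_i \mid x_i).
\]
```

Under these assumptions, maximizing the pseudo-log-likelihood over $(\theta, \alpha)$ down-weights records that are more plausibly mismatches. Additional information about the linkage process could enter, for example, by letting $\alpha$ vary across records.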
 

Speaker

Brady West, Institute for Social Research

CoAuthor(s)

Guoqing Diao, George Washington University
Martin Slawski, George Mason University
Zhenbang Wang, George Mason University
Emanuel Ben-David, US Census Bureau

Multifile Probabilistic Record Linkage for Drug Overdose Surveillance and Public Health Action

While record linkage is a commonplace problem in public health work, rule-based deterministic methods have historically been more common in practice than probabilistic methods. In this talk, we will describe efforts to put modern probabilistic record linkage methods into practice at Public Health - Seattle and King County. As part of the agency's overdose surveillance work, we are interested in measuring the incidence of fatal and non-fatal overdoses following jail stays. This requires the linkage of King County jail booking data, emergency medical services data, and death certificate data. We will detail a recently proposed probabilistic record linkage method, multilink, describe how it was used to link these sources, and present the subsequent analyses of overdose incidence following jail stays.
 

Speaker

Serge Aleshin-Guendel, University of Washington

Race and Ethnicity Modeling Applied to Linked Health Data

Linked data enable robust analyses using variables from more than one source and can be used to validate variables in either source. The National Center for Health Statistics (NCHS) links survey and administrative data to expand the analytic utility of its surveys. The National Hospital Care Survey collects administrative claims or electronic health record (EHR) data on patients from participating hospitals. Reporting of race/ethnicity in these records is sparse and inconsistent. To support health equity research, we have modeled race/ethnicity using last name and geographic race/ethnicity frequencies from the 2010 Census data in a Bayesian framework. To evaluate the quality of imputations from this model, we make use of linked datasets from the NCHS data linkage program and compare the imputed values to those in administrative data. Further validation is performed by applying the model to NCHS's National Health Interview Survey (NHIS). NHIS collects self-reported race/ethnicity, which is considered the gold standard. This presentation will describe the imputation methodology and the validation methods used and will highlight how this method can be applied to other sources for health equity research.
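
One common way to combine surname and geographic frequencies in a Bayesian framework is a simple application of Bayes' rule, sketched below in Python. This is an assumption about the general approach, not the model presented in the talk: it treats surname and geography as conditionally independent given race/ethnicity. The function name, the group labels and all probabilities are hypothetical and are not Census values.

```python
def posterior_race(p_race_given_surname, p_geo_given_race):
    """Combine surname and geography evidence with Bayes' rule.

    posterior(r) is proportional to P(r | surname) * P(geography | r),
    assuming surname and geography are conditionally independent given r.
    """
    unnormalized = {
        r: p_race_given_surname[r] * p_geo_given_race[r]
        for r in p_race_given_surname
    }
    total = sum(unnormalized.values())
    return {r: value / total for r, value in unnormalized.items()}

# Hypothetical inputs for one record (illustrative numbers only).
# Share of people with this surname in each group:
surname_prior = {"group_a": 0.6, "group_b": 0.3, "group_c": 0.1}
# Share of each group's population living in this record's area:
geo_likelihood = {"group_a": 0.001, "group_b": 0.004, "group_c": 0.002}

posterior = posterior_race(surname_prior, geo_likelihood)
print(posterior)
```

In this toy example the geographic evidence shifts the most probable group from group_a to group_b. Imputed values could then be drawn from, or assigned using, these posterior probabilities and compared against self-reported values for validation.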

Speaker

Dean Resnick, NORC at The University of Chicago