A Bayesian Approach to Linking Historical Records of America's Enslaved

Andee Kaplan Co-Author
Colorado State University
 
Hannah Butler First Author
 
Hannah Butler Presenting Author
 
Thursday, Aug 7: 8:35 AM - 8:50 AM
2309 
Contributed Papers 
Music City Center 
Probabilistic record linkage is an efficient method for connecting records of the same entity across data sources that lack reliable identifiers. Often, variation in the data is due to circumstance rather than error; for example, a nickname may be used in certain contexts instead of a proper name. A record with non-erroneous variation tells one part of a greater story. We call such a record an "alias" of the entity from which it is derived. Entities with multiple aliases provide richer information for linkage, but the increased complexity requires a careful approach. Existing record linkage approaches rely on pre-processing or post-hoc methods to prevent conflicts due to aliases, which can introduce additional bias and make it impossible to quantify uncertainty. Instead of forcing the data to fit existing models, we propose a model to fit the data. Our fully Bayesian approach accounts for known aliases in the data and requires no post-hoc processing of link estimates, thereby preserving uncertainty quantification. We demonstrate the accuracy of our model and apply it to linking historical records of African Americans trafficked in the coastwise slave trade.

Keywords

record linkage

Bayesian inference

historical data

uncertainty quantification

aliased data 

Main Sponsor

Social Statistics Section