A Bayesian Approach to Linking Historical Records of America's Enslaved
Thursday, Aug 7: 8:35 AM - 8:50 AM
2309
Contributed Papers
Music City Center
Probabilistic record linkage is an efficient method for connecting records that refer to the same entity across data sources that lack reliable identifiers. Commonly, variation in the data is due to circumstance rather than error. For example, nicknames may be used in certain contexts in place of proper names. A record with non-erroneous variation tells one part of a greater story. We call such a record an "alias" of the entity from which it is derived. Entities with multiple aliases provide richer information for linkage, but the added complexity requires a careful approach. Existing record linkage approaches use pre- or post-hoc methods to prevent conflicts due to aliases, which can introduce additional bias and make it impossible to quantify uncertainty. Instead of forcing the data to fit existing models, we propose a model that fits the data. Our fully Bayesian approach accounts for known aliases in the data and requires no post-hoc processing of link estimates, so uncertainty quantification is maintained. We demonstrate the accuracy of our model and apply it to linking historical records of African Americans trafficked in the coastwise slave trade.
record linkage
Bayesian inference
historical data
uncertainty quantification
aliased data
Main Sponsor
Social Statistics Section