Using Augmented Twin Neural Networks to Match Occupational Injury and Illness Data

Elan Segarra First Author
US Bureau of Labor Statistics
 
Elan Segarra Presenting Author
US Bureau of Labor Statistics
 
Wednesday, Aug 6: 10:50 AM - 11:05 AM
2119 
Contributed Papers 
Music City Center 
When record linkage involves complex characteristics, machine learning (ML) techniques have the potential to succeed where traditional probabilistic linkage methods (e.g., Fellegi-Sunter) might fall short. However, pre-processing (e.g., geocoding) and hand-picked metrics (e.g., edit distances) can still improve linkage outcomes beyond what ML models achieve on their own. We present a fusion of the two approaches that we call an Augmented Twin Neural Network. It leverages the nonlinear flexibility of Twin Neural Networks while adding layers that accept hand-curated comparators, which ML optimizers may struggle to identify implicitly without sufficiently large labeled data sets. We use the framework to match businesses from the BLS Survey of Occupational Injuries and Illnesses to businesses in the OSHA Injury Tracking Application data. Difficulties in matching company names and addresses, together with the existence of multi-establishment firms, make this a prime test application. We compare the linkage outcome metrics of the augmented method with results from both probabilistic and standard ML methods to illustrate the added benefits.
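
The general idea of augmenting a twin network with hand-curated comparators can be sketched as a forward pass in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the trigram featurizer, the layer sizes, the two comparators (a `difflib` string similarity standing in for an edit-distance metric, and a ZIP code match flag), the record fields, and the example records are all hypothetical, and the weights are random rather than trained.

```python
import difflib
import zlib

import numpy as np


def char_ngram_vector(text, dim=64, n=3):
    """Hash character n-grams into a fixed-length count vector (illustrative featurizer)."""
    vec = np.zeros(dim)
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - n + 1):
        gram = padded[i:i + n]
        vec[zlib.crc32(gram.encode("utf-8")) % dim] += 1.0
    return vec


def hand_curated_comparators(a, b):
    """Hand-picked pairwise features (hypothetical choices for this sketch)."""
    # difflib's ratio is a stand-in for an edit-distance-based similarity.
    name_sim = difflib.SequenceMatcher(
        None, a["name"].lower(), b["name"].lower()
    ).ratio()
    same_zip = 1.0 if a["zip"] == b["zip"] else 0.0
    return np.array([name_sim, same_zip])


class AugmentedTwinNetwork:
    """Twin encoder with shared weights plus a head that also sees curated comparators."""

    def __init__(self, input_dim=64, hidden_dim=16, n_comparators=2, seed=0):
        rng = np.random.default_rng(seed)
        # One set of encoder weights, applied to both records (the "twin" part).
        self.W_enc = rng.normal(scale=0.1, size=(input_dim, hidden_dim))
        self.b_enc = np.zeros(hidden_dim)
        # Head weights cover the learned distance AND the curated features.
        self.w_head = rng.normal(scale=0.1, size=hidden_dim + n_comparators)
        self.b_head = 0.0

    def encode(self, x):
        return np.tanh(x @ self.W_enc + self.b_enc)

    def match_probability(self, record_a, record_b):
        emb_a = self.encode(char_ngram_vector(record_a["name"]))
        emb_b = self.encode(char_ngram_vector(record_b["name"]))
        learned = np.abs(emb_a - emb_b)  # element-wise embedding distance
        curated = hand_curated_comparators(record_a, record_b)
        z = np.concatenate([learned, curated])  # the "augmented" input to the head
        logit = z @ self.w_head + self.b_head
        return 1.0 / (1.0 + np.exp(-logit))


# Made-up example records, for illustration only.
survey = {"name": "Acme Manufacturing Inc", "zip": "37203"}
ita = {"name": "ACME Mfg., Inc.", "zip": "37203"}

model = AugmentedTwinNetwork()
score = model.match_probability(survey, ita)
print(f"untrained match score: {score:.3f}")
```

Because the weights are untrained, the printed score carries no meaning; the point is only the data flow. In a trained version of this sketch, the encoder and head would be fit jointly on labeled pairs, so the curated comparators enter the model as direct inputs instead of having to be rediscovered from raw text.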

Keywords

record linkage

entity resolution

machine learning

neural networks

probabilistic matching 

Main Sponsor

Record Linkage Interest Group