Using Augmented Twin Neural Networks to Match Occupational Injury and Illness Data
Elan Segarra
Presenting Author
US Bureau of Labor Statistics
Wednesday, Aug 6: 10:50 AM - 11:05 AM
2119
Contributed Papers
Music City Center
When record linkage involves complex characteristics, machine learning (ML) techniques have the potential to succeed where traditional probabilistic linkage methods (e.g., Fellegi-Sunter) might fall short. However, pre-processing (e.g., geocoding) and hand-picked metrics (e.g., edit distances) can still improve linkage outcomes beyond what ML models achieve on their own. We present a fusion of these two approaches that we call an Augmented Twin Neural Network. It leverages the nonlinear flexibility of Twin Neural Networks while adding layers that accommodate hand-curated comparators, which ML optimizers may struggle to identify implicitly without sufficiently large labeled data sets. The framework is used to match businesses from the BLS Survey of Occupational Injuries and Illnesses to businesses in the OSHA Injury Tracking Application data. Difficulties in matching company names and addresses, together with the existence of multi-establishment firms, make this a prime test application. Linkage outcome metrics for the augmented method are compared with results from both probabilistic and standard ML methods to illustrate the added benefits.
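The abstract does not specify the architecture in detail, so the following is only a minimal sketch of the general idea, not the author's implementation. It assumes a shared ("twin") encoder applied to both records, an element-wise absolute difference of the two embeddings, and a final logistic layer that also receives hand-curated comparators. The class `AugmentedTwinNetwork`, the bigram hashing scheme, the two comparators (normalized edit similarity and ZIP code agreement), and the example records are all hypothetical choices made for illustration. The weights are random and untrained, so the output shows only how the pieces connect.

```python
import math
import random


def char_bigram_vector(text, dim=32):
    """Hash character bigrams of a string into a fixed-length count vector."""
    vec = [0.0] * dim
    cleaned = text.lower().strip()
    for i in range(len(cleaned) - 1):
        # Deterministic hash; Python's built-in hash() is salted per process.
        idx = (ord(cleaned[i]) * 31 + ord(cleaned[i + 1])) % dim
        vec[idx] += 1.0
    return vec


def edit_similarity(a, b):
    """Hand-curated comparator: 1 minus the normalized Levenshtein distance."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return 1.0 - prev[-1] / max(len(a), len(b))


class AugmentedTwinNetwork:
    """Forward pass only: twin encoder plus hand-curated comparator inputs."""

    def __init__(self, input_dim=32, embed_dim=8, n_comparators=2, seed=0):
        rng = random.Random(seed)
        # One set of encoder weights, shared by both records (the "twin" part).
        self.encoder_weights = [
            [rng.gauss(0, 0.1) for _ in range(input_dim)]
            for _ in range(embed_dim)
        ]
        # The head sees learned twin features AND the curated comparators.
        self.head_weights = [
            rng.gauss(0, 0.1) for _ in range(embed_dim + n_comparators)
        ]
        self.head_bias = 0.0

    def encode(self, vec):
        """Single tanh layer mapping an input vector to an embedding."""
        return [
            math.tanh(sum(w * x for w, x in zip(row, vec)))
            for row in self.encoder_weights
        ]

    def match_probability(self, rec_a, rec_b):
        emb_a = self.encode(char_bigram_vector(rec_a["name"]))
        emb_b = self.encode(char_bigram_vector(rec_b["name"]))
        twin_features = [abs(x - y) for x, y in zip(emb_a, emb_b)]
        # Augmentation step: append comparators chosen by hand.
        comparators = [
            edit_similarity(rec_a["name"], rec_b["name"]),
            1.0 if rec_a["zip"] == rec_b["zip"] else 0.0,
        ]
        features = twin_features + comparators
        logit = self.head_bias + sum(
            w * f for w, f in zip(self.head_weights, features)
        )
        return 1.0 / (1.0 + math.exp(-logit))


# Made-up records, for illustration only.
survey_record = {"name": "Acme Manufacturing Inc", "zip": "37203"}
tracking_record = {"name": "ACME Mfg. Incorporated", "zip": "37203"}

model = AugmentedTwinNetwork()
score = model.match_probability(survey_record, tracking_record)
```

In a trained version of such a model, the encoder and head weights would be fit on labeled match and non-match pairs. Feeding the comparators directly into the head means the optimizer does not have to rediscover signals such as edit distance from limited labeled data.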
record linkage
entity resolution
machine learning
neural networks
probabilistic matching
Main Sponsor
Record Linkage Interest Group