Enhancing OMOP Vocabulary Mapping with a Transformer-Based Semantic-Hierarchical Framework

Dian Zhou (Co-Author)
University of Illinois Urbana-Champaign

Enshuo Hsu (Co-Author)
University of Texas MD Anderson Cancer Center

Jin Zhou (Co-Author)
Hunan University

Jiefei Wang (First Author, Presenting Author)
University of Texas Medical Branch

Monday, Aug 4: 11:05 AM - 11:20 AM
2288 
Contributed Papers 
Music City Center 
Inconsistent medical terminologies limit interoperability across electronic health record (EHR) systems, creating a critical barrier to leveraging healthcare data for policy and research. The OMOP Common Data Model (CDM) offers a standardized framework for harmonizing data across platforms. However, traditional rule-based mapping to OMOP vocabularies is labor-intensive, a burden that falls disproportionately on underserved hospitals with limited resources. Existing tools such as USAGI reduce this burden by automating the mapping process, but they struggle with semantic complexity. For example, mapping "Leukemia" to its superclass "Hematologic neoplasm" requires understanding hierarchical relationships that go beyond surface-level text similarity.
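The "Leukemia" example can be made concrete with a small sketch. It compares a purely character-level similarity score (Python's standard `difflib.SequenceMatcher`, standing in for surface-level matching; it is not USAGI's actual scoring) with a lookup in a parent-child hierarchy. The hierarchy below is a hypothetical toy dictionary, not real OMOP concepts or concept IDs; in OMOP, such links would come from the vocabulary's relationship and ancestor tables.

```python
from difflib import SequenceMatcher

# Hypothetical toy hierarchy (child -> parent); NOT real OMOP concepts or IDs.
PARENT = {
    "Leukemia": "Hematologic neoplasm",
    "Hematologic neoplasm": "Neoplastic disease",
}


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for surface-level matching."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def ancestors(term: str) -> list[str]:
    """Follow child -> parent links until a term with no parent is reached."""
    chain = []
    while term in PARENT:
        term = PARENT[term]
        chain.append(term)
    return chain


source, target = "Leukemia", "Hematologic neoplasm"
print(f"string similarity: {string_similarity(source, target):.2f}")
print(f"target is an ancestor of source: {target in ancestors(source)}")
```

The two terms share almost no characters, so the string score is low, while the hierarchy identifies the pair immediately. This gap is what a hierarchy-aware model is meant to close.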

In this talk, we propose a novel transformer-based model for automated OMOP terminology mapping that integrates OMOP's vocabulary structure and relational hierarchy. Two special tokens are added to guide the model's focus during training, and this dual-task training approach captures ontology-based dependencies beyond surface-level semantics. Preliminary evaluation on the CIEL vocabulary (condition domain), which the model did not see during training, shows improved accuracy and scalability compared with existing methods.
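The abstract does not specify the tokens, tasks, or loss, so the following is only a minimal sketch of how special tokens could tag training pairs for two tasks. It assumes that one task covers same-concept (semantic) pairs and the other covers child-to-parent (hierarchical) pairs. The token strings `[SEM]` and `[HIER]`, the helper functions, and the toy data are all hypothetical. The resulting (anchor, candidate, label) triples are the kind of input a sentence-transformer could be fine-tuned on with a pairwise similarity loss. Any new special tokens would also have to be registered with the model's tokenizer.

```python
import random
from dataclasses import dataclass

# Hypothetical special tokens; the abstract does not name the actual tokens.
SEM_TOKEN = "[SEM]"
HIER_TOKEN = "[HIER]"


@dataclass(frozen=True)
class Example:
    anchor: str
    candidate: str
    label: float  # 1.0 = positive pair, 0.0 = negative pair


def semantic_examples(synonyms, negatives):
    """Task 1 (assumed): pairs naming the same concept vs. unrelated pairs."""
    out = [Example(f"{SEM_TOKEN} {a}", b, 1.0) for a, b in synonyms]
    out += [Example(f"{SEM_TOKEN} {a}", b, 0.0) for a, b in negatives]
    return out


def hierarchy_examples(parent_of):
    """Task 2 (assumed): child -> parent is positive; the reversed direction is negative."""
    out = []
    for child, parent in parent_of.items():
        out.append(Example(f"{HIER_TOKEN} {child}", parent, 1.0))
        out.append(Example(f"{HIER_TOKEN} {parent}", child, 0.0))
    return out


def build_dataset(synonyms, negatives, parent_of, seed=0):
    """Mix both tasks into one shuffled training set."""
    data = semantic_examples(synonyms, negatives) + hierarchy_examples(parent_of)
    random.Random(seed).shuffle(data)  # interleave tasks rather than training them in blocks
    return data


# Hypothetical toy inputs, for illustration only.
dataset = build_dataset(
    synonyms=[("Leukaemia", "Leukemia")],
    negatives=[("Leukemia", "Asthma")],
    parent_of={"Leukemia": "Hematologic neoplasm"},
)
for ex in dataset:
    print(ex)
```

Prefixing the anchor text with a task token lets a single shared encoder learn both notions of "match" while still telling them apart at inference time. Whether the authors apply the tokens this way is not stated in the abstract.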

Keywords

sentence transformer

OMOP Common Data Model

semantic similarity

hierarchical relationships

terminology mapping

healthcare data integration 

Main Sponsor

Health Policy Statistics Section