Semi-Supervised Transformer for Rare Disease Phenotyping
Abstract Number:
1442
Submission Type:
Contributed Abstract
Contributed Abstract Type:
Poster
Participants:
Kimberly Greco (1), Zongxin Yang (2), Mengyan Li (3), Tianxi Cai (1)
Institutions:
(1) Harvard University, (2) Harvard Medical School, (3) Bentley University
Abstract Text:
Training robust phenotyping algorithms for rare disease research is challenging due to the scarcity of labeled data and biases in label sources. Gold-standard labels from registries and expert chart review ensure accuracy but suffer from selection bias and high ascertainment costs, while electronic health record (EHR)-derived labels encompass a broader patient population but introduce noise. To address these challenges, we propose a semi-supervised transformer framework that integrates gold-standard labels with iteratively updated silver-standard labels from structured EHR data to train a flexible disease classifier. Initial medical code embeddings are learned from EHR co-occurrence patterns, then refined and aggregated into patient-level representations via a multi-layer transformer that leverages self-attention to dynamically capture long-range dependencies across a patient's medical history. Validated on EHR data from Boston Children's Hospital, our model significantly improves downstream phenotype classification, patient clustering, and disease progression prediction over baseline methods, offering a scalable and generalizable solution for phenotyping in data-limited settings.
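The interplay between gold-standard and iteratively updated silver-standard labels can be illustrated with a minimal sketch. The abstract does not specify the update rule, so everything below is an assumption made for illustration: a weighted logistic regression stands in for the transformer classifier, the 2-dimensional vectors stand in for patient-level representations, and the names semi_supervised_fit and silver_weight, along with the toy data, are hypothetical. It shows only the general pattern of keeping gold labels fixed while refreshing down-weighted silver labels from the current model's predictions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, weights, epochs=200, lr=0.5):
    """Weighted logistic regression by batch gradient descent.
    Targets in y may be soft labels in [0, 1]."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    total = sum(weights)
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi, si in zip(X, y, weights):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                gw[j] += si * err * xi[j]
            gb += si * err
        w = [wj - lr * g / total for wj, g in zip(w, gw)]
        b -= lr * gb / total
    return w, b

def predict(model, X):
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

def semi_supervised_fit(X_gold, y_gold, X_unlab, silver_init,
                        rounds=3, silver_weight=0.3):
    """Assumed self-training loop: gold labels stay fixed at full weight,
    silver labels are down-weighted and replaced by model predictions
    after every round."""
    silver = list(silver_init)
    model = None
    for _ in range(rounds):
        X = X_gold + X_unlab
        y = list(y_gold) + silver
        weights = [1.0] * len(X_gold) + [silver_weight] * len(X_unlab)
        model = fit_logistic(X, y, weights)
        silver = predict(model, X_unlab)  # refresh the silver labels
    return model, silver

# Hypothetical toy data: 2-d "patient representations".
X_gold = [[2.0, 1.0], [1.0, 2.0], [2.0, 2.0],
          [-2.0, -1.0], [-1.0, -2.0], [-2.0, -2.0]]
y_gold = [1, 1, 1, 0, 0, 0]
X_unlab = [[1.5, 1.5], [-1.5, -1.5], [2.5, 2.5], [-2.5, -2.5]]
# Noisy EHR-derived silver labels: the second entry is deliberately wrong.
silver_init = [1.0, 1.0, 1.0, 0.0]

model, silver_final = semi_supervised_fit(X_gold, y_gold, X_unlab, silver_init)
print([round(p, 3) for p in silver_final])
```

In this toy setting the mislabeled silver example lies among the gold-standard negatives, so the refreshed silver label for it moves toward the negative class after the first round. Down-weighting silver examples relative to gold ones is one simple way to limit the influence of label noise; the actual framework may handle this differently.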
Keywords:
Semi-Supervised Learning|Transformers|Phenotyping|Electronic Health Records|Rare Diseases|Machine Learning
Sponsors:
Section on Statistical Learning and Data Science
Tracks:
Machine Learning
I have read and understand that JSM participants must abide by the Participant Guidelines.
Yes
I understand that JSM participants must register and pay the appropriate registration fee by June 3, 2025. The registration fee is non-refundable.
I understand