Semi-Supervised Transformer for Rare Disease Phenotyping

Abstract Number:

1442 

Submission Type:

Contributed Abstract 

Contributed Abstract Type:

Poster 

Participants:

Kimberly Greco (1), Zongxin Yang (2), Mengyan Li (3), Tianxi Cai (1)

Institutions:

(1) Harvard University, (2) Harvard Medical School, (3) Bentley University

Co-Author(s):

Zongxin Yang  
Harvard Medical School
Mengyan Li  
Bentley University
Tianxi Cai  
Harvard University

First Author:

Kimberly Greco  
Harvard University

Presenting Author:

Kimberly Greco  
Harvard University

Abstract Text:

Training robust phenotyping algorithms for rare disease research is challenging due to the scarcity of labeled data and biases in label sources. Gold-standard labels from registries and expert chart review ensure accuracy but suffer from selection bias and high ascertainment costs, while electronic health record (EHR)-derived labels encompass a broader patient population but introduce noise. To address these challenges, we propose a semi-supervised transformer framework that integrates gold-standard labels with iteratively updated silver-standard labels from structured EHR data to train a flexible disease classifier. Initial medical code embeddings are learned from EHR co-occurrence patterns, then refined and aggregated into patient-level representations via a multi-layer transformer that leverages self-attention to dynamically capture long-range dependencies across a patient's medical history. Validated on EHR data from Boston Children's Hospital, our model significantly improves downstream phenotype classification, patient clustering, and disease progression prediction over baseline methods, offering a scalable and generalizable solution for phenotyping in data-limited settings.
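The aggregation step described in the abstract (code embeddings refined by self-attention, then combined into a patient-level representation) can be illustrated with a minimal sketch. This is not the authors' model. It assumes a single attention head, a single layer, identity query/key/value projections in place of learned weights, and mean pooling; the code names and embedding values are hypothetical toy data.

```python
import math

# Hypothetical toy embeddings for three made-up medical codes (not from the study).
toy_embeddings = {
    "DX:A": [1.0, 0.0, 0.5],
    "RX:B": [0.0, 1.0, 0.5],
    "LAB:C": [0.5, 0.5, 1.0],
}

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the input vectors themselves
    (identity projections); a trained transformer would learn these maps.
    """
    d = len(vectors[0])
    outputs, weights = [], []
    for q in vectors:
        # Similarity of this code to every code in the history, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in vectors]
        w = softmax(scores)
        weights.append(w)
        # Refined embedding: attention-weighted average of all code vectors.
        outputs.append(
            [sum(w[j] * vectors[j][c] for j in range(len(vectors))) for c in range(d)]
        )
    return outputs, weights

def patient_representation(codes, embeddings):
    """Refine code embeddings with self-attention, then mean-pool them."""
    vectors = [embeddings[c] for c in codes if c in embeddings]
    if not vectors:
        raise ValueError("no known codes in patient history")
    refined, weights = self_attention(vectors)
    d = len(refined[0])
    pooled = [sum(v[c] for v in refined) / len(refined) for c in range(d)]
    return pooled, weights

history = ["DX:A", "RX:B", "LAB:C"]  # hypothetical patient history
pooled, weights = patient_representation(history, toy_embeddings)
print(pooled)
```

Each row of `weights` sums to 1, so every refined embedding is a convex combination of the codes in the patient's history. In the framework the abstract describes, the pooled vector would feed a disease classifier trained on gold-standard and silver-standard labels.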

Keywords:

Semi-Supervised Learning|Transformers|Phenotyping|Electronic Health Records|Rare Diseases|Machine Learning

Sponsors:

Section on Statistical Learning and Data Science

Tracks:

Machine Learning
