62: Weakly Supervised Transformer for Rare Disease Phenotyping
Alon Geva
Co-Author
Boston Children's Hospital
Monday, Aug 4: 2:00 PM - 3:50 PM
1442
Contributed Posters
Music City Center
Rare diseases affect an estimated 300-400 million people worldwide, yet individual conditions remain poorly characterized and difficult to diagnose due to their low prevalence and limited clinician familiarity. Efforts to automate rare disease detection through computational phenotyping are limited by the scarcity of labeled data and biases in available label sources. Gold-standard labels from registries or expert chart review offer high accuracy but suffer from selection bias and high ascertainment costs, while labels derived from electronic health records (EHRs) capture broader patient populations but introduce noise. To address these challenges, we propose a weakly supervised, transformer-based framework that integrates gold-standard labels with iteratively refined silver-standard labels from EHR data to train a scalable and generalizable phenotyping model. We first learn concept-level embeddings from EHR co-occurrence patterns, which are then refined and aggregated into patient-level representations using a multi-layer transformer. Using rare pulmonary diseases as a case study, we validate our framework on EHR data from Boston Children's Hospital. Our approach improves phenotype classification, uncovers clinically meaningful subphenotypes, and enhances disease progression prediction, enabling more accurate and scalable identification and stratification of rare disease patients.
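The two representation steps described above (concept embeddings from EHR co-occurrence, then aggregation to the patient level) can be illustrated with a minimal sketch. The abstract does not say how the embeddings are estimated, so everything here is an assumption for illustration: the toy patient records, the PPMI-plus-truncated-SVD embedding, and the function names are all hypothetical, and simple mean pooling stands in for the multi-layer transformer, which is not reproduced.

```python
import numpy as np

def cooccurrence_matrix(patients, n_concepts):
    """Count, for each concept pair, the number of patient records containing both."""
    C = np.zeros((n_concepts, n_concepts))
    for codes in patients:
        unique = sorted(set(codes))
        for i in unique:
            for j in unique:
                if i != j:
                    C[i, j] += 1.0
    return C

def ppmi(C, eps=1e-12):
    """Positive pointwise mutual information of a co-occurrence count matrix."""
    total = C.sum()
    row = C.sum(axis=1, keepdims=True)
    col = C.sum(axis=0, keepdims=True)
    expected = row @ col / max(total, eps)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(C / expected)
    pmi[~np.isfinite(pmi)] = 0.0  # zero counts carry no association signal
    return np.maximum(pmi, 0.0)

def concept_embeddings(M, dim):
    """Truncated SVD of the PPMI matrix (an assumed choice, not the authors')."""
    U, S, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :dim] * np.sqrt(S[:dim])

def patient_representation(codes, E):
    """Mean-pool concept embeddings; a placeholder for the transformer aggregator."""
    return E[sorted(set(codes))].mean(axis=0)

# Hypothetical toy data: each patient is a list of integer concept IDs.
patients = [[0, 1, 2], [0, 1], [3, 4], [3, 4, 2]]
n_concepts = 5

C = cooccurrence_matrix(patients, n_concepts)
M = ppmi(C)
E = concept_embeddings(M, dim=2)
reps = np.stack([patient_representation(p, E) for p in patients])
```

In a weakly supervised setup of the kind the abstract describes, vectors like `reps` would feed a classifier trained on a mix of gold-standard and silver-standard labels; that training loop is omitted because the abstract gives no details of it.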
Semi-Supervised Learning
Transformers
Phenotyping
Electronic Health Records
Rare Diseases
Machine Learning
Main Sponsor
Section on Statistical Learning and Data Science