Tuesday, Aug 5: 2:00 PM - 3:50 PM
0447
Invited Paper Session
Music City Center
Room: CC-209B
Applied
Yes
Main Sponsor
Section on Medical Devices and Diagnostics
Co Sponsors
Biometrics Section
Presentations
We propose a novel statistical framework for causal clustering to identify disease phenotypes driven by heterogeneous treatment effects (HTEs), addressing the critical need for therapy optimization in complex diseases. Traditional clustering methods, which rely solely on feature similarity, often fail to account for treatment response heterogeneity. Our approach integrates estimated conditional average treatment effects (CATEs) into a supervised clustering algorithm, using a penalized latent Gaussian mixture model to prioritize features with significant treatment effect modification, and groups patients into subtypes with maximally divergent CATE distributions. We demonstrate the method's utility in ischemic cardiomyopathy (ICM), where optimal selection for revascularization and mitral valve intervention remains challenging. Using high-dimensional radiomic features from cardiac magnetic resonance (CMR) imaging alongside clinical variables, our approach successfully identifies distinct disease subtypes with differential treatment benefits. The results show enhanced patient stratification compared to traditional clustering methods, with clear implications for treatment optimization. This framework provides a statistically rigorous approach for incorporating causal effects into disease sub-phenotyping, with broad potential applications in precision medicine beyond cardiovascular disease.
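The general idea of grouping patients by estimated treatment effect rather than by feature similarity can be sketched as follows. This is a simplified illustration, not the penalized latent Gaussian mixture model described in the abstract: it swaps in a T-learner (one least-squares line per treatment arm) for CATE estimation and a one-dimensional two-means split for the clustering. The synthetic data, the function names, and the single covariate are all assumptions made for the example.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for one covariate; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)
    return slope, my - slope * mx


def t_learner_cate(x, treated, y):
    """Estimate CATE(x) as the difference of two per-arm outcome models."""
    s1, i1 = fit_line([a for a, t in zip(x, treated) if t],
                      [b for b, t in zip(y, treated) if t])
    s0, i0 = fit_line([a for a, t in zip(x, treated) if not t],
                      [b for b, t in zip(y, treated) if not t])
    return [(i1 + s1 * a) - (i0 + s0 * a) for a in x]


def two_means_1d(values, iters=50):
    """Split scalar values into two groups with Lloyd's algorithm (k = 2)."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        new_lo = mean(g0) if g0 else lo
        new_hi = mean(g1) if g1 else hi
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return labels, (lo, hi)


# Synthetic cohort (hypothetical): the covariate is bimodal and the true
# treatment effect is 2 * x, so one subgroup is harmed and the other benefits.
rng = random.Random(0)
x = [rng.gauss(-1.0 if rng.random() < 0.5 else 1.0, 0.2) for _ in range(400)]
treated = [rng.random() < 0.5 for _ in x]
y = [a + (2.0 * a if t else 0.0) + rng.gauss(0.0, 0.1) for a, t in zip(x, treated)]

cate = t_learner_cate(x, treated, y)
labels, centers = two_means_1d(cate)
print("cluster centers of estimated CATE:", centers)
```

Clustering on the estimated effect, rather than on the covariate itself, is what separates this from similarity-based clustering: two patients with very different features land in the same subtype if their predicted benefit is similar.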
Keywords
Causal Machine Learning
Clustering
Medical Imaging
Radiomics
Treatment Optimization
The prevalence of artificial intelligence (AI) has highlighted
significant issues, particularly bias in AI systems, which is a
serious concern in fields like medical diagnosis. Bias often arises
from data collection practices that focus on specific populations,
leading to models that exhibit discriminatory behavior and unequal
prediction performance across different groups. To address this, we
propose a novel feature selection method based on deep learning and
statistics, aimed at eliminating discriminatory effects while
preserving predictive performance. This method employs the influence
score (I-score) to account for interactions among multiple features,
allowing for the exclusion of biased features and enhancing model
fairness. We conducted empirical studies using the ISIC 2019 and ASAN
skin lesion datasets, demonstrating that our fair I-score model
effectively classifies skin lesion types by mitigating inherent
biases. Additionally, we introduced a fairness model architecture for
multi-label classification that does not rely on data collection or
pre-processing, addressing biases from multiple risk factors. By
integrating the influence score and the backward dropping algorithm,
we derived important influence features and proposed an operational
definition of fairness based on the area under the receiver operating
characteristic curve. Furthermore, we expanded the application of the
fair influence score model to scenarios with missing sensitive
features, utilizing various imputation methods to construct fairer
models. Our results indicate that, compared to the baseline model, our
approach shows improved fairness and predictive performance on
external validation datasets. Overall, these studies enhance the
fairness of medical diagnosis models and demonstrate that deep
learning can maintain strong predictive performance even with diverse
data and missing sensitive features, providing valuable
insights for future AI fairness design. This report is based on
study results obtained with Professor Shaw-Hwa Lo at Columbia University, Dr.
Jacky Chung-Hao Wu, and other collaborators.
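To illustrate how an influence score can capture feature interactions, the sketch below computes an unnormalized I-score on discrete features and runs a backward dropping pass over them. It is a minimal illustration under stated assumptions, not the authors' fair I-score model: the unnormalized form (sum over partition cells of n_j^2 times the squared deviation of the cell mean from the overall mean) is one common convention and published variants rescale it, the toy XOR data are hypothetical, and the function names are invented for the example.

```python
from collections import defaultdict
from itertools import product


def i_score(rows, y, features):
    """Unnormalized influence score for a feature subset.

    Samples are partitioned into cells by the joint values of `features`;
    the score is sum_j n_j^2 * (cell mean - overall mean)^2.
    """
    overall = sum(y) / len(y)
    cells = defaultdict(list)
    for row, yi in zip(rows, y):
        cells[tuple(row[f] for f in features)].append(yi)
    return sum(len(v) ** 2 * (sum(v) / len(v) - overall) ** 2 for v in cells.values())


def backward_dropping(rows, y, features):
    """Repeatedly drop the feature whose removal yields the highest I-score,
    and return the best-scoring subset seen along the way."""
    current = list(features)
    best_subset, best_score = list(current), i_score(rows, y, current)
    while len(current) > 1:
        candidates = [[f for f in current if f != drop] for drop in current]
        current = max(candidates, key=lambda s: i_score(rows, y, s))
        score = i_score(rows, y, current)
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy data (hypothetical): the label is the XOR of features 0 and 1, so
# neither feature is informative alone; features 2 and 3 are pure noise.
rows = list(product([0, 1], repeat=4)) * 5
y = [row[0] ^ row[1] for row in rows]

best_subset, best_score = backward_dropping(rows, y, [0, 1, 2, 3])
print(best_subset, best_score)
```

On this balanced toy design each of features 0 and 1 scores zero on its own, while the pair scores highest, which is the kind of joint effect a marginal screen would miss.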
Keywords
Artificial Intelligence (AI)
Deep Learning
Fairness
Medical Diagnosis
Influence Score (I-score)
Backward Dropping Algorithm
Speaker
Henry Horng-Shing Lu, Kaohsiung Medical University and National Yang Ming Chiao Tung University
To maintain the explainability of artificial intelligence (AI) in medical imaging, two primary approaches are commonly employed: (1) information-driven analysis based on physical measurements and error distribution, and (2) objective-driven interpretation within a pre-configured environment with prior knowledge of error distribution. Recent developments in deep learning underscore the need for trustworthy AI, emphasizing the importance of traceability and explainability to ensure robustness and reliability in patient-centered outcomes research.
Attention-gate mechanisms have emerged as a promising method to facilitate explainable knowledge-based learning, utilizing pretraining attention layers during AI model development (Vaswani, 2017; Yu, 2023). In clinical trials, quantitative scores derived from CT images have been instrumental in assessing therapeutic effects. One such metric, the Quantitative Lung Fibrosis (QLF) score, uses AI/machine learning (ML) to quantify patterns of pulmonary fibrosis on high-resolution CT (HRCT) images. The ML-based QLF score has been successfully employed as a clinical trial outcome measure (Kim, 2021).
The aim of this study was to develop a new deep learning (DL) algorithm, leveraging annotations from the ML-QLF score as a reference, and to compare the performance of models with and without an attention-gate layer.
Methods/Background
The study cohort comprised 1,080 anonymized, thin-section, non-contrast chest HRCT scans, including cases of interstitial lung disease and other infectious diseases. These scans were divided into training (n=864), validation (n=108), and independent testing (n=108) sets. The development of the DL fibrosis (DL-FIB) algorithm involved three key steps:
1. Input: Segmented annotated regions of ML QLF using a watershed algorithm as the reference truth.
2. Model Architecture: A residual U-Net model with and without attention-gate layers.
3. Optimization & Regularization: Evaluation of classification performance in fibrotic reticulation within lung segmentation.
Post-training, quantitative scores from CT images were collected with and without attention-gate layers. An independent validation cohort of 889 HRCT scans was used to test the final model. The concordance correlation coefficient (CCC) was calculated to estimate the agreement between ML QLF and the two DL-FIB scores.
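For readers unfamiliar with the agreement measure, the sketch below computes Lin's concordance correlation coefficient, which penalizes both poor correlation and systematic offset between two sets of scores. It assumes the standard definition with population (1/n) moments; the score values are made up for illustration and are not study data, and the confidence-interval calculation reported in the abstract is not reproduced here.

```python
from statistics import mean


def concordance_correlation(x, y):
    """Lin's CCC: 2*s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2), using 1/n moments."""
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    mx, my = mean(x), mean(y)
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    s_xx = sum((a - mx) ** 2 for a in x) / n
    s_yy = sum((b - my) ** 2 for b in y) / n
    return 2.0 * s_xy / (s_xx + s_yy + (mx - my) ** 2)


# Hypothetical reference and model scores (percent of lung volume).
reference_scores = [1.0, 2.0, 3.0]
shifted_scores = [2.0, 3.0, 4.0]  # perfectly correlated but offset by +1

print(concordance_correlation(reference_scores, reference_scores))  # perfect agreement
print(concordance_correlation(reference_scores, shifted_scores))    # penalized for the offset
```

The second call shows why CCC is preferred over Pearson correlation for method comparison: the shifted scores have a Pearson correlation of 1, but the constant offset lowers the CCC to 4/7.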
Results/Findings
ML-QLF scores were consistently distributed across training, validation, and test sets with means (± standard deviation) of 10.8% (± 14.8) for training, 11.1% (± 15.3) for validation, and 10.1% (± 13.2) for testing.
High CCC values indicated strong agreement between ML QLF and DL-FIB scores in the test set:
• Attention-Gate + Residual U-Net: CCC = 0.913, 95% CI [0.880, 0.946].
• Residual U-Net (without Attention-Gate): CCC = 0.957, 95% CI [0.941, 0.972].
Mean (± standard deviation) differences between DL-FIB and ML QLF scores:
• With Attention-Gate: -2.4% (± 4.4%).
• Without Attention-Gate: -0.77% (± 3.4%).
Both DL-FIB models demonstrated good agreement with ML QLF scores, though DL-FIB scores tended to slightly underestimate pulmonary fibrosis compared to ML QLF.
In the independent validation cohort of 889 HRCT scans, scores from the residual U-Net showed the following concordance:
• Volumetric (n=209): CCC = 0.873, 95% CI [0.844, 0.901].
• Non-Volumetric (n=680): CCC = 0.496, 95% CI [0.444, 0.548].
Conclusion
Rapid quantification and visualization of fibrotic reticular patterns on high-resolution CT were feasible with a high concordance, demonstrating the potential of deep learning to enhance pulmonary fibrosis classification. The performance of models with and without attention-gate layers was not significantly different, although models without attention-gates showed numerically higher concordance. This suggests that a residual U-Net with detailed segmented annotations as reference truth may suffice, particularly in ILD cohorts, rendering attention-gate layers potentially unnecessary.
The DL model, trained using machine-readable segmented boundaries, enables rapid voxel-level quantification of lung fibrosis (< 2-minute inference time), providing conservative estimates for volumetric CT scans. However, AI-driven results must be carefully interpreted due to limitations in explainability. It is crucial to mitigate risks associated with DL fibrosis scores by employing quality control parameters, including information-driven speculations, or maintaining attention modules in pre-configured environments.
Keywords
Artificial intelligence (AI)
trustworthy AI (TAI)
eXplainable AI (XAI)
quantitative lung fibrosis
information-driven quality control parameters
Toward a new era of clinical research, this talk explores how AI can enhance the design of clinical trials to improve both inclusivity and efficiency. I will first introduce Trial Pathfinder, a computational framework that simulates synthetic patient cohorts using medical records. This tool enables more adaptive and inclusive eligibility criteria, supporting better patient representation and data quality. We will then discuss AI-driven approaches to streamline patient enrollment, including automated patient-trial matching, which improves recruitment efficiency. Together, we will explore how integrating AI can make trials more effective and accessible, accelerating the development of new therapies.
Keywords
Clinical Trials
Artificial Intelligence (AI)
Trial Enrollment
Inclusivity