Tuesday, Aug 5: 2:55 PM - 3:20 PM
Invited Paper Session
Music City Center
To maintain the explainability of artificial intelligence (AI) in medical imaging, two primary approaches are commonly employed: (1) information-driven analysis based on physical measurements and error distribution, and (2) objective-driven interpretation within a pre-configured environment with prior knowledge of error distribution. Recent developments in deep learning underscore the need for trustworthy AI, emphasizing the importance of traceability and explainability to ensure robustness and reliability in patient-centered outcomes research.
Attention-gate mechanisms have emerged as a promising method to facilitate explainable knowledge-based learning, utilizing pretraining attention layers during AI model development (Vaswani, 2017; Yu, 2023). In clinical trials, quantitative scores derived from CT images have been instrumental in assessing therapeutic effects. One such metric, the Quantitative Lung Fibrosis (QLF) score, uses AI/machine learning (ML) to quantify patterns of pulmonary fibrosis on high-resolution CT (HRCT) images. The ML-based QLF score has been successfully employed as a clinical trial outcome measure (Kim, 2021).
The aim of this study was to develop a new deep learning (DL) algorithm, leveraging annotations from the ML-QLF score as a reference, and to compare the performance of models with and without an attention-gate layer.
Methods/Background
The study cohort comprised 1,080 anonymized, thin-section, non-contrast chest HRCT scans, including cases of interstitial lung disease (ILD) and infectious diseases. These scans were divided into training (n=864), validation (n=108), and independent testing (n=108) sets. The development of the DL fibrosis (DL-FIB) algorithm involved three key steps:
1. Input: Annotated ML-QLF regions, segmented using a watershed algorithm, as the reference truth.
2. Model Architecture: A residual U-Net model with and without attention-gate layers.
3. Optimization & Regularization: Evaluation of classification performance for fibrotic reticulation within the segmented lung.
After training, quantitative scores were computed from CT images using the models with and without attention-gate layers. An independent validation cohort of 889 HRCT scans was used to test the final model. The concordance correlation coefficient (CCC) was calculated to estimate the agreement between the ML-QLF score and each of the two DL-FIB scores.
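The agreement measure can be illustrated with a minimal sketch. It assumes that CCC here refers to Lin's concordance correlation coefficient computed with population (1/n) moments; the abstract does not state which estimator or confidence-interval method was used, and the function name and example scores below are hypothetical.

```python
from statistics import fmean


def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for paired scores.

    Assumption: population (1/n) variances and covariance are used.
    The abstract does not specify the estimator.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference penalizes systematic bias,
    # which Pearson correlation alone would ignore.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical paired scores (percent), for illustration only.
ml_qlf = [1.0, 2.0, 3.0]
dl_fib = [2.0, 3.0, 4.0]
ccc = concordance_correlation(ml_qlf, dl_fib)
```

In this toy example the two score lists are perfectly correlated but offset by a constant, so the CCC falls below 1; this is the property that makes it a measure of agreement rather than of correlation alone.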
Results/Findings
ML-QLF scores were consistently distributed across training, validation, and test sets with means (± standard deviation) of 10.8% (± 14.8) for training, 11.1% (± 15.3) for validation, and 10.1% (± 13.2) for testing.
High CCC values indicated strong agreement between ML-QLF and DL-FIB scores in the test set:
• Attention-Gate + Residual U-Net: CCC = 0.913, 95% CI [0.880, 0.946].
• Residual U-Net (without Attention-Gate): CCC = 0.957, 95% CI [0.941, 0.972].
Mean (± standard deviation) differences between DL-FIB and ML-QLF scores (DL-FIB minus ML-QLF):
• With Attention-Gate: -2.4% (± 4.4%).
• Without Attention-Gate: -0.77% (± 3.4%).
Both DL-FIB models demonstrated good agreement with ML-QLF scores, though DL-FIB scores tended to slightly underestimate pulmonary fibrosis compared with ML-QLF.
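The mean (± standard deviation) differences reported above can be sketched as a paired-difference summary. This is an illustration only: it assumes the difference is taken as DL-FIB minus ML-QLF (consistent with negative values indicating underestimation) and that the sample (n - 1) standard deviation is used, which the abstract does not state. The function name and scores are hypothetical.

```python
from statistics import fmean, stdev


def difference_summary(dl_scores, ml_scores):
    """Return (mean, sample SD) of paired differences, DL minus ML.

    Assumption: sample (n - 1) standard deviation; the abstract does
    not say which form was reported.
    """
    if len(dl_scores) != len(ml_scores) or len(dl_scores) < 2:
        raise ValueError("scores must be paired and contain at least 2 values")
    diffs = [d - m for d, m in zip(dl_scores, ml_scores)]
    return fmean(diffs), stdev(diffs)


# Hypothetical paired scores (percent), for illustration only.
dl_fib = [8.0, 10.0, 12.0]
ml_qlf = [10.0, 13.0, 13.0]
mean_diff, sd_diff = difference_summary(dl_fib, ml_qlf)
```

A negative mean difference under this sign convention corresponds to the DL-FIB score being lower than the ML-QLF score on average.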
In the independent validation cohort of 889 HRCT scans, scores from the residual U-Net showed the following agreement:
• Volumetric (n=209): CCC = 0.873, 95% CI [0.844, 0.901].
• Non-Volumetric (n=680): CCC = 0.496, 95% CI [0.444, 0.548].
Conclusion
Rapid quantification and visualization of fibrotic reticular patterns on high-resolution CT were feasible with high concordance, demonstrating the potential of deep learning to enhance pulmonary fibrosis classification. The performance of models with and without attention-gate layers was not significantly different, although the model without attention-gate layers showed numerically higher concordance. This suggests that a residual U-Net with detailed segmented annotations as reference truth may suffice, particularly in ILD cohorts, rendering attention-gate layers potentially unnecessary.
The DL model, trained using machine-readable segmented boundaries, enables rapid voxel-level quantification of lung fibrosis (< 2-minute inference time), providing conservative estimates for volumetric CT scans. However, AI-driven results must be carefully interpreted due to limitations in explainability. It is crucial to mitigate risks associated with DL fibrosis scores by employing quality control parameters, including information-driven speculations, or maintaining attention modules in pre-configured environments.
Artificial intelligence (AI)
trustworthy AI (TAI)
eXplainable AI (XAI)
quantitative lung fibrosis
information-driven quality control parameters