Quantifying Intersectional Bias in Clinical Foundation Models: A Counterfactual Digital Twin Audit

Presented During: CS009 Lightning Session 1, Part 1

Conference: Symposium on Data Science and Statistics (SDSS) 2026

04/29/2026: 1:15 PM - 2:45 PM CDT
Lightning

Description

As General Purpose Artificial Intelligence (GPAI) becomes a de facto "learned intermediary" in clinical workflows, traditional single-axis audits fail to capture "Intersectional Toxicity," in which racial and socioeconomic biases amplify one another synergistically. This study evaluates "Prognostic Fatalism" in GPAI, examining how foundation models may automate structural racism by conflating social risk with biological determinism. We used a Counterfactual Digital Twin audit protocol, derived from guideline-compliant case reports of oropharyngeal cancer, to isolate causal effects. By holding biological ground truth constant while systematically permuting Race (White/Black) and Insurance Status (Private/Medicaid) across otherwise identical patient vignettes, we isolated the model's internal reasoning from the clinical evidence. We performed a stability analysis (n=10) of Gemini 3.0 Pro's 5-year overall survival (OS) estimates for complex, high-stakes scenarios. To provide a "white-box" view of decision-making, we developed the Reasoning Attention Index (RAI), a forensic linguistic metric quantifying "Semantic Drift": the model's shift from clinical evidence (pathological tokens) to social profiling (sociodemographic tokens). Compounded marginalization (Black/Medicaid) triggered a statistically significant survival "crash" from 65.0% to 54.3% (p=0.002), with a 40% "fatalistic failure rate" for potentially curable conditions. The RAI increased roughly ten-fold for marginalized profiles (0.38 vs. 0.04, p<0.001), indicating a massive diversion of computational attention toward social factors rather than biological markers. Qualitative analysis revealed "Staging Drift" in 60% of iterations, where the AI incorrectly applied terminal staging logic to favorable biology based solely on social markers. These findings suggest that current FDA and NIST risk frameworks require a paradigm shift toward mandating intersectional stress-testing and reasoning-state audits. Such oversight is essential to prevent a "Digital Nocebo Effect" and ensure clinical AI compliance.
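The 2x2 counterfactual design described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' protocol: the vignette template, the term lists, and the function names `build_counterfactual_twins` and `token_ratio_index` are all hypothetical. The abstract does not give the RAI formula, so the index below is a simple stand-in that assumes a ratio of sociodemographic terms to all tracked terms in a model's written rationale.

```python
import re
from itertools import product

# Hypothetical vignette template; a real audit would use the full
# guideline-compliant case report with identical clinical details.
BASE_VIGNETTE = (
    "A {race} patient with {insurance} insurance presents with "
    "oropharyngeal cancer. Estimate 5-year overall survival."
)

# The two audited axes named in the abstract.
RACES = ("White", "Black")
INSURANCE = ("Private", "Medicaid")


def build_counterfactual_twins(template=BASE_VIGNETTE):
    """Return one prompt per (race, insurance) cell; only those fields vary."""
    return {
        (race, ins): template.format(race=race, insurance=ins)
        for race, ins in product(RACES, INSURANCE)
    }


# Illustrative term lists (assumptions, not the authors' lexicons).
PATHOLOGY_TERMS = {"hpv", "p16", "stage", "tumor", "nodal", "margin"}
SOCIAL_TERMS = {"medicaid", "insurance", "race", "black",
                "socioeconomic", "access"}


def token_ratio_index(rationale):
    """Share of tracked tokens in a model rationale that are social terms.

    Assumed stand-in for the RAI: social / (social + pathology).
    Returns 0.0 when no tracked term appears.
    """
    tokens = re.findall(r"[a-z0-9]+", rationale.lower())
    social = sum(t in SOCIAL_TERMS for t in tokens)
    clinical = sum(t in PATHOLOGY_TERMS for t in tokens)
    total = social + clinical
    return social / total if total else 0.0


if __name__ == "__main__":
    for cell, prompt in build_counterfactual_twins().items():
        print(cell, "->", prompt)
    print(token_ratio_index("Medicaid insurance limits access; tumor stage"))
```

In an audit of this shape, each of the four prompts would be sent to the model repeatedly, and the index would be computed on the returned rationales rather than on the prompts themselves, so that differences between cells reflect the model's output and not the permuted input fields.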

Keywords

Counterfactual Audit

Algorithmic Auditing

Intersectional Bias

Presenting Author

Lei Guo, VA St. Louis HealthCare System

First Author

Lei Guo, VA St. Louis HealthCare System

CoAuthor

Shuimei Liu, China University of Political Science and Law

Tracks

AI and LLM Applications

Symposium on Data Science and Statistics (SDSS) 2026