Sunday, Aug 3: 4:00 PM - 5:50 PM
0697
Topic-Contributed Paper Session
Music City Center
Room: CC-208A
Applied
Yes
Main Sponsor
Biopharmaceutical Section
Co Sponsors
ENAR
Section on Medical Devices and Diagnostics
Presentations
In clinical trials, ensuring the quality and validity of data for downstream analysis and results is paramount, necessitating thorough data monitoring. Among data monitoring strategies, targeted monitoring is appealing because it uses key risk indicators and statistical monitoring to identify potential issues or anomalies in the data, aiming for real-time remediation of potential errors based on critical risk assessments rather than passive monitoring of past events. However, the majority of tools for risk-based monitoring concentrate primarily on overseeing and managing data entry errors and alterations and are descriptive in nature, e.g., Target eCRF (Mitchel et al., 2011). Similarly, the available tools for statistical monitoring (e.g., Bauer and Johnson (2000), JM et al. (2001), Carstensen et al. (2024)) are mostly descriptive as well, which may not serve our purpose well. Advances in AI/ML provide powerful techniques for feature/subgroup characterization and pattern recognition, which can potentially be used to identify anomalous patterns for single endpoints, multiple endpoints/multi-modal data collectively, or temporal data. This project utilizes advances in AI/ML for anomaly detection and advocates adapting them to the monitoring strategy. We examine popular generative AI methods such as autoencoders and generative adversarial networks. When multiple such agents are involved, we can apply a Bayesian ensemble model to combine their results into a reliable prediction of the anomaly classification and potentially assess overall risk factors such as site. Furthermore, with pseudo-labels generated, the Bayesian counterpart of the deep neural network, the Bayesian neural network (BNN), can be implemented for self-training and automatic classification.
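As a rough illustration of the autoencoder route (a minimal sketch, not this project's implementation; the network size, the standardized-input assumption, and the 95th-percentile flagging threshold are all assumptions made here for concreteness), records could be scored by reconstruction error and the largest errors routed to reviewers:

# Minimal sketch: flag anomalous records by autoencoder reconstruction error.
# Network size, training settings, and the flagging threshold are illustrative.
import numpy as np
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, n_features, latent_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(),
                                     nn.Linear(16, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(),
                                     nn.Linear(16, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def fit_and_score(X, epochs=200, lr=1e-3):
    """Train on standardized trial data X (n_records x n_features) and
    return a per-record anomaly score (mean squared reconstruction error)."""
    X = torch.tensor(X, dtype=torch.float32)
    model = AutoEncoder(X.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), X)
        loss.backward()
        opt.step()
    with torch.no_grad():
        return ((model(X) - X) ** 2).mean(dim=1).numpy()

# Records whose reconstruction error exceeds the 95th percentile could be
# treated as candidate anomalies for review (the cut-off is an assumption):
# scores = fit_and_score(X_standardized)
# flags = scores > np.percentile(scores, 95)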
The traditional clinical trial monitoring process, which relies heavily on site visits and manual review of accumulating patient data reported through Electronic Data Capture (EDC) systems, is time-consuming and resource-intensive. The recently emerged risk-based monitoring (RBM) and quality tolerance limit (QTL) framework offers a more efficient alternative to traditional quality assurance based on source data verification (SDV). These frameworks aim to proactively identify systematic issues that impact patient safety and data integrity. In this paper, we propose a machine learning-enabled approach to facilitate real-time, automated monitoring of clinical trial QTL risk. Unlike the traditional quality assurance process, where QTLs are evaluated based on single-source data and arbitrarily defined fixed thresholds, our QTL-ML framework integrates information from multiple clinical domains to predict clinical QTLs of various types at the program, study, site, and patient levels. Moreover, our approach is assumption-free, relying not on historical expectations but on dynamically accumulating trial data to predict quality tolerance limit risks in an automated manner. Embedded within the ICH E6 recommended RBM principles, this innovative machine learning solution for QTL monitoring has the potential to transform sponsors' ability to protect patient safety, reduce trial duration, and lower trial costs.
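As a hedged sketch of what such a QTL risk prediction step could look like (the abstract does not specify the learner; the gradient-boosted model, the site-snapshot data layout, the label, and all feature names below are assumptions for illustration only):

# Illustrative sketch: predict the probability that a site will exceed a
# quality tolerance limit from multi-domain features that accumulate
# during the trial. All column names here are hypothetical.
import pandas as pd
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split

features = ["enrollment_rate", "protocol_deviation_rate",
            "missed_visit_rate", "ae_reporting_lag_days", "query_rate"]

def train_qtl_risk_model(df: pd.DataFrame):
    """df: one row per site snapshot, with a binary label `qtl_exceeded`
    derived from later trial data (so the model predicts ahead of time)."""
    X, y = df[features], df["qtl_exceeded"]
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              stratify=y, random_state=0)
    model = HistGradientBoostingClassifier(max_depth=3)
    model.fit(X_tr, y_tr)
    print("held-out accuracy:", model.score(X_te, y_te))
    return model

# Sites with high predicted risk are prioritized for review rather than
# being compared against a single fixed threshold:
# risk = model.predict_proba(current_site_snapshot[features])[:, 1]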
With the increasing adoption of Risk-Based Quality Management (RBQM) in clinical trials, Central Statistical Monitoring (CSM) has gained growing recognition and application. CSM not only enables early detection of abnormal trends and potential issues at sites but also helps the Quality Assurance team identify sites at potential audit risk, preparing sponsors for regulatory inspections and ensuring more efficient oversight of clinical trial operations and maintenance of data quality. In the digital age, artificial intelligence has empowered CSM to identify and assess anomalies and inconsistencies in data more efficiently and accurately. It proactively helps to identify errors, fraud, or other issues that may compromise the validity of trial results, thereby enhancing data quality and ensuring trial compliance, patient safety, and data integrity. A digitalization tool will also enhance cross-functional collaboration and communication to facilitate the seamless implementation of RBQM.
This talk will delve into the practical applications of AI-assisted Central Statistical Monitoring within a digital platform, exploring how it can be effectively implemented to elevate clinical trial oversight and data quality. We will also touch upon the challenges and opportunities associated with this innovative approach, with the aim of offering insights that can guide its successful integration into the clinical trial landscape.
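For context, one classic CSM-style check that such a platform might automate (a generic illustration only, not the specific methods of the platform discussed here; the column names and the 3-sigma cut-off are assumptions) compares each site's rate of subjects with at least one adverse event against all other sites pooled:

# Generic CSM-style check: flag sites whose AE reporting proportion differs
# sharply from the rest of the study (two-proportion z statistic per site).
import numpy as np
import pandas as pd

def site_ae_rate_zscores(df: pd.DataFrame) -> pd.DataFrame:
    """df columns (aggregated per site): site_id, n_subjects, n_with_ae."""
    out = []
    for site, grp in df.groupby("site_id"):
        n_site, ae_site = grp["n_subjects"].sum(), grp["n_with_ae"].sum()
        rest = df[df["site_id"] != site]
        n_rest, ae_rest = rest["n_subjects"].sum(), rest["n_with_ae"].sum()
        p_site, p_rest = ae_site / n_site, ae_rest / n_rest
        p_pool = (ae_site + ae_rest) / (n_site + n_rest)
        se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_site + 1 / n_rest))
        out.append({"site_id": site, "rate": p_site,
                    "z": (p_site - p_rest) / se})
    res = pd.DataFrame(out)
    res["flag"] = res["z"].abs() > 3  # candidate signal for follow-up
    return res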
Background: Risk-Based Quality Management (RBQM) is an adaptive approach to clinical trial monitoring focused on identifying and mitigating risks that could impact patient safety and data quality. Our team developed an open-source RBQM framework that incorporates validated, modularized analytics, featuring Key Risk Indicators (KRIs), quality tolerance limits (QTLs), and other statistical monitoring methods to detect clinical trial risks at the patient, site, country, and study levels. We released over 12 R packages (core plus extensions) as the foundation of an internal RBQM analytics system that routinely detects and reports clinical trial risk signals for assessment and timely action.
Methods: During a pilot phase with multiple clinical trials, the team created and adopted the new RBQM framework and analytics to identify risk signals from clinical data and track them alongside mitigation actions within a dedicated risk signal management system. Developing this framework involved building an automated data pipeline, constructing sophisticated data models, deploying innovative analytic modules including AI/ML, creating dashboards and visualizations, and combining statistics and data science expertise with technical infrastructure (e.g., GitHub, R/Shiny/JavaScript, AWS Bedrock, Azure DevOps). In one of the extensions, the team implemented a machine learning (ML) module to predict risk signal actions based on historical data, utilizing features such as signal type, severity, study and site characteristics, and previous action patterns. Additionally, a generative AI component was integrated to automate risk signal descriptions and suggest tailored actions based on historical trends.
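A minimal sketch of the kind of action-prediction module described, assuming a scikit-learn-style pipeline (the actual R-based implementation, feature set, and action labels are not specified in the abstract, so the names below are hypothetical):

# Sketch: predict the follow-up action for a risk signal from historical
# signal/action records. Features, labels, and model choice are assumptions.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

categorical = ["signal_type", "severity", "therapeutic_area", "region"]
numeric = ["site_enrollment", "prior_signal_count"]

def train_action_model(history: pd.DataFrame):
    """history: one row per resolved risk signal, with the action taken
    (e.g., 'monitor', 'query site', 'escalate') in column `action`."""
    pre = ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
        remainder="passthrough")
    model = Pipeline([("pre", pre),
                      ("clf", RandomForestClassifier(n_estimators=300,
                                                     random_state=0))])
    model.fit(history[categorical + numeric], history["action"])
    return model

# Suggested actions (with class probabilities) could then be surfaced
# alongside the generative-AI narrative for reviewer confirmation:
# proba = model.predict_proba(new_signals[categorical + numeric])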
Results: Early detection of risk signals from clinical trial data, with clear mitigation plans produced by our analytic framework, allowed study teams to take prompt corrective actions with efficient use of resources, preventing issues from escalating into major data quality problems that could compromise the trial's validity and patient safety. The analytic output contains rich visualizations (interactive plots, data listings, statistical findings) with the ability to drill down to the underlying data. The innovative process and analytics, including the AI/ML components, have helped clinical trial teams continuously identify and mitigate thousands of key risks at the patient, site, country, and study levels.
Discussion: Our open-source framework with enhanced AI/ML capabilities is being piloted and implemented in ongoing studies and has been well received by internal clinical trial teams and by collaborators in the industry through a PHUSE initiative. The novel AI/ML approaches improved data monitoring efficiency and resource allocation, providing a more proactive, data-driven framework for clinical trial monitoring and risk management decisions.
Keywords
Risk-Based Quality Management (RBQM)
In this paper, we propose a simple clustering-based model for outliers in survival analysis. Specifically, we model feature vectors as sampled from a mixture model, where each mixture component is associated with its own survival and censoring time distributions. We define an outlier to be a point sampled from one cluster whose feature vector is closer to another cluster's center than to its own. Under this setup, we derive error upper bounds for $k$-nearest neighbor and kernel Kaplan-Meier estimators. We first show that in a special case where outliers do not arise (when feature vector noise is bounded and the clusters are very well separated), $k$-nearest neighbor and kernel Kaplan-Meier estimators converge at a rate much faster than previously established in the literature (which did not assume clustering structure). However, in the general case when outliers may appear, our error bounds no longer go to 0 as the amount of training data increases. We complement this bound with an error lower bound on how well an oracle estimator can estimate a test point's survival function. We show that a commonly assumed condition used to establish the statistical consistency of many survival estimators does not allow the possibility of the outliers we consider in our paper (namely, in our setting, survival and censoring times are not conditionally independent given a feature vector). We supplement our theoretical analysis with numerical experiments on recently developed deep kernel Kaplan-Meier estimators, showing that these estimators naturally learn embedding representations of clustered data that try to keep the clusters well separated and to limit the presence of outliers.
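For reference, kernel Kaplan-Meier estimators of the kind analyzed above are usually written in the standard conditional (Beran-type) form shown below; the paper's exact weighting scheme may differ, so this is given only as the standard construction:
\[
\widehat{S}(t \mid x) \;=\; \prod_{i \,:\, Y_i \le t,\; \delta_i = 1} \left( 1 - \frac{w_i(x)}{\sum_{j \,:\, Y_j \ge Y_i} w_j(x)} \right),
\qquad
w_i(x) \;=\; \frac{K\!\big((x - X_i)/h\big)}{\sum_{\ell=1}^{n} K\!\big((x - X_\ell)/h\big)},
\]
where $(X_i, Y_i, \delta_i)$ denote the $i$-th feature vector, observed time, and event indicator, $K$ is a kernel, and $h$ is a bandwidth; the $k$-nearest neighbor variant replaces $w_i(x)$ with uniform weights over the $k$ training points closest to $x$.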