Advances in Generative AI and Machine Learning: Transforming Biomedicine

Huimin Cheng Chair
Boston University

Huimin Cheng Organizer
Boston University

Fatema Shafie Khorassani Organizer

Wednesday, Aug 6: 2:00 PM - 3:50 PM
0236
Invited Paper Session

Music City Center

Room: CC-205C

Applied

Yes

Main Sponsor

Section on Statistical Computing

Co Sponsors

Section on Statistical Learning and Data Science

Section on Statistics in Genomics and Genetics

Presentations

Analyzing CITE-seq Data via a Quantum Algorithm

Quantum advantage has been demonstrated in physics-oriented problems. It remains elusive whether quantum advantage can be established for modern computational biology problems. In this talk, I will introduce a new quantum machine learning algorithm for analyzing single-cell multi-omics data. The proposed algorithm takes advantage of quantum parallelism to enable fast computation. Theoretical results are derived to show the advantages of the proposed algorithm in terms of estimation error and computational complexity. Simulation suggests that our algorithm is effective in a wide range of settings.
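For readers unfamiliar with Grover's algorithm, one of the keywords listed for this talk, the following is a minimal classical simulation of the textbook search routine. It is only an illustration of amplitude amplification, not the speaker's algorithm; the abstract does not say how the proposed method uses it. The function name `grover_search` and its parameters are assumptions made for this sketch.

```python
import math


def grover_search(n_qubits, marked):
    """Classically simulate textbook Grover search (illustrative only).

    Returns the measurement probability of each basis state after the
    standard number of Grover iterations for a single marked item.
    """
    n_states = 2 ** n_qubits
    # Start in the uniform superposition over all basis states.
    amps = [1.0 / math.sqrt(n_states)] * n_states
    # Roughly (pi / 4) * sqrt(N) iterations maximize the success probability.
    n_iter = math.floor(math.pi / 4 * math.sqrt(n_states))
    for _ in range(n_iter):
        # Oracle: flip the sign of the marked state's amplitude.
        amps[marked] = -amps[marked]
        # Diffusion: reflect every amplitude about the mean amplitude.
        mean = sum(amps) / n_states
        amps = [2 * mean - a for a in amps]
    return [a * a for a in amps]


probs = grover_search(n_qubits=3, marked=5)
best = max(range(len(probs)), key=probs.__getitem__)
```

With 8 basis states the loop runs twice, after which the marked state carries most of the probability mass. A classical search over the same space would need to inspect about half the states on average.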

Keywords

single-cell experiments

quantum computing

model selection

Grover's algorithm

quantum counting

bioinformatics

Speaker

Ping Ma, University of Georgia

CRISPR-GPT: An LLM Agent for Automated Design of Gene-Editing Experiments

The introduction of genome engineering technology has transformed biomedical research, making it possible to make precise changes to genetic information. However, creating an efficient gene-editing system requires a deep understanding of both CRISPR technology and the complex experimental systems under investigation. While Large Language Models (LLMs) have shown promise in various tasks, they often lack specific knowledge and struggle to accurately solve biological design problems. In this work, we introduce CRISPR-GPT, an LLM agent augmented with domain knowledge and external tools to automate and enhance the design process of CRISPR-based gene-editing experiments. CRISPR-GPT leverages the reasoning ability of LLMs to facilitate the process of selecting CRISPR systems, designing guide RNAs, recommending cellular delivery methods, drafting protocols, and designing validation experiments to confirm editing outcomes. We showcase the potential of CRISPR-GPT for assisting non-expert researchers with gene-editing experiments from scratch and validate the agent's effectiveness in a real-world use case. Furthermore, we explore the ethical and regulatory considerations associated with automated gene-editing design, highlighting the need for responsible and transparent use of these tools. Our work aims to bridge the gap between beginner biological researchers and CRISPR genome engineering techniques, and to demonstrate the potential of LLM agents in facilitating complex biological discovery tasks.

Keywords

LLM, CRISPR-GPT, genome engineering

Speaker

Minshuo Chen

Double Generative Learning for Causal Inference

With the rapid development of artificial intelligence, causal inference with observational data has drawn much attention in various scientific domains. A key challenge in this area is the often-violated ignorability assumption, which is critical for unbiased estimation of causal effects, but very hard to check in practice. To address this challenge, we develop Double Generative Learning (DGL), a novel approach that leverages the capabilities of generative adversarial networks (GANs) for robust causal inference under the violation of ignorability. By employing a delicate dual GANs structure, DGL emulates data akin to randomized controlled trials (RCTs) solely based on observational studies, circumventing the biases introduced by unobserved confounders. This methodology not only proposes an elegant solution to the issue of ignorability violation by achieving minimax optimality in robustness but also adeptly manages high-dimensional and complex data structures. Theoretical analysis reveals DGL's capacity to bypass the curse of dimensionality by exploiting the inherent low-dimensional submanifold structures in the data. Through extensive simulation studies and analyses of real-world datasets, DGL's empirical superiority in facilitating robust causal inference under adverse conditions is comprehensively demonstrated.

Keywords

Average treatment effect; Observational study; Ignorability; Randomized controlled trials; Curse of dimensionality; Generative adversarial networks

Speaker

Wenxuan Zhong, University of Georgia

Matrix normal Graphical Model for inferring Gene Spatial co-expression

Recent advances in spatially resolved transcriptomics (SRT) have illuminated gene co-expression networks in spatial contexts, offering insights into disease mechanisms. However, current methods, mainly designed for single-cell studies, tend to overlook the intricate interactions between spatial location and gene expression networks, and none of them can handle the increasingly prevalent large-scale datasets. To address these limitations, we propose a novel matrix-normal-based method, spMGM, for inferring gene co-expression networks in SRT studies. spMGM accounts for intricate interactions between spatial context and gene expression. In extensive simulations, both model-based and non-model-based, spMGM accurately recovers the underlying gene co-expression network, improving accuracy by 40% to 50% compared to existing methods. Moreover, spMGM efficiently handles large-scale datasets such as 10x Xenium, running 10 times faster than the most advanced method. Applying spMGM to breast cancer tissue demonstrates its ability to detect breast cancer-related hub genes that have not been identified by other methods.

Keywords

Matrix normal Graphical Model

Gene Spatial co-expression

Speaker

Ying Ma

Random Forest Clustering for Development of Clinical Phenotypes from Cohort Studies

Many medical diagnoses represent heterogeneous conditions that combine a number of subtypes before clinical presentation. Clustering analyses of patients with such diagnoses may reveal these underlying subtypes and help in the development of more homogeneous clinical phenotypes, which can be targeted by more specific treatments to prevent disease progression. We present a nonparametric machine learning approach to clustering patients based on the Random Forest algorithm, which accommodates the mixed variable types and skewness of standard medical data. To illustrate the approach, we use cohort data from the Multicenter Osteoarthritis Study and from the similarly designed Osteoarthritis Initiative Study to evaluate subtypes of patients undergoing knee replacement surgery, and we compare the cluster results to those obtained by the k-means clustering algorithm. We find that the Random Forest approach produces clusters with greater interpretability and with less impact from the study design features than the k-means algorithm.
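One common way to cluster with a Random Forest is the unsupervised variant that trains a forest to separate the real data from a column-permuted copy, then clusters on the resulting proximities. The sketch below follows that recipe under stated assumptions. It may differ from the speaker's procedure, it handles numeric features only, and the function name `rf_proximity_clusters`, its parameters, and the toy data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.ensemble import RandomForestClassifier


def rf_proximity_clusters(X, n_clusters, n_trees=100, seed=0):
    """Cluster rows of X using Random Forest proximities (illustrative)."""
    rng = np.random.default_rng(seed)
    # Synthetic contrast data: permute each column independently, which
    # keeps the marginals but destroys the dependence between features.
    X_synth = np.column_stack(
        [rng.permutation(X[:, j]) for j in range(X.shape[1])]
    )
    X_all = np.vstack([X, X_synth])
    y_all = np.r_[np.ones(len(X)), np.zeros(len(X_synth))]
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X_all, y_all)
    # Leaf index of every real sample in every tree: (n_samples, n_trees).
    leaves = forest.apply(X)
    # Proximity = fraction of trees in which two samples share a leaf.
    proximity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
    distance = 1.0 - proximity
    # Average-linkage hierarchical clustering on the proximity distance.
    tree = linkage(squareform(distance, checks=False), method="average")
    labels = fcluster(tree, t=n_clusters, criterion="maxclust")
    return labels, proximity


# Toy data (made up for this sketch): two separated groups of 30 samples.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 3)), rng.normal(8, 1, (30, 3))])
labels, proximity = rf_proximity_clusters(X, n_clusters=2)
```

Because the trees split on one variable at a time, the proximities are unaffected by monotone rescaling of any feature. This is one reason the approach is considered for skewed clinical measurements, where k-means depends on the chosen scale.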

Keywords

Unsupervised Learning

Classification Trees

Biomedical Data

Osteoarthritis

Speaker

Michael LaValley, Boston University