Single-Cell Embedding from Language Models
Monday, Aug 4: 2:45 PM - 3:05 PM
Invited Paper Session
Music City Center
Various Foundation Models (FMs) have been built on the pre-training and fine-tuning paradigm to analyze single-cell data, with varying degrees of success. In this presentation, we describe scELMo (Single-cell Embedding from Language Models), a method for analyzing single-cell data that uses Large Language Models (LLMs) both to generate descriptions of metadata information and to produce embeddings of those descriptions. We combine the LLM embeddings with the raw data under a zero-shot learning framework, and further extend the method with a fine-tuning framework to handle different tasks. We demonstrate that scELMo is capable of cell clustering, batch-effect correction, and cell-type annotation without training a new model. Moreover, the fine-tuning framework of scELMo can help with more challenging tasks, including in-silico treatment analysis and modeling perturbations. scELMo has a lighter structure and lower resource requirements than recent large-scale FMs (such as scGPT and Geneformer), yet performs comparably to them in our evaluations, suggesting a promising path for developing domain-specific FMs.
Keywords: Single cell, foundation models, large language models, embedding, clustering, annotation
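To illustrate the zero-shot combination step described in the abstract, the sketch below shows one plausible way to combine LLM-derived embeddings with raw data: each cell's embedding is an expression-weighted average of gene-description embeddings. This is a minimal illustration, not scELMo's actual implementation; the function name, shapes, and weighting scheme are assumptions.

```python
import numpy as np

def cell_embeddings(expr, gene_emb):
    """Illustrative zero-shot cell embedding (assumed scheme, not scELMo's code).

    expr: (n_cells, n_genes) non-negative expression matrix.
    gene_emb: (n_genes, d) LLM embeddings of gene/metadata descriptions.
    Returns: (n_cells, d) cell embeddings as expression-weighted
    averages of the gene embeddings.
    """
    # Normalize each cell's expression into weights summing to 1.
    weights = expr / expr.sum(axis=1, keepdims=True)
    return weights @ gene_emb

# Toy example: 2 cells, 3 genes, embedding dimension 4.
expr = np.array([[1.0, 0.0, 1.0],
                 [0.0, 2.0, 0.0]])
gene_emb = np.random.rand(3, 4)
emb = cell_embeddings(expr, gene_emb)
print(emb.shape)  # (2, 4)
```

The resulting cell embeddings can then be fed to standard tools (e.g., clustering or nearest-neighbor annotation) without training a new model, which matches the zero-shot usage the abstract describes.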