Generalized Estimating Equation for Modeling Cell-Cell Correlation in Single-Cell RNA Seq Data

Tuo Lin Co-Author
University of Florida
 
Toni Gui Co-Author
University of Florida
 
Nadejda Beliakova-Bethell Co-Author
University of California, San Diego
 
Xin Tu Co-Author
University of California, San Diego
 
Xinlian Zhang Speaker
University of California, San Diego
 
Monday, Aug 4: 11:15 AM - 11:35 AM
Invited Paper Session 
Music City Center 
In the analysis of single-cell RNA sequencing (scRNA-seq) data, cells from the same individual are believed to share common genetic and environmental backgrounds and are therefore not statistically independent. Many widely used methods, such as the default Wilcoxon test ("wilcox") in the FindMarkers function of the Seurat package, do not address this dependence, which can lead to highly inflated type I error rates. More recent work argues for generalized linear mixed models with a random effect for individual, to properly account for the correlation among measurements from cells within an individual. However, the traditional mixed-effects model makes a strong assumption: the correlation must be identical and strictly positive across all cells in the same individual. We demonstrate that this assumption can be rather restrictive for real data, given the heterogeneity of cells within the same subject. When the positive-correlation assumption is violated, the classical random-effects model yields consistently biased inference and inflated type I error in the differential expression analyses we investigated. We propose a semiparametric approach based on generalized estimating equations (GEE) to address this issue, and we demonstrate its robust and efficient performance in both simulations and a real-data application that focuses on revealing common and unique gene expression signatures in primary CD4+ T cells latently infected with HIV under different conditions.
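
To illustrate why ignoring within-individual dependence can inflate type I error, here is a minimal sketch of the simplest GEE special case: identity link, independence working correlation, and a cluster-robust (sandwich) variance with individuals as clusters. It is not the authors' implementation. The simulated data, the function names `simulate_cells` and `gee_independence`, the Gaussian outcome, and all parameter values are assumptions made for illustration; a real scRNA-seq analysis would more likely use a count model and possibly a different working correlation.

```python
import math
import random


def simulate_cells(n_individuals=20, cells_per_individual=50, seed=0):
    """Simulate null data: no group effect, but cells share an individual-level shift."""
    rng = random.Random(seed)
    clusters = []
    for i in range(n_individuals):
        group = i % 2                  # condition is assigned per individual (assumption)
        shift = rng.gauss(0.0, 1.0)    # shared by every cell of this individual
        cells = [shift + rng.gauss(0.0, 1.0) for _ in range(cells_per_individual)]
        clusters.append((group, cells))
    return clusters


def gee_independence(clusters):
    """Fit y = b0 + b1 * group and return (b1, naive_se, robust_se)."""
    n = sum(len(ys) for _, ys in clusters)
    n1 = sum(len(ys) for g, ys in clusters if g == 1)
    sum_y = sum(sum(ys) for _, ys in clusters)
    sum_y1 = sum(sum(ys) for g, ys in clusters if g == 1)

    # "Bread": inverse of X'X = [[n, n1], [n1, n1]] for the design [1, group].
    det = n * n1 - n1 * n1
    inv = [[n1 / det, -n1 / det], [-n1 / det, n / det]]
    b0 = inv[0][0] * sum_y + inv[0][1] * sum_y1
    b1 = inv[1][0] * sum_y + inv[1][1] * sum_y1

    # "Meat": sum over individuals of (X_i' r_i)(X_i' r_i)', r_i = residuals.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    rss = 0.0
    for g, ys in clusters:
        resid = [y - b0 - b1 * g for y in ys]
        s = sum(resid)
        score = (s, g * s)
        for a in range(2):
            for b in range(2):
                meat[a][b] += score[a] * score[b]
        rss += sum(r * r for r in resid)

    # Naive variance treats every cell as an independent observation.
    naive_var = rss / (n - 2) * inv[1][1]
    # Sandwich variance: entry [1][1] of inv * meat * inv.
    robust_var = sum(
        inv[1][a] * meat[a][b] * inv[b][1] for a in range(2) for b in range(2)
    )
    return b1, math.sqrt(naive_var), math.sqrt(robust_var)


clusters = simulate_cells()
b1, naive_se, robust_se = gee_independence(clusters)
print(f"b1={b1:.3f}  naive SE={naive_se:.3f}  robust SE={robust_se:.3f}")
```

In this simulated setting the naive standard error is much smaller than the cluster-robust one, so a test that treats cells as independent rejects the null far too often. Note that the sandwich estimator itself can be optimistic when the number of individuals is small.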

Keywords

HIV latency

single cell RNA seq