Latest Genomics, Microbiome, and Sequencing Research

Ali Taheriyoun, Chair
George Washington University
 
Monday, Aug 4: 10:30 AM - 12:20 PM
4053 
Contributed Papers 
Music City Center 
Room: CC-209B 
In this session, presenters will cover their latest research in genomics, the microbiome and metabolomics, and sequencing, along with the statistical and data science learning strategies they use to handle these data.

Main Sponsor

Biometrics Section

Presentations

A distance metric for unlabeled phylogenetic networks

In the field of phylogenetics, researchers traditionally represent evolutionary relationships with a phylogenetic tree. However, because evolution is complex, phylogenetic networks offer a robust alternative that can accurately describe reticulate processes. Among the many tasks relevant to phylogenetic network inference, quantifying distances among networks is crucial yet understudied. One successful approach to improving distance functions on trees is to map trees onto matrix spaces: any ranked tree with n leaves can be encoded as an integer-valued lower triangular matrix that we call an F-matrix. In this talk, we extend the definition of the F-matrix to phylogenetic networks and prove a bijection between network space and matrix space subject to certain constraints. We propose a metric on the space of rooted, ranked, and unlabeled phylogenetic network distributions. Once phylogenetic networks are bijectively mapped onto a matrix space, distances can be calculated using the Frobenius norm, which makes it possible to conduct statistical analysis of ranked network shapes. We show the utility of our metrics via simulations and an application to infectious diseases.
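Once two networks are encoded as F-matrices, the distance step reduces to a matrix norm. The sketch below assumes the lower-triangular, integer-valued F-matrices have already been computed; building them from a ranked network is the talk's contribution and is not reproduced here. The helper names and the matrices in the example are hypothetical toy inputs, not encodings of specific networks.

```python
import math

def is_lower_triangular(F):
    """Check that every entry above the main diagonal is zero."""
    return all(F[i][j] == 0 for i in range(len(F)) for j in range(i + 1, len(F[i])))

def frobenius_distance(F1, F2):
    """d(F1, F2) = ||F1 - F2||_F for two matrices of the same shape."""
    if len(F1) != len(F2) or any(len(a) != len(b) for a, b in zip(F1, F2)):
        raise ValueError("matrices must have the same shape")
    if not (is_lower_triangular(F1) and is_lower_triangular(F2)):
        raise ValueError("expected lower-triangular matrices")
    # Square root of the sum of squared entrywise differences.
    return math.sqrt(sum((a - b) ** 2
                         for row1, row2 in zip(F1, F2)
                         for a, b in zip(row1, row2)))

# Toy 3x3 lower-triangular integer matrices (illustrative values only).
F_a = [[2, 0, 0],
       [2, 3, 0],
       [1, 2, 4]]
F_b = [[2, 0, 0],
       [1, 3, 0],
       [1, 1, 4]]
print(frobenius_distance(F_a, F_b))  # sqrt(2) ≈ 1.4142
```

Because the encoding is injective under the stated constraints, pulling the Frobenius norm back to network space yields a true metric: a distance of zero identifies identical ranked network shapes, and symmetry and the triangle inequality carry over.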

Keywords

phylogenetic network

distance metric

ranked genealogy

ranked network shape 

Co-Author(s)

Claudia Solis-Lemus, University of Wisconsin-Madison
Julia Palacios, Stanford University

First Author

Jiayang Wang

Presenting Author

Jiayang Wang

A multi-modal study of microbiomes and metabolomes reveals a system-wide dysbiosis preceding HIV-1

The microbiome plays an important role in immune responses and inflammation in people with HIV-1 infection. Hence, a deeper understanding of the microbiome, including its function and byproducts, before HIV-1 infection is potentially important for prevention and treatment strategies. To this end, using stool, oral wash, and plasma biospecimens obtained from men who have sex with men (MSM) who were HIV-1 uninfected at the time of sample collection, we found significant differences in microbial ecologies, gene functions, correlations among bacterial species, their biological processes, and metabolites between MSM who later acquired HIV-1 and those who remained HIV-1 uninfected. These differences included enrichment of enzymes involved in purine metabolism, lower amino acid metabolism, and higher oxidative stress. Furthermore, using a measure of dysbiosis based on correlations with various data modalities, we identified 59 gut species and 24 oral species as dysbiotic before HIV-1 infection, most of which were independent of sexual activity.

Keywords

HIV infection

Gut microbiome

Oral microbiome

metabolomics 

Co-Author(s)

Yue Chen, University of Pittsburgh
Saby Bera, National Institute of Environmental Health Sciences
Alan K Jarmusch, National Institute of Environmental Health Sciences
Daria Van Tyne, University of Pittsburgh
Frank J Palella, Northwestern University, Chicago
Joseph B Margolick, Johns Hopkins Bloomberg School of Public Health
Kara W Chew, University of California, Los Angeles
Jing Sun, Johns Hopkins University
Jeremy Martinson, University of Pittsburgh
Charles R Rinaldo, University of Pittsburgh, Pittsburgh
Shyamal Peddada, NIEHS

First Author

Farnaz Fouladi

Presenting Author

Farnaz Fouladi

Clone sizes and sampling the T cell repertoire

Surrogate selection is an experimental design that, without sequencing any DNA, can restrict a sample of cells to those carrying certain genomic mutations. In immunological disease studies, this design may provide a relatively easy way to enrich a lymphocyte sample with cells relevant to the disease response, because the emergence of neutral mutations is associated with the proliferation history of clonal subpopulations. A statistical analysis of clonotype sizes provides a structured, quantitative perspective on this useful property of surrogate selection. Our model specification couples within-clonotype birth-death processes with an exchangeable model across clonotypes. Beyond enrichment questions about the surrogate selection design, our framework enables a study of the sampling properties of elementary sample diversity statistics; it also points to new statistics that may usefully measure the burden of somatic genomic alterations associated with clonal expansion. We examine statistical properties of immunological samples governed by the coupled model specification, and we illustrate calculations in surrogate selection studies of melanoma and in single-cell genomic studies.
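The coupling of within-clonotype birth-death processes with an exchangeable across-clonotype model, and the size bias it induces in sampling, can be illustrated with a small simulation. This is a minimal sketch assuming independent, identically parameterized linear birth-death processes per clonotype (one simple exchangeable specification, not necessarily the authors'); the function names and the rates `birth=1.0`, `death=0.5`, `t_end=3.0` are hypothetical choices.

```python
import random

def simulate_clone_size(birth, death, t_end, rng):
    """Gillespie simulation of a linear birth-death process started from one cell.

    Returns the number of cells in the clone at time t_end (0 if extinct)."""
    n, t = 1, 0.0
    while n > 0:
        t += rng.expovariate(n * (birth + death))  # waiting time to the next event
        if t > t_end:
            break
        if rng.random() < birth / (birth + death):
            n += 1  # cell division
        else:
            n -= 1  # cell death
    return n

def clone_size_summaries(sizes):
    """Mean clone size vs. mean size of the clone containing a randomly sampled cell."""
    observed = [s for s in sizes if s > 0]  # extinct clones cannot be sampled
    mean_size = sum(observed) / len(observed)
    # A uniformly sampled cell lands in clone k with probability proportional to its size,
    # so the expected size of its clone is E[X^2] / E[X].
    size_biased_mean = sum(s * s for s in observed) / sum(observed)
    return mean_size, size_biased_mean

rng = random.Random(2025)
sizes = [simulate_clone_size(birth=1.0, death=0.5, t_end=3.0, rng=rng) for _ in range(500)]
mean_size, size_biased_mean = clone_size_summaries(sizes)
print(f"mean clone size {mean_size:.2f}; size-biased mean {size_biased_mean:.2f}")
```

The size-biased mean is never smaller than the plain mean (by the Cauchy-Schwarz inequality), which is one way to see why cell-level sampling over-represents expanded clones.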

Keywords

Bayes's rule

clonal expansion

exchangeable birth death process

experimental design

somatic mutation

size bias 

Co-Author(s)

Peng Yu, University of Wisconsin-Madison
Mark Albertini, University of Wisconsin-Madison
Yumin Lian, University of Wisconsin-Madison
Elliot Xie, University of Wisconsin-Madison
Cindy Zuleger, University of Wisconsin-Madison
Richard Albertini, University of Vermont

First Author

Michael Newton, University of Wisconsin-Madison

Presenting Author

Michael Newton, University of Wisconsin-Madison

WITHDRAWN Comparison of differential abundance analysis methods for microbiome data

Differential abundance analysis is a common goal of microbiome studies, used to uncover associations between microbial composition and health conditions. Although many methods have been developed in recent years to analyze such data, no single method performs uniformly better than the others. The zero inflation, overdispersion, and compositionality of the data make selecting an appropriate method especially challenging. Some methods are based on Wilcoxon rank-sum tests and t-tests, while others are designed to address zero inflation and compositional effects, and the methods can produce discordant results. Therefore, a comprehensive evaluation covering many biologically relevant scenarios is essential for choosing a robust analysis method. We carried out a comprehensive evaluation of differential abundance methods using simulations based on real data. Specifically, we evaluated limma, edgeR, ALDEx2, metagenomeSeq, ANCOM-BC, and LOCOM with respect to the false discovery rate (FDR) and true positive rate (TPR). We found that, although no method was uniformly robust and flexible, ANCOM-BC and LOCOM consistently performed better than the other methods.
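In a simulation-based comparison, each method's calls on one simulated data set are scored against the known set of truly differential taxa, and averaging over replicates gives the empirical FDR and TPR. The sketch below assumes per-taxon p-values are already available and applies the Benjamini-Hochberg procedure as a generic multiple-testing step (individual methods may use their own corrections); the p-values and truth set are hypothetical.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return the indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / m:
            k_max = rank  # largest rank that meets the step-up criterion
    return set(order[:k_max])

def empirical_fdr_tpr(called, truly_differential):
    """False discovery proportion and true positive rate for one simulated data set."""
    true_pos = len(called & truly_differential)
    false_pos = len(called - truly_differential)
    fdr = false_pos / len(called) if called else 0.0
    tpr = true_pos / len(truly_differential) if truly_differential else 0.0
    return fdr, tpr

# Hypothetical per-taxon p-values from any DA method, and the simulated truth.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
truth = {0, 2}
called = benjamini_hochberg(pvals, alpha=0.05)
print(called, empirical_fdr_tpr(called, truth))  # {0, 1} (0.5, 0.5)
```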

Keywords

Differential abundance

composition

microbiome

zero inflation

metagenome

over dispersion 

First Author

Prabhakar Chalise, University of Kansas Medical Center

gwSPADE: reference-free deconvolution in spatial transcriptomics with gene weighting

Most spatial transcriptomics technologies (e.g., 10x Visium) operate at the multicellular level, where each spatial location often contains a mixture of cells of heterogeneous types. Thus, effectively deconvolving cell-type compositions is critical for downstream analysis. Although reference-based deconvolution methods have been proposed, they depend on the availability of reference data, which may not always be accessible. Additionally, cellular heterogeneity may persist within a deconvolved cell type, requiring further deconvolution to uncover finer structure. Here we present gwSPADE, a gene expression-weighted reference-free SPAtial DEconvolution method for spatial transcriptomics data. gwSPADE requires only the gene count matrix and employs appropriate weighting schemes within a topic model to accurately recover cell-type transcriptional profiles and their proportions at each spatial location, without relying on external single-cell references that may introduce batch effects. gwSPADE demonstrates scalability across various platforms and outperforms existing reference-free deconvolution methods such as STdeconvolve.
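In a topic-model view of deconvolution, each spot plays the role of a document, genes are words, and cell types are topics. The sketch below shows only one ingredient: given cell-type profiles, an EM update estimates a spot's cell-type proportions by maximum likelihood. In LDA the profiles are learned jointly with the proportions, and gwSPADE's actual weighting scheme is not described here, so the profiles are held fixed for brevity, and `gene_weights`, which simply scales counts, is a hypothetical stand-in. This is not gwSPADE's algorithm.

```python
def estimate_proportions(counts, profiles, gene_weights=None, n_iter=500):
    """EM estimate of cell-type proportions at one spot, holding profiles fixed.

    counts: gene counts at the spot (length G)
    profiles: K lists of length G; profiles[k][g] = P(gene g | cell type k)
    gene_weights: optional per-gene weights (hypothetical; 1.0 means unweighted)
    """
    K, G = len(profiles), len(counts)
    w = gene_weights or [1.0] * G
    wc = [w[g] * counts[g] for g in range(G)]
    total = sum(wc)
    theta = [1.0 / K] * K  # start from uniform proportions
    for _ in range(n_iter):
        new_theta = [0.0] * K
        for g in range(G):
            if wc[g] == 0:
                continue
            mix = [theta[k] * profiles[k][g] for k in range(K)]
            denom = sum(mix)
            for k in range(K):
                # E-step: expected share of gene g's (weighted) reads from cell type k
                new_theta[k] += wc[g] * mix[k] / denom
        # M-step: normalize expected reads per cell type into proportions
        theta = [x / total for x in new_theta]
    return theta

profiles = [[0.7, 0.2, 0.1],   # toy cell type A
            [0.1, 0.2, 0.7]]   # toy cell type B
counts = [280, 200, 520]       # exactly 1000 * (0.3 * A + 0.7 * B)
print([round(p, 3) for p in estimate_proportions(counts, profiles)])  # [0.3, 0.7]
```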

Keywords

Deconvolution

Reference-free

Latent Dirichlet allocation model

Weighting scheme 

Co-Author

Yuehua Cui, Michigan State University

First Author

Aoqi Xie

Presenting Author

Aoqi Xie

Robust Inference of Copy Number Variations in Spatial Transcriptomics

Intratumor heterogeneity (ITH), a hallmark of cancer, is characterized by genetically distinct clusters of cells, or clones, that are spatially organized within a tumor. Copy-number variation (CNV), one of the key drivers of ITH, affects genomic segments by altering their number of copies. Spatial transcriptomics (ST), which measures RNA expression simultaneously at thousands of tissue locations, offers a unique opportunity to identify the CNV architecture and spatial organization of cancer clones. We introduce a robust framework that integrates gene expression, spatial coordinates, and SNPs from ST samples to identify segments with somatic CNVs and their allele-specific copy-number profiles. Our framework employs a Gaussian mixture model to capture spatially correlated expression patterns and a mixture of binomial distributions to model the allele counts. Using datasets from multiple ST platforms, we first assessed the quality and signal-to-noise ratio of the SNPs to ensure reliable allele-specific inference. We then demonstrated that the proposed model achieved superior and robust performance in discovering CNVs in the malignant regions of ST tumor samples.
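One common way to model allele counts with a mixture of binomial distributions is the "mirrored" B-allele-fraction (BAF) model for unphased SNPs: each SNP's B-allele count arises from Binomial(depth, BAF) or Binomial(depth, 1 - BAF) with equal probability. The sketch below scores a segment under a few candidate allelic states. The states, SNP counts, and function names are hypothetical, and the authors' Gaussian mixture for expression and spatial coupling are not reproduced.

```python
import math

def log_binom_pmf(b, d, p):
    """log P(B = b) for B ~ Binomial(d, p)."""
    log_choose = math.lgamma(d + 1) - math.lgamma(b + 1) - math.lgamma(d - b + 1)
    return log_choose + b * math.log(p) + (d - b) * math.log(1 - p)

def log_add_exp(a, b):
    """Numerically stable log(exp(a) + exp(b))."""
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def segment_log_likelihood(snps, baf):
    """Two-component binomial mixture per unphased SNP, with mirrored BAF."""
    total = 0.0
    for b, d in snps:
        total += math.log(0.5) + log_add_exp(log_binom_pmf(b, d, baf),
                                             log_binom_pmf(b, d, 1 - baf))
    return total

# Candidate allelic states, expressed as B-allele fractions (illustrative values).
states = {"balanced 1:1": 0.5, "gain 2:1": 1 / 3, "near-LOH": 0.05}
snps = [(10, 30), (20, 30), (9, 27), (18, 27), (12, 36), (24, 36)]  # (B count, depth)
scores = {name: segment_log_likelihood(snps, p) for name, p in states.items()}
best = max(scores, key=scores.get)
print(best)  # gain 2:1
```

Mirroring the BAF avoids needing haplotype phase, at the cost of making a balanced state (BAF 0.5) harder to distinguish from mild imbalance at low depth. This is one reason SNP signal-to-noise checks matter before allele-specific inference.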

Keywords

Copy-number variations

Spatial transcriptomics in cancer biology

Intratumor heterogeneity

Multimodal data integration 

Co-Author(s)

Robert Langefeld, University of Michigan
Evan Keller, University of Michigan
Xiang Zhou, University of Michigan

First Author

Kalins Banerjee

Presenting Author

Kalins Banerjee

Set‐based genetic association tests for panel count data based on weighted V statistics

Almost all existing multi-marker survival tests focus on time-to-event outcomes. However, panel count data are common in clinical and biomedical studies. Especially in research on chronic and recurrent diseases, the exact event times for each subject are often infeasible or very costly to measure. For example, in a study of early childhood caries, we can determine the increase in the number of teeth with carious lesions between two dental visits, but not the exact time at which the lesions developed. In this work, we propose a new suite of set-based genetic association tests for panel count data. These tests can effectively account for genetic effect heterogeneity and adjust for covariates. In addition, we develop small-sample corrections that improve the accuracy of the tests in small samples. Simulation studies showed that the new tests perform well in terms of size and power under various scenarios and can outperform existing tests for interval-censored outcomes. To illustrate their practical application, we analyze a data set from a genetic study of early childhood caries (ECC) with the proposed tests to detect ECC-associated genes.
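Many set-based association tests use a quadratic form in per-subject null-model residuals that can be written as a weighted V-statistic, a double sum over subject pairs that includes the diagonal i = j. The sketch below assumes residuals `r` are already available; constructing them for panel count data, covariate adjustment, and the small-sample corrections are the authors' contribution and are not reproduced. The variant weights `w` are hypothetical, and the permutation p-value is a generic swapped-in approach that is valid only when residuals are exchangeable under the null (e.g., no covariates).

```python
import random

def kernel(G, i, j, w):
    """Weighted linear kernel between subjects i and j: sum_k w_k G_ik G_jk."""
    return sum(wk * G[i][k] * G[j][k] for k, wk in enumerate(w))

def weighted_v_statistic(G, r, w):
    """V-statistic Q = sum_i sum_j r_i r_j K(i, j), diagonal terms included."""
    n = len(G)
    return sum(r[i] * r[j] * kernel(G, i, j, w) for i in range(n) for j in range(n))

def score_form(G, r, w):
    """Same Q written as sum_k w_k (sum_i G_ik r_i)^2; O(n * m) instead of O(n^2 * m)."""
    return sum(wk * sum(G[i][k] * r[i] for i in range(len(G))) ** 2
               for k, wk in enumerate(w))

def permutation_pvalue(G, r, w, n_perm=999, seed=1):
    """Generic permutation p-value: shuffle residuals across subjects."""
    rng = random.Random(seed)
    q_obs = score_form(G, r, w)
    r_perm = list(r)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(r_perm)
        if score_form(G, r_perm, w) >= q_obs:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

G = [[0, 1], [1, 0], [2, 1], [0, 0]]  # 4 subjects x 2 variants (minor-allele counts)
r = [1.0, -0.5, 2.0, -2.5]            # hypothetical null-model residuals
w = [1.0, 0.5]                        # hypothetical variant weights
print(weighted_v_statistic(G, r, w), score_form(G, r, w))  # 16.75 16.75
```

The two forms are algebraically identical. The score form is the one to use in practice because its cost grows linearly, not quadratically, in the number of subjects.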

Keywords

genetic heterogeneity

set‐based test

weighted V statistic

recurrent event 

Co-Author

Chenxi Li, Michigan State University

First Author

Kun Xia, Michigan State University

Presenting Author

Kun Xia, Michigan State University