Monday, Aug 4: 10:30 AM - 12:20 PM
4053
Contributed Papers
Music City Center
Room: CC-209B
In this session, presenters will cover their latest research in genomics, microbiome metabolics, and sequencing, along with their statistical and data science learning strategies for handling these data.
Main Sponsor
Biometrics Section
Presentations
In the field of phylogenetics, researchers traditionally model evolutionary relationships represented by a phylogenetic tree. However, due to the complex nature of evolution, phylogenetic networks offer a robust alternative by describing reticulate processes accurately. Among many tasks relevant to phylogenetic network inference, quantifying distances among networks is a crucial, yet understudied task. One successful approach to improving distance functions on trees is to map trees onto matrix spaces. Any ranked tree of n leaves can be encoded as an integer-valued lower triangular matrix that we call an F-matrix. In this talk, we extend the definition of the F-matrix to the case of phylogenetic networks and prove the bijection between network space and matrix space subject to certain constraints. We propose a metric on the space of rooted, ranked and unlabeled phylogenetic network distributions. Once phylogenetic networks are bijectively mapped onto a matrix space, we can calculate distances using the Frobenius norm, which makes it possible to conduct statistical analysis of ranked network shapes. We show the utility of our metrics via simulations and an application in infectious diseases.
Keywords
phylogenetic network
distance metric
ranked genealogy
ranked network shape
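As a minimal sketch of the distance computation described in the abstract above: once two networks are encoded as integer-valued lower triangular matrices of the same size, the Frobenius distance is the square root of the sum of squared entrywise differences. The function name `frobenius_distance` and the two matrices are hypothetical illustrations; they are not claimed to be valid F-matrices, and the encoding step itself is not shown.

```python
import math


def frobenius_distance(a, b):
    """Frobenius norm of the entrywise difference of two equal-shaped matrices."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return math.sqrt(
        sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    )


# Hypothetical integer-valued lower triangular matrices, for illustration only.
# They are NOT claimed to satisfy the constraints of a valid F-matrix.
m1 = [[2, 0, 0],
      [1, 3, 0],
      [0, 2, 4]]
m2 = [[2, 0, 0],
      [1, 3, 0],
      [1, 2, 4]]

# The matrices differ in one entry by 1, so the distance is 1.0.
print(frobenius_distance(m1, m2))
```

Because the distance is an ordinary matrix norm, standard summaries such as pairwise distance matrices over a sample of networks follow directly from repeated calls.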
The microbiome plays an important role in immune responses and inflammation in people with HIV-1 infection. Hence, a deeper understanding of the microbiome, including its function and byproducts, prior to HIV-1 infection is potentially important for prevention and treatment strategies. Towards this end, using stool, oral washes, and plasma biospecimens obtained from men who have sex with men (MSM) who were HIV-1 uninfected at the time of sample collection, we found significant differences in microbial ecologies, gene functions, correlations among bacterial species, their biological processes, and metabolites between MSM who became HIV-1 infected in the future and those who remained HIV-1 uninfected. Significant differences included enrichment of enzymes involved in purine metabolism, lower amino acid metabolism, and higher oxidative stress. Furthermore, using a measure of dysbiosis based on correlations with various data modalities, we identified 59 gut species and 24 oral species as dysbiotic pre-HIV, with the majority being independent of sexual activity.
Keywords
HIV infection
Gut microbiome
Oral microbiome
metabolomics
Co-Author(s)
Yue Chen, University of Pittsburgh
Saby Bera, National Institute of Environmental Health Sciences
Alan K Jarmusch, National Institute of Environmental Health Sciences
Daria Van Tyne, University of Pittsburgh
Frank J Palella, Northwestern University, Chicago
Joseph B Margolick, Johns Hopkins Bloomberg School of Public Health
Kara W Chew, University of California, Los Angeles
Jing Sun, Johns Hopkins University
Jermey Martinson, University of Pittsburgh
Charles R Rinaldo, University of Pittsburgh, Pittsburgh
Shyamal Peddada, NIEHS
First Author
Farnaz Fouladi
Presenting Author
Farnaz Fouladi
Surrogate selection is an experimental design that, without sequencing any DNA, can restrict a sample of cells to those carrying certain genomic mutations. In immunological disease studies, this design may provide a relatively easy approach to enrich a lymphocyte sample with cells relevant to the disease response, because the emergence of neutral mutations is associated with the proliferation history of clonal subpopulations. A statistical analysis of clonotype sizes provides a structured, quantitative perspective on this useful property of surrogate selection. Our model specification couples within-clonotype birth-death processes with an exchangeable model across clonotypes. Beyond enrichment questions about the surrogate selection design, our framework enables a study of sampling properties of elementary sample diversity statistics; it also points to new statistics that may usefully measure the burden of somatic genomic alterations associated with clonal expansion. We examine statistical properties of immunological samples governed by the coupled model specification, and we illustrate calculations in surrogate selection studies of melanoma and in single-cell genomic studies.
Keywords
Bayes's rule
clonal expansion
exchangeable birth death process
experimental design
somatic mutation
size bias
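To make the within-clonotype birth-death component of the abstract above concrete, the following is a rough sketch of a linear birth-death process simulated with the Gillespie algorithm. It is a simplification, not the authors' model: every clone shares the same rates, the exchangeable model across clonotypes is not represented, and the function name `simulate_clone_size` and all rate values are hypothetical.

```python
import random


def simulate_clone_size(birth_rate, death_rate, t_end, rng, start_size=1):
    """Simulate a linear birth-death process up to time t_end; return final size."""
    size, t = start_size, 0.0
    while size > 0:
        # Each cell divides at birth_rate and dies at death_rate.
        total_rate = size * (birth_rate + death_rate)
        if total_rate == 0:
            break  # no events can occur
        t += rng.expovariate(total_rate)  # waiting time to the next event
        if t > t_end:
            break
        if rng.random() < birth_rate / (birth_rate + death_rate):
            size += 1
        else:
            size -= 1
    return size


# Hypothetical rates and horizon, chosen only for illustration.
rng = random.Random(0)
sizes = [simulate_clone_size(1.0, 0.5, 3.0, rng) for _ in range(1000)]
extinct = sum(s == 0 for s in sizes)
print(f"extinct clones: {extinct} of {len(sizes)}; largest clone: {max(sizes)}")
```

The resulting size distribution is typically heavily skewed, with many extinct or small clones and a few large ones, which is the kind of clonotype-size variation that diversity statistics summarize.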
Differential abundance analysis is a common goal of microbiome studies that seek to uncover associations between microbial composition and health conditions. Although many methods have been developed in recent years to analyze such data, no single method performs uniformly better than the others. Zero inflation, overdispersion, and compositionality of the data make selecting an appropriate method even more challenging. A few methods are based on the Wilcoxon rank-sum test and the t-test, while others are designed to address zero inflation and compositional effects. The methods can sometimes produce discordant results. Therefore, a comprehensive evaluation of the methods that covers many biologically relevant scenarios is extremely important for choosing a robust analysis method. We carry out a comprehensive evaluation of differential abundance methods using simulations based on real data. Specifically, we evaluated limma, edgeR, ALDEx2, metagenomeSeq, ANCOM-BC, and LOCOM with respect to false discovery rate (FDR) and true positive rate (TPR). We found that, although no method is fully robust and flexible, ANCOM-BC and LOCOM consistently perform better than the other methods.
Keywords
Differential abundance
composition
microbiome
zero inflation
metagenome
over dispersion
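The two evaluation criteria named in the abstract above can be sketched as follows for a single simulated dataset in which the truly differential taxa are known. The function name `fdr_tpr` and the taxon labels are hypothetical. Strictly speaking, the first quantity computed here is the false discovery proportion of one run; the FDR is its average over many simulation replicates.

```python
def fdr_tpr(called, truth):
    """Return (false discovery proportion, true positive rate) for one run."""
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)
    false_pos = len(called - truth)
    # Convention: no discoveries means no false discoveries.
    fdp = false_pos / len(called) if called else 0.0
    tpr = true_pos / len(truth) if truth else 0.0
    return fdp, tpr


# Hypothetical simulation truth and method output, for illustration only.
truth = {"taxon_1", "taxon_2", "taxon_3", "taxon_4"}
called = {"taxon_1", "taxon_2", "taxon_9"}

fdr, tpr = fdr_tpr(called, truth)
print(f"FDR = {fdr:.2f}, TPR = {tpr:.2f}")  # FDR = 0.33, TPR = 0.50
```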
Most spatial transcriptomics technologies (e.g. 10x Visium) operate at the multicellular level, where each spatial location often contains a mixture of cells with heterogeneous cell types. Thus, effectively deconvolving cell-type compositions is critical for downstream analysis. Although reference-based deconvolution methods have been proposed, they depend on the availability of reference data, which may not always be accessible. Additionally, within a deconvolved cell type, cellular heterogeneity may still exist, requiring further deconvolution to uncover finer structures for a better understanding of this complexity. Here we present gwSPADE, a gene expression-weighted reference-free SPAtial DEconvolution method for spatial transcriptomics data. gwSPADE requires only the gene count matrix and employs appropriate weighting schemes within a topic model to accurately recover cell-type transcriptional profiles and their proportions at each spatial location, without relying on external single-cell references that may introduce batch effects. gwSPADE demonstrates scalability across various platforms and outperforms existing reference-free deconvolution methods such as STdeconvolve.
Keywords
Deconvolution
Reference-free
Latent Dirichlet allocation model
Weighting scheme
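For orientation, a topic-model view of deconvolution treats each spot's expected gene-expression profile as a mixture of cell-type profiles weighted by the cell-type proportions at that spot. The sketch below shows only this mixing step under a generic latent Dirichlet allocation setup; it does not implement gwSPADE or its gene weighting scheme, and the function name `expected_spot_profiles` and all numbers are hypothetical.

```python
def expected_spot_profiles(theta, beta):
    """Mix cell-type profiles (beta) by per-spot proportions (theta).

    theta: spots x cell types, each row sums to 1.
    beta:  cell types x genes, each row sums to 1.
    Returns spots x genes expected gene proportions.
    """
    n_genes = len(beta[0])
    return [
        [sum(t_k * beta[k][g] for k, t_k in enumerate(row)) for g in range(n_genes)]
        for row in theta
    ]


# Hypothetical proportions for 2 spots and 2 cell types.
theta = [[0.7, 0.3],
         [0.1, 0.9]]
# Hypothetical transcriptional profiles for 2 cell types over 3 genes.
beta = [[0.6, 0.3, 0.1],
        [0.1, 0.2, 0.7]]

profiles = expected_spot_profiles(theta, beta)
for spot, row in enumerate(profiles):
    print(spot, [round(v, 2) for v in row])
```

Reference-free deconvolution runs in the opposite direction: it estimates both `theta` and `beta` from the observed gene count matrix alone.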
Intratumor heterogeneity (ITH), a hallmark of cancer, is characterized by genetically distinct clusters of cells, or clones, that are spatially organized within a tumor. Copy-number variation (CNV), one of the key drivers of ITH, affects genomic segments by altering the underlying number of chromosomes. Spatial transcriptomics (ST), which measures RNA expression simultaneously at thousands of tissue locations, offers a unique opportunity to identify the CNV architecture and spatial organization of cancer clones. We introduce a robust framework that integrates gene expression, spatial coordinates, and SNPs from ST samples to identify segments with somatic CNVs and their allele-specific copy-number profiles. Our framework employs a Gaussian mixture model to capture spatially correlated expression patterns and a mixture of binomial distributions to model the allele counts. Using datasets across multiple ST platforms, we first assessed the quality and signal-to-noise ratio of the SNPs to ensure reliable allele-specific inference. We then demonstrated that the proposed model had superior and robust performance in discovering CNVs from the malignant region of ST tumor samples.
Keywords
Copy-number variations
Spatial transcriptomics in cancer biology
Intratumor heterogeneity
Multimodal data integration
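The binomial mixture for allele counts mentioned in the abstract above can be illustrated with a small posterior calculation: given the number of reads carrying one allele out of the total reads at a SNP, compute the probability that the SNP belongs to each mixture component. This is a generic sketch, not the authors' model; the function names, the two allele fractions (0.5 for a balanced segment, 2/3 for a hypothetical single-copy gain), and the equal mixing weights are all assumptions.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials with success probability p."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def responsibilities(k, n, probs, weights):
    """Posterior probability of each mixture component given allele counts."""
    joint = [w * binom_pmf(k, n, p) for p, w in zip(probs, weights)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical components: balanced (0.5) vs. a 2:1 allelic imbalance (2/3).
probs = [0.5, 2 / 3]
weights = [0.5, 0.5]  # hypothetical mixing weights

# 14 of 20 reads carry the allele of interest at this SNP.
post = responsibilities(k=14, n=20, probs=probs, weights=weights)
print([round(p, 3) for p in post])
```

In a full mixture fit, this step would alternate with re-estimating the weights and allele fractions, as in an expectation-maximization loop.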
Almost all existing multi-marker survival tests focus on time-to-event outcomes. However, panel count data are common in clinical and biomedical studies. Especially in research on chronic and recurrent diseases, the exact event times for each subject are infeasible or very costly to measure. For example, in the study of early childhood caries, we can determine the increase in the number of teeth with carious lesions between two dental visits, but not the exact time when the lesions developed. In this work, we propose a new suite of set-based genetic association tests for panel count data. These tests can effectively account for genetic effect heterogeneity and adjust for covariates. In addition, we develop small-sample corrections to improve the accuracy of the tests when the sample size is small. The simulation study showed that the new tests perform well in terms of size and power under various scenarios and can outperform the existing tests for interval-censored outcomes. To show their practical application, we analyze a data set from a genetic study of early childhood caries (ECC) with the developed tests to detect ECC-associated genes.
Keywords
genetic heterogeneity
set‐based test
weighted V statistic
recurrent event
Co-Author
Chenxi Li, Michigan State University
First Author
Kun Xia, Michigan State University
Presenting Author
Kun Xia, Michigan State University