Recent Advances in the Use of Sequence Data in Infectious Disease Tracking

Tuo Lin Chair
University of Florida
 
Ira Longini Discussant
University of Florida
 
Toni Gui Organizer
University of Florida
 
Monday, Aug 4: 10:30 AM - 12:20 PM
0394 
Invited Paper Session 
Music City Center 
Room: CC-207C 

Applied

Yes

Main Sponsor

Section on Statistics in Epidemiology

Co Sponsors

Health Policy Statistics Section
Section on Statistics in Genomics and Genetics

Presentations

The use of the local branching index for characterizing transmission in genomic epidemiological datasets

Relating genomic data to transmission patterns is an ongoing challenge. In many practical applications, researchers focus on clustering: how many of the sequences in a dataset are "clustered with" any other sequence? Clustering is interpreted as a sign of recent transmission. However, clustering is a very simple measure: it depends on the clustering threshold, is binary, and typically does not account for much evolutionary complexity. Here, we explore how well the local branching index (LBI) can capture patterns of transmission. LBI is a measure of the rate of branching in a phylogenetic tree that is specific to each tip. Tips with very rapid nearby branching are likely to be clustered, as their sequences are likely close to other sequences. LBI can therefore be seen as a continuous variable that should contain some of the same information as clustering. We use simulations and analysis of simulated phylogenetic trees to explore the relationships among LBI, transmission patterns, and sampling.
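To make the idea of a tip-specific branching measure concrete, the following is a minimal sketch and not the speakers' implementation. It assumes the commonly used definition of LBI as the total tree length surrounding a node, discounted exponentially with distance on a scale tau. The function name, the toy tree, and the value of tau are hypothetical, and the values are left unnormalized.

```python
import math
from collections import defaultdict


def local_branching_index(edges, tau):
    """Unnormalized LBI for every node of a tree.

    edges: list of (parent, child, branch_length) tuples.
    tau:   distance scale of the exponential discount (an assumed tuning value).
    """
    # Build an undirected adjacency list so distances can be measured in any direction.
    adj = defaultdict(list)
    for parent, child, length in edges:
        adj[parent].append((child, length))
        adj[child].append((parent, length))

    def distances_from(src):
        # Path length from src to every other node (paths in a tree are unique).
        dist = {src: 0.0}
        stack = [src]
        while stack:
            u = stack.pop()
            for v, length in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + length
                    stack.append(v)
        return dist

    lbi = {}
    for node in list(adj):
        dist = distances_from(node)
        total = 0.0
        for parent, child, length in edges:
            # Every point on a branch is reached through its nearer endpoint, so the
            # integral of exp(-d / tau) along the branch has a closed form.
            d_near = min(dist[parent], dist[child])
            total += math.exp(-d_near / tau) * tau * (1.0 - math.exp(-length / tau))
        lbi[node] = total
    return lbi


# Hypothetical toy tree: t1-t3 sit in a rapidly branching clade, t4 is isolated.
example_edges = [
    ("root", "X", 1.0),
    ("X", "t1", 0.1),
    ("X", "t2", 0.1),
    ("X", "t3", 0.1),
    ("root", "t4", 1.0),
]
lbi = local_branching_index(example_edges, tau=0.5)
for tip in ("t1", "t2", "t3", "t4"):
    print(tip, round(lbi[tip], 3))
```

In this toy tree the tips in the densely branching clade receive a higher LBI than the isolated tip, which is the continuous analogue of those tips being "clustered". The brute-force distance computation is quadratic in the number of nodes; it is chosen here for readability rather than speed.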

Keywords

Infectious diseases

Phylogenetic tree 

Speaker

Alex Beams, Simon Fraser University

Bayesian inference of timed phylogenetic networks from genomic sequences to quantify the horizontal movement of genes for viral and bacterial infectious diseases

The horizontal movement of genes is a crucial driver in the evolution of viral and bacterial pathogens. It enables pathogens to, for example, make large jumps in fitness space, adapt to new host species, or gain novel genes, such as acquiring plasmids carrying determinants for antibiotic resistance. Phylogenetic methods are often used to reconstruct evolutionary events but mostly assume that a phylogenetic tree can describe the shared evolutionary history of pathogens. This assumption breaks down when genes move horizontally, necessitating the use of phylogenetic networks instead.

In this talk, I will first present recent work on inferring phylogenetic networks using a Markov chain Monte Carlo approach. This approach models the horizontal movement of genes using coalescent models, allowing us to quantify reassortment, recombination, or plasmid transfer rates. I will then showcase multiple applications of phylogenetic network inference. First, I will demonstrate how we can use the coalescent with reassortment to infer reassortment rates across different influenza viruses. Next, I will discuss how phylogenetic network inference allows us to infer the complex evolutionary history of human coronaviruses, including MERS and SARS-like viruses such as SARS-CoV-1 and SARS-CoV-2. Lastly, I will present work on reconstructing the gain and loss of small plasmids and the recent dissemination of a multidrug-resistant plasmid between Shigella sonnei and Shigella flexneri lineages. This work reveals multiple independent events and steady growth in prevalence since 2010, and it quantifies the rates at which different plasmids move between bacterial lineages.
 

Keywords

Phylogenetic networks

Phylodynamics

Bayesian inference

Infectious diseases

MCMC

Genomic epidemiology 

Speaker

Nicola Mueller, University of California, San Francisco

Generalized Estimating Equation for Modeling Cell-Cell Correlation in Single-Cell RNA Seq Data

In the analysis of single-cell RNA sequencing (scRNA-seq) data, it is generally believed that cells from the same individual share common genetic and environmental backgrounds and are therefore not statistically independent. Many popular methods, such as the default Wilcoxon test in the FindMarkers function of the Seurat package, do not address this dependence, potentially leading to highly inflated type I error rates. More recent work argues for the use of generalized linear mixed models with a random effect for individual to properly account for the correlation among measurements from cells within an individual. However, the traditional mixed-effects model makes strong assumptions, requiring the same, strictly positive correlation across all cells from the same individual. We demonstrate that this can be rather restrictive for real data, given the heterogeneity of cells within a subject. When the positive-correlation assumption is violated, the classical random-effects model shows consistently biased inference and inflated type I error in the differential expression analyses we investigated. We propose a semi-parametric approach based on generalized estimating equations for this problem and demonstrate its robust and efficient performance in both simulations and a real-data study that focuses on revealing common and unique gene expression signatures in primary CD4+ T cells latently infected with HIV under different conditions.
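As an illustration of why accounting for within-individual dependence matters, here is a minimal sketch of a generalized estimating equation fit. It is not the authors' method: it assumes an identity link, an independence working correlation, and a single 0/1 condition measured per individual, whereas real scRNA-seq counts would call for a count model. The function name and the toy data are hypothetical. The point estimate equals ordinary least squares, and the cluster-robust (sandwich) variance is what accounts for cells from the same individual moving together.

```python
import math


def gee_independence(clusters):
    """GEE with identity link and independence working correlation.

    clusters: list of (x, [y_1, ..., y_m]), one entry per individual, where x is
              a 0/1 condition and the y values are that individual's cells.
    Returns (b0, b1, naive_se_b1, robust_se_b1) for the model E[y] = b0 + b1 * x.
    """
    n = sum(len(ys) for _, ys in clusters)
    sx = sum(x * len(ys) for x, ys in clusters)
    sxx = sum(x * x * len(ys) for x, ys in clusters)
    sy = sum(sum(ys) for _, ys in clusters)
    sxy = sum(x * sum(ys) for x, ys in clusters)

    # "Bread": inverse of sum_i X_i^T X_i for design rows (1, x).
    det = n * sxx - sx * sx
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy

    # "Meat": sum over individuals of u_i u_i^T, with u_i = X_i^T (y_i - fitted).
    meat = [[0.0, 0.0], [0.0, 0.0]]
    rss = 0.0
    for x, ys in clusters:
        residuals = [y - b0 - b1 * x for y in ys]
        rss += sum(r * r for r in residuals)
        u = [sum(residuals), x * sum(residuals)]
        for a in range(2):
            for b in range(2):
                meat[a][b] += u[a] * u[b]

    # Sandwich variance of b1: (bread * meat * bread)[1][1].
    robust_var = sum(
        inv[1][a] * meat[a][b] * inv[b][1] for a in range(2) for b in range(2)
    )
    # Naive variance treats every cell as an independent observation.
    naive_var = rss / (n - 2) * inv[1][1]
    return b0, b1, math.sqrt(naive_var), math.sqrt(robust_var)


# Hypothetical toy data: three individuals per condition, four cells each,
# with large between-individual shifts and small cell-level noise.
noise = [-0.1, 0.0, 0.1, 0.0]
individual_means = {0: (0.0, 2.0, 4.0), 1: (1.0, 3.0, 5.0)}
clusters = [
    (x, [mu + e for e in noise])
    for x, means in individual_means.items()
    for mu in means
]
b0, b1, naive_se, robust_se = gee_independence(clusters)
print(f"effect={b1:.3f} naive_se={naive_se:.3f} robust_se={robust_se:.3f}")
```

With these toy data the naive standard error is noticeably smaller than the robust one, which is the mechanism behind inflated type I error when cells are treated as independent.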

Keywords

HIV latency

single cell RNA seq 

Co-Author(s)

Tuo Lin, University of Florida
Toni Gui, University of Florida
Nadejda Beliakova-Bethell, University of California, San Diego
Xin Tu, University of California, San Diego

Speaker

Xinlian Zhang, University of California, San Diego

Incorporating Genomic Sequences into Stochastic Transmission Modeling to Improve the Forecasting of the Spread of SARS-CoV-2

The recent SARS-CoV-2 pandemic has highlighted the growing importance of infectious disease forecasting. An accurate and robust predictive model can empower public health leaders to make timely decisions on isolation and vaccination policies, thereby reducing the number of infections and severe cases. However, the emergence of new variants and subvariants can significantly alter the transmissibility and virulence of the pathogen in a short time, making the number of infections and hospitalizations difficult to predict. SARS-CoV-2 sequencing data, with millions of sequences collected and reported over the past few years, can be used to enhance the timeliness and accuracy of forecasting. By incorporating the evolution of the SARS-CoV-2 virus into classic transmission models, we find that genomic data are crucial for capturing trends in epidemiological data when new variants and subvariants emerge, leading to a more reliable forecasting model.
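The abstract does not describe the model structure, so the following is only a sketch of one way sequence data could enter a stochastic transmission model. It assumes a chain-binomial SIR model in which the daily transmission rate is the average of variant-specific rates, weighted by variant frequencies estimated from sequences. The variant labels, rates, population size, and the logistic frequency curve are all hypothetical.

```python
import math
import random


def binomial(rng, n, p):
    # Simple binomial draw using only the standard library.
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_sir(variant_freqs, beta_by_variant, n_pop=2000, i0=10, gamma=0.2, seed=1):
    """Chain-binomial SIR with a variant-weighted transmission rate.

    variant_freqs:   list with one dict per day, mapping variant -> frequency
                     (assumed to come from sequence surveillance).
    beta_by_variant: dict mapping variant -> transmission rate (assumed values).
    Returns a list of (S, I, R) tuples, one per simulated day.
    """
    rng = random.Random(seed)
    s, i, r = n_pop - i0, i0, 0
    history = []
    for freqs in variant_freqs:
        # Effective transmission rate given the variant mix circulating that day.
        beta = sum(freq * beta_by_variant[v] for v, freq in freqs.items())
        p_infect = 1.0 - math.exp(-beta * i / n_pop)
        p_recover = 1.0 - math.exp(-gamma)
        new_infections = binomial(rng, s, p_infect)
        new_recoveries = binomial(rng, i, p_recover)
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical scenario: a more transmissible variant replaces the old one
# along a logistic curve centered on day 30.
days = 60
variant_freqs = []
for t in range(days):
    f_new = 1.0 / (1.0 + math.exp(-0.2 * (t - 30)))
    variant_freqs.append({"old": 1.0 - f_new, "new": f_new})
beta_by_variant = {"old": 0.3, "new": 0.6}

history = simulate_sir(variant_freqs, beta_by_variant)
peak_day = max(range(days), key=lambda t: history[t][1])
print("peak infections on day", peak_day)
```

Replacing the time-varying frequencies with a constant mix gives the classic model without genomic input, which is the kind of comparison the abstract alludes to.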

Keywords

Genomic epidemiology

Phylodynamics

Infectious diseases 

Co-Author

Ira Longini, University of Florida

Speaker

Toni Gui, University of Florida