Monday, Aug 4: 10:30 AM - 12:20 PM
0394
Invited Paper Session
Music City Center
Room: CC-207C
Applied
Yes
Main Sponsor
Section on Statistics in Epidemiology
Co-Sponsors
Health Policy Statistics Section
Section on Statistics in Genomics and Genetics
Presentations
Relating genomic data to transmission patterns is an ongoing challenge. In many practical applications, researchers focus on clustering: how many of the sequences in a dataset are "clustered with" at least one other sequence? Clustering is interpreted as a sign of recent transmission. However, clustering is a very simple measure: it depends on the choice of clustering threshold, it is binary, and it typically does not account for much evolutionary complexity. Here, we explore how well the local branching index (LBI) can capture patterns of transmission. LBI is a tip-specific measure of the rate of branching in a phylogenetic tree. Tips with very rapid nearby branching are likely to be clustered, because their sequences are likely to be close to other sequences. LBI can therefore be seen as a continuous variable that should carry some of the same information as clustering. We use simulations, and analyses of the simulated phylogenetic trees, to explore the relationships among LBI, transmission patterns, and sampling.
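To make the contrast between the two measures concrete, here is a minimal sketch that computes both on a toy tree. It assumes one common formulation of LBI: the total branch length around a tip, with each point discounted by exp(-l/tau), where l is its path distance from the tip and tau sets how local the measure is. The tree encoding, the function names, and the values of tau and threshold are illustrative assumptions, and the authors' exact LBI definition or normalization may differ. Clustering here uses patristic (tree) distances; in practice it is often computed from pairwise genetic distances instead.

```python
import math
from collections import defaultdict, deque


def distances_from(edges, source):
    """Path lengths from `source` to every node of an undirected weighted tree.

    edges: list of (node_a, node_b, branch_length) tuples (hypothetical encoding).
    """
    adjacency = defaultdict(list)
    for a, b, length in edges:
        adjacency[a].append((b, length))
        adjacency[b].append((a, length))
    dist = {source: 0.0}
    queue = deque([source])
    while queue:  # paths in a tree are unique, so a plain traversal suffices
        node = queue.popleft()
        for neighbor, length in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + length
                queue.append(neighbor)
    return dist


def local_branching_index(edges, tip, tau):
    """Integral of exp(-l / tau) over all branch length in the tree, where l is
    the path distance from `tip` (assumed LBI formulation)."""
    dist = distances_from(edges, tip)
    total = 0.0
    for a, b, length in edges:
        near = min(dist[a], dist[b])  # the path from `tip` enters the branch here
        # closed form of the integral of exp(-l / tau) for l in [near, near + length]
        total += -tau * math.exp(-near / tau) * math.expm1(-length / tau)
    return total


def clustered_fraction(tips, edges, threshold):
    """Fraction of tips with at least one other tip within `threshold`."""
    n_clustered = 0
    for tip in tips:
        dist = distances_from(edges, tip)
        if any(dist[other] <= threshold for other in tips if other != tip):
            n_clustered += 1
    return n_clustered / len(tips)


# Toy tree: A and B are close relatives; C sits on a long, isolated branch.
edges = [("A", "n1", 0.01), ("B", "n1", 0.01), ("n1", "root", 0.5), ("C", "root", 0.8)]
print(local_branching_index(edges, "A", tau=0.1))  # larger: rapid branching nearby
print(local_branching_index(edges, "C", tau=0.1))  # smaller: isolated tip
print(clustered_fraction(["A", "B", "C"], edges, threshold=0.05))  # A and B only
```

Clustering collapses each tip to a yes/no answer that changes abruptly as the threshold moves, whereas LBI varies smoothly with tau. As tau grows, LBI approaches the total tree length; as tau shrinks, only the immediate neighborhood of the tip contributes.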
Keywords
Infectious diseases
Phylogenetic tree
The horizontal movement of genes is a crucial driver of the evolution of viral and bacterial pathogens. It enables pathogens to, for example, make large jumps in fitness space, adapt to new host species, or gain novel genes, such as by acquiring plasmids that carry determinants of antibiotic resistance. Phylogenetic methods are often used to reconstruct evolutionary events, but most assume that a phylogenetic tree can describe the shared evolutionary history of pathogens. This assumption is challenged when genes move horizontally, which necessitates the use of phylogenetic networks instead.
In this talk, I will first present recent work on inferring phylogenetic networks using a Markov chain Monte Carlo approach. This approach models the horizontal movement of genes using coalescent models, allowing us to quantify reassortment, recombination, or plasmid transfer rates. I will then showcase multiple applications of phylogenetic network inference. First, I will demonstrate how we can use the coalescent with reassortment to infer reassortment rates across different influenza viruses. Next, I will discuss how phylogenetic network inference allows us to infer the complex evolutionary history of human coronaviruses, including MERS-CoV and SARS-like viruses such as SARS-CoV-1 and SARS-CoV-2. Lastly, I will present work on reconstructing the gain and loss of small plasmids and the recent dissemination of a multidrug-resistant plasmid between Shigella sonnei and Shigella flexneri lineages. This dissemination involved multiple independent events and a steady growth in prevalence since 2010; the work also quantifies the rates at which different plasmids move between bacterial lineages.
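For intuition about the generative process behind the coalescent with reassortment, here is a minimal simulation sketch of it. It assumes a constant coalescent rate per pair of lineages, a constant per-lineage reassortment rate, and that each genome segment independently comes from either parent with probability 1/2. It only counts events, it simplifies the model (for example, it keeps tracking segments after they have found their common ancestor), and it is not the MCMC network inference described in the talk. Function and parameter names are hypothetical.

```python
import random


def simulate_coalescent_with_reassortment(n_tips, n_segments, coal_rate,
                                          reassort_rate, seed=None):
    """Simulate lineages backward in time until one ancestral lineage remains.

    Each lineage is the frozenset of genome segments whose ancestry it carries.
    Requires coal_rate > 0. Returns (time to the final lineage,
    number of coalescences, number of observable reassortments).
    """
    rng = random.Random(seed)
    lineages = [frozenset(range(n_segments))] * n_tips
    time, n_coal, n_reassort = 0.0, 0, 0
    while len(lineages) > 1:
        k = len(lineages)
        rate_coal = coal_rate * k * (k - 1) / 2  # every pair can coalesce
        # only lineages carrying two or more segments can visibly split
        splittable = [i for i, segs in enumerate(lineages) if len(segs) > 1]
        rate_reassort = reassort_rate * len(splittable)
        total = rate_coal + rate_reassort
        time += rng.expovariate(total)  # waiting time to the next candidate event
        if rng.random() < rate_coal / total:
            i, j = rng.sample(range(k), 2)
            merged = lineages[i] | lineages[j]
            lineages = [segs for idx, segs in enumerate(lineages) if idx not in (i, j)]
            lineages.append(merged)
            n_coal += 1
        else:
            i = rng.choice(splittable)
            # each segment is inherited from one of two parents with probability 1/2
            first = frozenset(s for s in lineages[i] if rng.random() < 0.5)
            second = lineages[i] - first
            if first and second:  # otherwise the event leaves no trace
                lineages[i] = first
                lineages.append(second)
                n_reassort += 1
    return time, n_coal, n_reassort


# e.g. 8 segments, as in influenza A; rates are arbitrary illustrative values
print(simulate_coalescent_with_reassortment(10, 8, coal_rate=1.0,
                                            reassort_rate=0.5, seed=42))
```

Each observable reassortment adds a lineage and each coalescence removes one, so a run starting from n tips always ends with n_coal = n - 1 + n_reassort; setting reassort_rate=0 recovers the ordinary coalescent, whose genealogy is a single tree. Inference runs this logic in reverse, asking which rates make the observed network most plausible.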
Keywords
Phylogenetic networks
Phylodynamics
Bayesian inference
Infectious diseases
MCMC
Genomic epidemiology
In single-cell RNA sequencing (scRNA-seq) data, cells from the same individual are generally believed to share common genetic and environmental backgrounds and are therefore not statistically independent. Many widely used methods, such as the default Wilcoxon rank-sum test in the FindMarkers function of the Seurat package, do not address this dependence, potentially leading to highly inflated type I error rates. More recent work argues for using generalized linear mixed models with a random effect for the individual, to properly account for the correlation among measurements from cells within an individual. However, the traditional mixed-effects model makes strong assumptions, requiring the same, strictly positive correlation among all cells from the same individual. We demonstrate that this assumption can be overly restrictive for real data, given the heterogeneous nature of the cells within a subject. When the positive-correlation assumption is violated, the classical random-effects model yields consistently biased inference and inflated type I error in the differential expression analyses we investigated. We propose a semiparametric approach based on generalized estimating equations (GEE) to address this issue, and we demonstrate its robust and efficient performance in both simulations and a real-data application that focuses on revealing common and unique gene expression signatures in primary CD4+ T cells latently infected with HIV under different conditions.
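The key property of GEE here is its cluster-robust ("sandwich") variance, which stays valid whatever the within-individual correlation looks like, positive, negative, or mixed, as long as individuals are independent. Below is a minimal sketch for one gene, assuming an identity link, a single condition indicator, and an independence working correlation; the authors' approach may use a different link (for example, log for counts), a different working correlation, or additional covariates, and the function name is hypothetical.

```python
import math
from collections import defaultdict


def gee_linear_independence(y, x, cluster):
    """Fit y = b0 + b1 * x by GEE with an independence working correlation.

    y: per-cell outcome for one gene (e.g., normalized expression)
    x: per-cell covariate (e.g., 1 for one condition, 0 for the other)
    cluster: per-cell individual ID; cells from one individual form a cluster
    Returns ((b0, b1), (se0, se1)) with cluster-robust (sandwich) standard errors.
    """
    n = len(y)
    # "Bread": A = sum over cells of d d^T, with design row d = (1, x)
    a00, a01, a11 = float(n), float(sum(x)), float(sum(v * v for v in x))
    det = a00 * a11 - a01 * a01
    if det == 0:
        raise ValueError("covariate has no variation")
    inv = [[a11 / det, -a01 / det], [-a01 / det, a00 / det]]

    # With an identity link and independence working correlation, the estimating
    # equations sum_i d_i (y_i - d_i^T b) = 0 have the least-squares solution.
    s0 = sum(y)
    s1 = sum(xi * yi for xi, yi in zip(x, y))
    b0 = inv[0][0] * s0 + inv[0][1] * s1
    b1 = inv[1][0] * s0 + inv[1][1] * s1

    # "Meat": sum the score contributions within each individual first, so the
    # variance estimate assumes no particular within-individual correlation.
    scores = defaultdict(lambda: [0.0, 0.0])
    for yi, xi, ci in zip(y, x, cluster):
        r = yi - (b0 + b1 * xi)
        scores[ci][0] += r
        scores[ci][1] += xi * r

    # Diagonal of A^{-1} B A^{-1}, written as sums of squares (always >= 0).
    var = [0.0, 0.0]
    for u0, u1 in scores.values():
        for k in range(2):
            var[k] += (inv[k][0] * u0 + inv[k][1] * u1) ** 2
    return (b0, b1), (math.sqrt(var[0]), math.sqrt(var[1]))


# Four cells from two individuals ("a", "b"), each observed in both conditions.
coef, se = gee_linear_independence([1.0, 2.0, 3.0, 5.0], [0, 0, 1, 1],
                                   ["a", "b", "a", "b"])
print(coef, se)  # coef == (1.5, 2.5): the difference in condition means
```

The point estimate is the same as an ordinary regression; what changes is the standard error, which is driven by individual-level sums rather than by cell counts. With only a handful of individuals, a small-sample correction to the sandwich estimator is usually advisable.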
Keywords
HIV latency
Single-cell RNA-seq
The recent SARS-CoV-2 pandemic has highlighted the growing importance of infectious disease forecasting. An accurate and robust predictive model can empower public health leaders to make timely decisions on isolation and vaccination policies, thereby reducing the number of infections and severe cases. However, the emergence of new variants and subvariants can substantially alter the transmissibility and virulence of the pathogen within a short time, making infections and hospitalizations difficult to predict. To improve the timeliness and accuracy of forecasts, we can use SARS-CoV-2 sequencing data, a vast resource: millions of sequences have been collected and reported over the past few years. By incorporating the evolution of the SARS-CoV-2 virus into classic transmission models, we conclude that genomic data are crucial for capturing trends in epidemiological data when new variants and subvariants emerge, which leads to a more reliable forecasting model.
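A toy illustration of why variant information matters to a transmission model: a two-variant SIR model in which an emerging variant with a higher transmission rate shares one susceptible pool with the resident variant. This is not the authors' model; full cross-immunity, Euler integration, the function names, and all parameter values are simplifying assumptions. The variant's share of current infections is, roughly, what variant frequencies in sequencing data estimate, assuming representative sampling.

```python
def two_variant_sir(beta1, beta2, gamma, s0, i1_0, i2_0, days, dt=0.1):
    """Euler-integrate an SIR model with two co-circulating variants.

    Compartments are population fractions; both variants draw on one shared
    susceptible pool (full cross-immunity is a simplifying assumption).
    Returns one (s, i1, i2, r) tuple per day.
    """
    s, i1, i2 = s0, i1_0, i2_0
    r = 1.0 - s0 - i1_0 - i2_0
    steps_per_day = round(1 / dt)
    trajectory = []
    for _day in range(days):
        for _step in range(steps_per_day):
            new1 = beta1 * s * i1  # new infections with the resident variant
            new2 = beta2 * s * i2  # new infections with the emerging variant
            rec1, rec2 = gamma * i1, gamma * i2
            s -= dt * (new1 + new2)
            i1 += dt * (new1 - rec1)
            i2 += dt * (new2 - rec2)
            r += dt * (rec1 + rec2)
        trajectory.append((s, i1, i2, r))
    return trajectory


def variant_share(trajectory):
    """Share of current infections caused by the emerging variant, per day."""
    return [i2 / (i1 + i2) for _, i1, i2, _ in trajectory]


# Resident variant barely above threshold; emerging variant seeded at 10% share.
traj = two_variant_sir(beta1=0.25, beta2=0.4, gamma=0.2,
                       s0=0.99, i1_0=0.009, i2_0=0.001, days=120)
share = variant_share(traj)
print(round(share[0], 3), round(share[-1], 3))
```

Early on, total infections look like a slow, resident-variant epidemic; a single-strain model fitted to case counts alone would miss the upcoming surge, while the rising variant share in the sequence data signals it in advance.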
Keywords
Genomic epidemiology
Phylodynamics
Infectious diseases