Biological causes and impacts of rugged tree landscapes in phylodynamic inference

Jiansi Gao Co-Author
 
Andrew Magee Co-Author
 
Luiz Carvalho Co-Author
Getulio Vargas Foundation
 
Marius Brusselmans Co-Author
KU Leuven
 
Marc Suchard Co-Author
University of California-Los Angeles
 
Guy Baele Co-Author
KU Leuven
 
Frederick Matsen Co-Author
Fred Hutchinson Cancer Research Center
 
Jiansi Gao Speaker
Fred Hutch Cancer Center
 
Monday, Aug 4: 8:55 AM - 9:15 AM
Topic-Contributed Paper Session 
Music City Center 

Description

Phylodynamic analysis has been instrumental in elucidating the spread and evolutionary dynamics of pathogens and cells. The Bayesian approach to phylodynamics integrates out phylogenetic uncertainty, which is typically substantial in phylodynamic datasets due to low genetic diversity. Bayesian phylodynamic analysis does not, however, scale to modern datasets, partly due to difficulties in traversing tree space. Here, we set out to characterize phylodynamic tree space and assess its impacts on analysis difficulty and key biological inferences. By running extensive Bayesian analyses of 15 classic large phylodynamic datasets and carefully analyzing the posteriors, we find that the posterior landscape in tree space ("tree landscape") is diffuse yet rugged, leading to widespread tree sampling problems that usually stem from a small part of the tree. We develop clade-specific diagnostics to show that a few sequences, including putative recombinants and recurrent mutants, frequently drive the ruggedness and sampling problems, although existing data-quality tests show limited power to detect such sequences. The sampling problems can significantly impact phylodynamic inferences or even distort major biological conclusions; the impact is usually stronger on "local" estimates (e.g., introduction history of a focal clade) than on "global" parameters (e.g., demographic trajectory) that are governed by the general tree shape. In addition, we demonstrate that heterochronous sampling dates contain considerable information about tree topology, which can conflict with genetic data at a local scale, leading to further complexity in the tree space and systematic discrepancies between Bayesian and commonly used stepwise phylodynamic approaches. We evaluate existing and newly developed MCMC diagnostics, and offer strategies for optimizing MCMC settings and mitigating the impacts of the sampling problems.
Our findings highlight the need for, and point to directions for developing, methods that traverse the rugged tree landscape efficiently, ultimately advancing scalable and reliable phylodynamics.

Keywords

Bayesian phylodynamics

phylogenetic inference

Markov chain Monte Carlo

viral evolution

heterochronous sequences

single-cell sequencing