Characterizing the informativeness of pathogen genome sequence datasets about transmission between population groups

Abstract

Pathogen genome analysis helps characterize transmission between population groups. The information carried by pathogen sequences comes from the accumulation of mutations within their genomes. This means that the pace at which mutations accumulate should determine the granularity of transmission processes that pathogen sequences can characterize. Here, we investigate how the complex interplay between mutation, transmission, mixing and sampling impacts the power of phylogeographic studies. First, we develop a conceptual probabilistic framework to quantify the ability of pairs of sequences to capture migration history. This allows us to comprehensively explore the space of possible phylogeographic analyses by explicitly considering the pace at which mutations accumulate and the pace at which migration events occur. Using this framework, we identify a pathogen-intrinsic limit on the mixing scale at which sequence data remain informative, with faster-mutating pathogens enabling finer spatial characterization. Second, we perform a simulation study exploring a range of assumptions regarding sequencing intensity. We find that sample size imposes a further limit on the characterization of mixing processes. This work highlights inherent horizons of observability for population mixing processes that depend on the interaction between evolution, transmission, mixing and sampling. Such considerations are important for the design of phylogeographic studies.

Repository organization

This repository is organized in sub-folders as follows:

input/ contains the input parameter values and data used to generate the analyses. Further information is available in the folder-level README file.
figures/ contains the figures (from both the main text and the supplementary information) associated with the manuscript.
manuscript/ contains the manuscript.
scripts/ contains the code used to analyse the data and reproduce the figures. Further information is available in the folder-level README file.
remaster/ contains the code used to simulate the phylogenies depicted in Figure 1.
remaster-sample-size/ contains the code used to perform the simulation study underlying Figures 6 and 7.

Install

The code is written in R and relies on several packages, which can be installed by running the following command from the repository root:

Rscript ./scripts/install_requirements.R "scripts/requirements.txt"
