Msmithii_molecular_clocking

Code available for Wibowo et al. 2021 - "Reconstruction of ancient microbial genomes from the human gut"

Optimizing molecular clocking analysis by inspecting sites affected by ancient DNA damage and strain heterogeneity

Requirements:

Python >= 3.6
biopython
pysam >= 0.15.4

Detecting artificial mutations in the MSA from ancient DNA damage effects

ancient_artefacts_detector.py artefacts_detecting_example_msa.fna \
--output_CT CT_artefacts.txt \
--output_GA GA_artefacts.txt

Where:

<artefacts_detecting_example_msa.fna> The initial MSA for molecular clocking analysis.
<CT_artefacts.txt> Detected artificial sites of C/T substitution.
<GA_artefacts.txt> Detected artificial sites of G/A substitution.

Outputs:

Two one-column text files (for C/T and G/A artificial substitutions, respectively) listing the coordinates of the detected sites in the MSA file. See the example_data folder for examples.
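As an illustration of how the one-column site files might be used downstream, here is a minimal sketch that reads the coordinates and masks the flagged columns of an aligned sequence. The helper names `read_sites` and `mask_columns` are hypothetical and not part of this repository. Two further assumptions: the README does not state whether coordinates are 0-based or 1-based (the sketch makes it a parameter), and masking with `N` is only one possible way to handle flagged sites.

```python
def read_sites(path):
    """Read a one-column file of MSA coordinates into a set of integers."""
    with open(path) as handle:
        return {int(line) for line in handle if line.strip()}


def mask_columns(sequence, sites, one_based=True, mask_char="N"):
    """Return `sequence` with the flagged alignment columns replaced by `mask_char`.

    `one_based` is an assumption: check the example files to confirm
    which coordinate convention the detector actually uses.
    """
    offset = 1 if one_based else 0
    chars = list(sequence)
    for site in sites:
        idx = site - offset
        if 0 <= idx < len(chars):
            chars[idx] = mask_char
    return "".join(chars)


# Toy example with two flagged columns.
print(mask_columns("ACGTAC", {1, 4}))  # prints NCGNAC
```

Applying `mask_columns` to every sequence in the alignment keeps the MSA length unchanged, so other coordinate-based files stay valid.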

usage: ancient_artefacts_detector.py [-h] [-opt_CT OUTPUT_CT]
                                     [-opt_GA OUTPUT_GA]
                                     [-a_ratio ANCIENT_RATIO]
                                     [-m_ratio MODERN_RATIO]
                                     [input_msa]

positional arguments:
  input_msa             Input a multiple sequence alignment in fasta form.

optional arguments:
  -h, --help            show this help message and exit
  -opt_CT OUTPUT_CT, --output_CT OUTPUT_CT
                        Specify the output file name for suspicious CT
                        artefacts.
  -opt_GA OUTPUT_GA, --output_GA OUTPUT_GA
                        Specify the output file name for suspicious GA
                        artefacts.
  -a_ratio ANCIENT_RATIO, --ancient_ratio ANCIENT_RATIO
                        The ratio of T or A to other nucleotides among ancient
                        sequences. From 0 to 1.
  -m_ratio MODERN_RATIO, --modern_ratio MODERN_RATIO
                        The ratio of C or G to other nucleotides among modern
                        sequences. From 0 to 1.
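To make the two ratio options more concrete, the following sketch shows one way a single alignment column could be tested. This is an interpretation of the help text above, not the code of `ancient_artefacts_detector.py`: it assumes a column is suspicious when the damaged base (T or A) reaches `ancient_ratio` among the ancient sequences while the undamaged base (C or G) reaches `modern_ratio` among the modern sequences. The function names and the choice to ignore gaps and ambiguous bases are hypothetical.

```python
def base_fraction(column, base):
    """Fraction of `base` among the called bases (A, C, G, T) in a column."""
    called = [b for b in column.upper() if b in "ACGT"]
    if not called:
        return 0.0
    return called.count(base) / len(called)


def looks_like_damage(ancient_column, modern_column, damaged, original,
                      ancient_ratio, modern_ratio):
    """Flag a column whose ancient samples carry the damaged base while
    the modern samples carry the original one (assumed criterion)."""
    return (
        base_fraction(ancient_column, damaged) >= ancient_ratio
        and base_fraction(modern_column, original) >= modern_ratio
    )


# C/T check: three of four ancient samples show T, all modern samples show C.
print(looks_like_damage("TTTC", "CCCCC", damaged="T", original="C",
                        ancient_ratio=0.5, modern_ratio=0.9))
```

The G/A case uses the same test with `damaged="A"` and `original="G"`. The threshold values in the example are arbitrary and chosen only for illustration.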

Detecting sites in the MSA from multiple strains

strain_heterogeneity_detector.py \
strain_heterogeneity_detecting_example.fna \
strain_heterogeneity_detecting_example.bam \
m__ppa3_SchirmerM_2016__G88704__bin.11 > heterogeneity_sites.txt

Where:

<strain_heterogeneity_detecting_example.fna> A FASTA file containing one single continuous nucleotide sequence extracted from the MSA.
<strain_heterogeneity_detecting_example.bam> A BAM file containing the reads-to-contig alignment obtained by mapping metagenomic reads against the sequence extracted from the MSA.
<m__ppa3_SchirmerM_2016__G88704__bin.11> The header name of the contig in <strain_heterogeneity_detecting_example.fna>.

Output:

A four-column text file:

Column 1: coordinate of the detected site in the MSA.
Column 2: the dominant base.
Column 3: the possible alternative bases according to the mapping analysis.
Column 4: the dominant allele rate at the detected site.

Note: only sites with a dominant allele rate below 0.8 are reported. See the example_results folder for examples.
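The dominant allele rate can be illustrated with a small sketch that works on the bases observed at one site. It assumes the rate is the count of the most frequent base divided by the number of called bases. The function names and the comma-joined format of the alternatives column are hypothetical, and collecting the bases from the BAM file (for example with pysam) is left out. Only the 0.8 cut-off comes from the description above.

```python
from collections import Counter


def dominance(bases):
    """Return (dominant base, alternative bases, dominant allele rate)
    for the bases observed at one site, or None if nothing was called."""
    counts = Counter(b for b in bases.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return None
    ranked = counts.most_common()
    dominant, dominant_count = ranked[0]
    alternatives = [base for base, _ in ranked[1:]]
    return dominant, alternatives, dominant_count / total


def report_site(position, bases, threshold=0.8):
    """Format a four-column line for a heterogeneous site, else return None."""
    result = dominance(bases)
    if result is None:
        return None
    dominant, alternatives, rate = result
    if rate >= threshold:
        return None
    return f"{position}\t{dominant}\t{','.join(alternatives)}\t{rate:.2f}"


# Six reads support A, three support G and one supports T at position 42.
print(report_site(42, "AAAAAAGGGT"))
```

A site where nine of ten reads agree has a rate of 0.9 and is not reported, so `report_site(7, "AAAAAAAAAG")` returns `None`.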

usage: strain_heterogeneity_detector.py [-h] [mag] [bam] [contigs]

positional arguments:
  mag         The mag or genome in fasta file.
  bam         reads to mag alignment bam file.
  contigs     Specify the header of reference used in reads-contig mapping.

optional arguments:
  -h, --help  show this help message and exit

Note: all example data and the corresponding results can be found in the folders example_data and example_results.

SegataLab/Msmithii_molecular_clocking
