Blast2Tree

An experimental Linux pipeline, optimized for haploid fungi, for rapid genus-to-species-level identification of multiple genomes whose classification is uncertain, at a user-defined taxonomic level. It also extracts the sequences of interest for manual review. Marker sequences should be single-copy and taxonomically informative.

Requires:

A working conda or miniconda installation (to make sure it's up to date, run conda update -n base --all)
A fasta file (.fa) with your reference markers (headers in default NCBI format) for each of the known species across your chosen classification level (e.g. ITS.fa)
A fasta file (.fa) containing a single sequence that best represents the reference marker (e.g. reference.fa). If you are not sure which sequence to use, take the best-hit marker after the blast and extraction steps
Assembled genomes in the .fasta or .fna format (e.g. isolate_100.fasta)
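As a quick sanity check of the inputs listed above, the following sketch counts the marker/reference files and the genome files in a folder. The function name check_inputs is hypothetical and not part of blast2tree; it only assumes the file extensions given in the list.

```shell
# Hypothetical helper (not part of blast2tree): report whether a folder
# holds at least one .fa file and at least one genome assembly.
check_inputs() {
    local dir="${1:-.}"
    local markers genomes
    # $(( ... )) normalises the whitespace that some wc implementations emit.
    markers=$(( $(find "$dir" -maxdepth 1 -type f -name '*.fa' | wc -l) ))
    genomes=$(( $(find "$dir" -maxdepth 1 -type f \( -name '*.fasta' -o -name '*.fna' \) | wc -l) ))
    echo "marker/reference files (.fa): $markers"
    echo "genome files (.fasta/.fna): $genomes"
    # Succeed only when both kinds of input are present.
    [ "$markers" -ge 1 ] && [ "$genomes" -ge 1 ]
}

# Example: check_inputs /path/to/working/dir
```

The function returns a non-zero status when either kind of input is missing, so it can be used in front of a run, e.g. check_inputs . && echo "inputs look complete".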

Utilizes:

How to install

Download
Install the conda environment with

conda env create -f /path/to/download/blast2tree_environment.yml

To add the blast2tree script to your PATH, go to the directory containing it and do

pwd

to get the directory, followed by

 echo 'export PATH="$PATH:/path/to/script/dir"' >> ~/.bashrc && source ~/.bashrc

and then

chmod +x /path/to/blast2tree

To run:

Place your genome files (.fasta or .fna) in a folder that also contains the reference file (.fa) and the markers file (.fa).
Then activate the conda environment:

conda activate Blast2Tree

To get the help menu, do

blast2tree -h

To view your phylogenetic tree, activate the Blast2Tree conda environment and do

figtree

Then open the .treefile from the _Out output to see your results.

Processing parameters

Threads|-t

Default = 2

Working directory|--wd

Uses your current directory as the expected working directory.

Run name|--s

Run name and corresponding logfile output identifier.

--MARKER_NAME

Name of your gene marker, e.g. ITS or BT

--Input_seq

The fasta file containing the reference sequences at your chosen taxonomic level, e.g. ITS.fa

--CutValue

The minimum length at which you are willing to compare the extracted genes. Sequences longer than this value are not reconstructed. It therefore helps to know your expected sequence size (65% is a good starting point): the longer the sequence, the more resolution you can achieve.
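As a rough illustration, and reading the 65% guidance as 65% of the expected marker length (an assumption, since the text does not spell this out), a starting value could be computed as follows. The function name suggest_cut_value and the 600 bp example length are hypothetical.

```shell
# Hypothetical helper: suggest a starting --CutValue from an expected
# marker length in bp, assuming "65%" means 65% of that length.
suggest_cut_value() {
    # Integer arithmetic: 65% of the expected length, rounded down.
    echo $(( $1 * 65 / 100 ))
}

suggest_cut_value 600   # prints 390
```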

--THRESHOLD

The minimum length required for final processing; it keeps quality high by retaining only longer sequences. Sequences shorter than this value are excluded from the final analysis (tree building) and moved to the leftovers.fasta file.
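The effect of a length threshold like this can be sketched with awk. This is not blast2tree's own code: the function name split_by_length and its arguments are hypothetical, and it simply writes sequences of at least the given length to one file and shorter ones to another (such as leftovers.fasta).

```shell
# Hypothetical sketch: split a FASTA file by sequence length.
# Usage: split_by_length input.fasta MIN_LENGTH kept.fasta leftovers.fasta
split_by_length() {
    local input="$1" threshold="$2" kept="$3" leftovers="$4"
    # Start from empty output files so both exist even if one gets no records.
    : > "$kept"
    : > "$leftovers"
    awk -v min="$threshold" -v kept="$kept" -v left="$leftovers" '
        # Write the record collected so far to the matching file.
        function emit() {
            if (header != "") {
                out = (length(seq) >= min) ? kept : left
                print header >> out
                print seq >> out
            }
        }
        /^>/ { emit(); header = $0; seq = ""; next }   # new record starts
        { seq = seq $0 }                               # join wrapped lines
        END { emit() }
    ' "$input"
}
```

Wrapped sequence lines are joined before measuring, so the length reflects the whole record rather than a single line.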

Analysis functions

Pre-align & trim|--Z

Standardises the reference markers before they are used in the blast search and downstream processing.

Build|--A

Creates blastdb for each genome and does a blast search against your genomes using your provided reference markers (e.g. ITS.fa). Thereafter, it extracts the relevant hit sequences.
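For orientation, the database-and-search step described above corresponds roughly to the following BLAST+ commands. This is a sketch only: the exact options blast2tree passes are not documented here, so the tabular output format (-outfmt 6), the output file names, and the function name blast_genome are all assumptions.

```shell
# Hypothetical sketch of a per-genome BLAST run (not blast2tree's own code).
# Usage: blast_genome isolate_100.fasta ITS.fa [threads]
blast_genome() {
    local genome="$1" markers="$2" threads="${3:-2}"
    # Derive output names from the genome file name (an assumed convention).
    local db="${genome%.*}_db" hits="${genome%.*}_hits.tsv"
    if ! command -v makeblastdb >/dev/null || ! command -v blastn >/dev/null; then
        echo "BLAST+ (makeblastdb, blastn) not found on PATH" >&2
        return 1
    fi
    # Build a nucleotide database from the assembled genome.
    makeblastdb -in "$genome" -dbtype nucl -out "$db"
    # Search the reference markers against it, writing tabular hits.
    blastn -query "$markers" -db "$db" -outfmt 6 -num_threads "$threads" -out "$hits"
}
```

Running it by hand on one genome can be a useful way to inspect the raw hits before letting the pipeline extract them.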

Extract|--B

Determines the longest hit from your blast search and extracts it, along with any shorter sequences from the respective marker that produced a hit. After extraction, identify the marker that gave the best hit for your data and add it to a file called reference.fa with a unique header, e.g. >best_ref
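If you want to check by hand which marker gave the longest hit, and assuming you have BLAST results in the standard 12-column tabular format (-outfmt 6, where column 4 is the alignment length), a sketch like the one below picks the row with the longest alignment. The function name longest_hit is hypothetical, and blast2tree may select hits differently.

```shell
# Hypothetical sketch: print the BLAST tabular row with the longest
# alignment (column 4 of -outfmt 6). On ties the first row wins.
longest_hit() {
    awk -F'\t' '
        $4 > max { max = $4; row = $0 }
        END { if (row != "") print row }
    ' "$1"
}

# Example: longest_hit isolate_100_hits.tsv | cut -f1   # marker (query) ID
```

The first column of the printed row is the query ID, i.e. the marker that could go into reference.fa.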

Reconstruct|--C

If sequences are below the --THRESHOLD value, this step attempts to reconstruct the markers from both overlapping and non-overlapping sequences on separate contigs, regardless of directionality, to improve their length. In addition, it filters the relevant hits in preparation for the Tree step (--D).

Tree|--D

Performs alignment, trimming, and construction of a standard phylogenetic tree.

Utility functions

Rename contigs|--K

Renames all the .fasta files' contigs in a directory, based on the filename(s). Output is in the directory renamed_contigs.
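The idea behind this utility can be sketched as follows. The header pattern used here (file name, then _contig_ and a running number) is an assumption for illustration only, as is the function name rename_contigs; the headers blast2tree actually writes may differ.

```shell
# Hypothetical sketch: rewrite contig headers from the file name and
# write the results to a renamed_contigs directory.
rename_contigs() {
    local dir="${1:-.}" f base
    mkdir -p "$dir/renamed_contigs"
    for f in "$dir"/*.fasta; do
        [ -f "$f" ] || continue          # skip when the glob matches nothing
        base=$(basename "$f" .fasta)
        # Replace each header with <file name>_contig_<n>; keep sequences as-is.
        awk -v name="$base" '
            /^>/ { n++; print ">" name "_contig_" n; next }
            { print }
        ' "$f" > "$dir/renamed_contigs/$base.fasta"
    done
}

# Example: rename_contigs /path/to/genomes
```

The original files are left untouched; only the copies in renamed_contigs carry the new headers.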

Make files|--M

Creates a folder for each .fasta file in a directory, named after the file, and moves each file into its corresponding folder.
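A minimal sketch of the same idea in plain shell, for anyone who wants to see what the result looks like before running the utility. The function name make_folders is hypothetical, and the sketch only handles the .fasta extension.

```shell
# Hypothetical sketch: give every .fasta file its own folder.
make_folders() {
    local dir="${1:-.}" f base
    for f in "$dir"/*.fasta; do
        [ -f "$f" ] || continue          # skip when the glob matches nothing
        base=$(basename "$f" .fasta)
        mkdir -p "$dir/$base"
        mv "$f" "$dir/$base/"            # e.g. isolate_100/isolate_100.fasta
    done
}

# Example: make_folders /path/to/genomes
```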

How to uninstall

To remove the environment: conda remove -n Blast2Tree --all
To remove the PATH entry: open ~/.bashrc (e.g. nano ~/.bashrc) and delete the export PATH line added during installation
To remove the downloaded program: e.g. rm -rf /path/to/blast2tree-v0.0.1

Disclaimer

This version of the code is still being developed.
