DeepTaxa

DeepTaxa is a deep learning tool for taxonomic classification of bacterial genomes. It shows an order-of-magnitude increase in accuracy while remaining computationally efficient. In our evaluation, DeepTaxa was accurate at multiple resolutions, achieving a taxonomic classification accuracy of >99% at ranks from phylum to genus. It can classify 1,000 genomes within one hour of wall-clock time on 32 CPU cores. DeepTaxa can also infer taxonomic hints for novel genomes, even when neither family nor genus information is available in the model. We therefore anticipate that DeepTaxa will be a useful instrument for understanding present and future microbial diversity across a wide range of microbiological and ecological settings. DeepTaxa could also complement contemporary methods, with the aim of accelerating taxonomic knowledge discovery from rich microbiome resources.

Requirements

Linux operating system
At least 16 GB of RAM
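
The two requirements above can be checked before installing. The following is a minimal sketch, assuming a Linux system that exposes /proc/meminfo; MIN_RAM_KB, kb_to_gib and has_enough_ram are illustrative names and are not part of DeepTaxa.

```shell
#!/usr/bin/env bash
# Rough pre-flight check for the requirements listed above.
# MIN_RAM_KB, kb_to_gib and has_enough_ram are illustrative names, not part of DeepTaxa.

# 16 GB expressed in kB (16 * 10^9 bytes / 1024). MemTotal is usually a bit
# below the installed RAM, so the decimal figure is used as the threshold.
MIN_RAM_KB=15625000

# Convert a kB figure, as reported in /proc/meminfo, to whole GiB (rounded down).
kb_to_gib() {
  echo $(( $1 / 1024 / 1024 ))
}

# Succeed if the given amount of memory (in kB) meets the threshold.
has_enough_ram() {
  [ "$1" -ge "$MIN_RAM_KB" ]
}

os_name="$(uname -s)"
if [ "$os_name" = "Linux" ]; then
  echo "OS check passed: $os_name"
else
  echo "warning: Linux is required; found $os_name"
fi

if [ -r /proc/meminfo ]; then
  mem_kb="$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
  if has_enough_ram "$mem_kb"; then
    echo "RAM check passed: about $(kb_to_gib "$mem_kb") GiB"
  else
    echo "warning: about $(kb_to_gib "$mem_kb") GiB of RAM detected; at least 16 GB is required"
  fi
else
  echo "warning: /proc/meminfo not readable; RAM not checked"
fi
```

The check only prints warnings and never aborts, so it can be run safely on any machine.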

Dependency

Installation

We recommend deploying DeepTaxa using conda.

# install from source
wget https://github.com/HUST-NingKang-Lab/DeepTaxa/releases/download/v0.3-alpha/DeepTaxa.zip
unzip DeepTaxa.zip
cd DeepTaxa
# create the environment
conda env create -f config/deeptaxa.yml
# activate the environment
conda activate deeptaxa
# make the scripts executable
chmod +x script/*
# check installation
script/check.sh
# If all goes well, you'll see a result file called "pred.tsv" in the "data/" directory in a few minutes

# install via pip
pip install deeptaxa
# If installed via pip, confirm that the "config" and "script" directories are under the "DeepTaxa" directory.
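
The pip note above asks you to confirm the directory layout. A minimal sketch of that check follows, assuming it is run from, or pointed at, the DeepTaxa directory; check_layout is an illustrative helper, not a script shipped with DeepTaxa.

```shell
#!/usr/bin/env bash
# check_layout is an illustrative helper, not part of DeepTaxa.
# It succeeds only if both "config" and "script" exist under the given directory.
check_layout() {
  local root="$1" missing=0 d
  for d in config script; do
    if [ ! -d "$root/$d" ]; then
      echo "missing directory: $root/$d"
      missing=1
    fi
  done
  return "$missing"
}

# Default to the current directory; pass the DeepTaxa directory as the first argument.
if check_layout "${1:-.}"; then
  echo "layout looks fine"
else
  echo "layout incomplete; see messages above"
fi
```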

Usage

Before using DeepTaxa, make sure you have activated the deeptaxa environment by running conda activate deeptaxa.

# identify 120 bacterial marker genes with hmmer
script/hmmer.sh [-h|--help] [-i|--input] [-o|--output]
# align marker genes with mmseqs2
script/mmseqs.sh [-h|--help] [-i|--input] [-s|--hsummary] [-t|--tmp] [-o|--output]
# convert alignment results into an array-format HDF5 file
script/data.py [-h] [-i MMSEQS_RESULT_PATH] [-c CONFIG_FILE_PATH] [-f FILE] [-o OUTPUT]
# taxonomic classification of genomes
script/predict.py [-h] [-i INPUT] [-m MODEL] [-t TREE] [-O ONTOLOGY] [-f FILE] [-o OUTPUT]

The workflow takes genomes/genomes_name_protein.faa as input and stores the predicted annotations in data/pred.tsv, or in any other path you specify with the -o argument of predict.py.
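
The four steps can be chained in a small wrapper. The sketch below is a dry run that only prints the commands. The intermediate paths (work/...) are placeholders, and the assumption that each step's -o output is the next step's -i input is a guess, apart from data.py taking the mmseqs result as documented above. Only -i and -o are shown; the remaining options listed in Usage (-s, -t, -c, -f, -m, -O) may be needed in practice, so check each script's -h output first.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the four steps listed under Usage.
# ASSUMPTIONS: every path except genomes/genomes_name_protein.faa and data/pred.tsv
# is a placeholder, and the way one step's output feeds the next is a guess.
# Only -i and -o are shown; check each script's -h output for the other options.

INPUT_FAA="genomes/genomes_name_protein.faa"   # input named in this README
WORK_DIR="work"                                # hypothetical scratch directory
PRED_TSV="data/pred.tsv"                       # default output named in this README

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Chain the steps with && so a failing step stops the workflow.
deeptaxa_workflow() {
  run script/hmmer.sh   -i "$INPUT_FAA"            -o "$WORK_DIR/hmmer_out"   &&
  run script/mmseqs.sh  -i "$WORK_DIR/hmmer_out"   -o "$WORK_DIR/mmseqs_out"  &&
  run script/data.py    -i "$WORK_DIR/mmseqs_out"  -o "$WORK_DIR/features.h5" &&
  run script/predict.py -i "$WORK_DIR/features.h5" -o "$PRED_TSV"
}

deeptaxa_workflow
```

Set DRY_RUN=0 only after replacing the placeholder paths and adding any options your setup needs.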

Developers

Name	Email	Affiliation
Yuguo Zha	hugozha@hust.edu.cn	School of Life Science and Technology, Huazhong University of Science & Technology
Haobo Zhang	M202272359@hust.edu.cn	School of Life Science and Technology, Huazhong University of Science & Technology
Kang Ning	ningkang@hust.edu.cn	School of Life Science and Technology, Huazhong University of Science & Technology
