HUST-NingKang-Lab/DeepTaxa

DeepTaxa

DeepTaxa is a deep learning tool for taxonomic classification of bacterial genomes that shows an order-of-magnitude increase in accuracy while remaining computationally efficient. Our evaluation shows that DeepTaxa is highly accurate at different taxonomic resolutions, achieving a classification accuracy of >99% at ranks from phylum to genus. It can classify 1,000 genomes within one hour of wall-clock time using 32 CPU cores. DeepTaxa can also infer taxonomic hints for novel genomes, even when neither family nor genus information is available in the model. We therefore anticipate that DeepTaxa will be a useful instrument for understanding present and future microbial diversity in a wide range of microbiological and ecological settings. DeepTaxa can also complement contemporary methods, with the aim of accelerating taxonomic knowledge discovery from rich microbiome resources.

Requirements

  • Linux operating system
  • At least 16 GB of RAM
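
The two requirements above can be checked before installing. The following is a minimal sketch, not part of DeepTaxa; the function name `check_requirements` is hypothetical, and the memory check relies on the POSIX `sysconf` values available on Linux.

```python
import os
import platform

MIN_RAM_GB = 16  # threshold taken from the Requirements list


def check_requirements(min_ram_gb=MIN_RAM_GB):
    """Return (is_linux, ram_gb, ok) for the current machine.

    Hypothetical helper for illustration; it is not shipped with DeepTaxa.
    """
    is_linux = platform.system() == "Linux"
    ram_gb = None
    if is_linux:
        # Total physical memory = page size in bytes * number of physical pages.
        ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        ram_gb = ram_bytes / 1024**3
    ok = is_linux and ram_gb is not None and ram_gb >= min_ram_gb
    return is_linux, ram_gb, ok


if __name__ == "__main__":
    is_linux, ram_gb, ok = check_requirements()
    print("Linux:", is_linux)
    print("RAM (GiB):", ram_gb)
    print("Meets requirements:", ok)
```

A machine sold as "16 GB" often reports slightly less than 16 GiB of usable memory, so a near miss is better treated as a warning than as a hard failure.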

Dependency

Installation

We recommend deploying DeepTaxa using conda.

# install from source
wget https://github.com/HUST-NingKang-Lab/DeepTaxa/releases/download/v0.3-alpha/DeepTaxa.zip
unzip DeepTaxa.zip
cd DeepTaxa
# create the environment
conda env create -f config/deeptaxa.yml
# activate the environment
conda activate deeptaxa
# make the scripts executable
chmod +x script/*
# check installation
script/check.sh
# If all goes well, you'll see a result file called "pred.tsv" in the "data/" directory in a few minutes

# install via pip
pip install deeptaxa
# If installed via pip, please confirm that the "config" and "script" directories are under the "DeepTaxa" directory.
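
The directory check mentioned for pip installations can be done by hand or with a few lines of Python. This is a sketch only: `missing_dirs` is a hypothetical helper, and the default root `DeepTaxa` assumes the command is run from the directory that contains it.

```python
from pathlib import Path


def missing_dirs(root="DeepTaxa", required=("config", "script")):
    """Return the names of required subdirectories that are absent under root.

    Hypothetical helper for illustration; it is not shipped with DeepTaxa.
    """
    root = Path(root)
    return [name for name in required if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    if missing:
        print("Missing under DeepTaxa/:", ", ".join(missing))
    else:
        print("config/ and script/ are both present")
```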

Usage

Before using DeepTaxa, make sure you have activated the deeptaxa environment by running conda activate deeptaxa.

# identify 120 bacterial marker genes with HMMER
script/hmmer.sh [-h|--help] [-i|--input] [-o|--output]
# align the marker genes with MMseqs2
script/mmseqs.sh [-h|--help] [-i|--input] [-s|--hsummary] [-t|--tmp] [-o|--output]
# convert the alignment results into an array-format HDF5 file
script/data.py [-h] [-i MMSEQS_RESULT_PATH] [-c CONFIG_FILE_PATH] [-f FILE] [-o OUTPUT]
# taxonomic classification of genomes
script/predict.py [-h] [-i INPUT] [-m MODEL] [-t TREE] [-O ONTOLOGY] [-f FILE] [-o OUTPUT]
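
The four steps can be chained from a small driver script. The sketch below only assembles argument lists from the flags shown above and prints them by default. All paths under `work/` are placeholders, and the idea that each step's `-o` output feeds the next step's `-i` input is an assumption. The remaining flags (`-s`, `-t`, `-c`, `-f`, `-m`, `-O`) are left out on purpose; check each script's `-h` output for what they expect.

```python
import subprocess


def build_command(script, options):
    """Turn a script path and a {flag: value} mapping into an argv list."""
    argv = [script]
    for flag, value in options.items():
        argv.extend([flag, str(value)])
    return argv


# Placeholder paths: only the first input and the final output are named in
# this README; every intermediate path below is hypothetical.
STEPS = [
    ("script/hmmer.sh", {"-i": "genomes/genomes_name_protein.faa", "-o": "work/hmmer"}),
    ("script/mmseqs.sh", {"-i": "work/hmmer", "-o": "work/mmseqs"}),
    ("script/data.py", {"-i": "work/mmseqs", "-o": "work/data.h5"}),
    ("script/predict.py", {"-i": "work/data.h5", "-o": "data/pred.tsv"}),
]


def run_pipeline(steps=STEPS, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for script, options in steps:
        argv = build_command(script, options)
        print(" ".join(argv))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Keeping `dry_run=True` until the printed commands match what each script's help text describes avoids launching a long run with missing arguments.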

The workflow takes genomes/genomes_name_protein.faa as input and writes the predicted annotations to data/pred.tsv, or to any other path given with the -o argument of predict.py.
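
To inspect the predictions, the output can be loaded as a tab-separated file. This is a minimal sketch that assumes only what the `.tsv` extension implies. The column layout of pred.tsv, and whether it has a header row, are not described here, so the rows are returned as plain lists of strings.

```python
import csv


def read_tsv(path):
    """Read a tab-separated file into a list of rows (lists of strings)."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t")]


if __name__ == "__main__":
    # Default output path named in this README; adjust if -o was used.
    rows = read_tsv("data/pred.tsv")
    for row in rows[:5]:
        print(row)
```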

Developers

Name | Email | Affiliation
Yuguo Zha | hugozha@hust.edu.cn | School of Life Science and Technology, Huazhong University of Science & Technology
Haobo Zhang | M202272359@hust.edu.cn | School of Life Science and Technology, Huazhong University of Science & Technology
Kang Ning | ningkang@hust.edu.cn | School of Life Science and Technology, Huazhong University of Science & Technology

About

DeepTaxa: a deep learning approach for ultra-fast classification of genomes
