HiFiBGC is a tool for detecting Biosynthetic Gene Clusters (BGCs) in PacBio HiFi metagenomic data.
Choose one of the following installation methods:
mamba create -n hifibgc -c conda-forge -c bioconda -c amityadav -y hifibgc
mamba activate hifibgc
mamba is preferred over conda (shown below) because it takes much less time and uses less memory (RAM).
mamba can be installed by following the instructions in the mamba documentation.
conda create -n hifibgc -c conda-forge -c bioconda -c amityadav -y hifibgc
conda activate hifibgc
Requires Python 3.11, pip, and mamba:
pip install hifibgc
Note: HiFiBGC can run without mamba, but then every command needs to be run with --conda-frontend conda (e.g., hifibgc test --conda-frontend conda).
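The choice of frontend described in the note above can be scripted. The following is a minimal sketch, not part of HiFiBGC: the function name frontend_args is hypothetical, and the only HiFiBGC option it relies on is the --conda-frontend conda flag mentioned above. It prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of HiFiBGC): print the extra arguments
# needed when mamba is not available on the PATH.
frontend_args() {
    if command -v mamba >/dev/null 2>&1; then
        # mamba found: no extra flag is needed
        printf '%s' ""
    else
        # fall back to conda, as described in the note above
        printf '%s' "--conda-frontend conda"
    fi
}

# Show the command that would be run; drop the leading 'echo' to run it.
# $(frontend_args) is left unquoted on purpose so it splits into two words.
echo hifibgc test $(frontend_args)
```

The same helper could be appended to any other hifibgc command in a wrapper script.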
Run once to install the required database and tool:
hifibgc install
Verify the installation using the test dataset:
hifibgc test
Successful completion will display "Snakemake finished successfully" and create an output directory named hifibgc1.out.
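After the test finishes, a quick check of the output directory can confirm that the run got as far as the final stage. The sketch below assumes only the directory names documented in this README (hifibgc1.out and its 05_final_output subfolder); the function name check_output_dir is hypothetical and not part of HiFiBGC.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the output directory and its
# 05_final_output subfolder both exist.
check_output_dir() {
    outdir="$1"
    if [ ! -d "$outdir" ]; then
        echo "missing output directory: $outdir" >&2
        return 1
    fi
    if [ ! -d "$outdir/05_final_output" ]; then
        echo "missing $outdir/05_final_output" >&2
        return 1
    fi
    echo "found $outdir/05_final_output"
}

# Example: check the default output directory created by 'hifibgc test'
check_output_dir hifibgc1.out || echo "test run does not look complete"
```

This only checks that the folders exist; it does not validate their contents.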
OS Compatibility: Installation has been tested on Ubuntu 20.04 (Linux) and macOS-12. On macOS-12, installation was verified using conda and pip, but not mamba.
Run once to install the required database and tool (skip this step if you already ran it while testing the installation):
hifibgc install
To run HiFiBGC on your own data, use the following command:
hifibgc run --input input.fastq
By default, HiFiBGC writes its output to a directory named hifibgc1.out; you can specify a custom name with the --output option, as shown below. The number of threads can be set with the --threads option (the default is 80).
hifibgc run --input input.fastq --output outdir --threads 50
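If you prefer to match the thread count to the machine rather than pass a fixed number, the value can be computed before calling hifibgc. The following is a sketch under assumptions: pick_threads is a hypothetical helper, and getconf _NPROCESSORS_ONLN is used as a portable way to read the number of online CPUs on Linux and macOS. The command is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: cap a requested thread count at the number of online CPUs.
pick_threads() {
    requested="$1"
    # getconf _NPROCESSORS_ONLN works on Linux and macOS; fall back to 1 on failure
    available="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
    if [ "$requested" -gt "$available" ]; then
        echo "$available"
    else
        echo "$requested"
    fi
}

THREADS="$(pick_threads 80)"
# Print the command rather than running it; remove 'echo' to execute.
echo hifibgc run --input input.fastq --output outdir --threads "$THREADS"
```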
For all available options, see Run command help.
The output directory structure:
hifibgc1.out/
├── 01_assembly/ # Output from three assemblers
├── 02_mapping_reads_to_merged_assembly/ # Read mapping to concatenated assembly and unmapped read extraction
├── 03_antismash/ # BGC prediction
├── 04_bgc_clustering/ # BGC clustering
├── 05_final_output/ # Primary HiFiBGC output
├── benchmarks/ # Resource usage and timing information for different components of HiFiBGC
├── config.yaml # Configuration file for HiFiBGC run
├── hifibgc.log # Snakemake log file
└── logs/ # Tool-specific log files
Of these, the folder 05_final_output contains the primary output of HiFiBGC, with the following directory structure:
05_final_output/ # Primary HiFiBGC output
├── BGC_all/ # All BGC .gbk files
├── BGC_all_metadata.tsv # Metadata for all BGCs
├── BGC_representative/ # Representative BGC .gbk files
└── upsetplot/ # Upsetplot comparison of results from three assemblers and unmapped reads
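A quick way to summarise a finished run is to count the .gbk files in the two BGC folders listed above. This is a minimal sketch assuming the default output location hifibgc1.out and the folder names shown in the tree; count_gbk is a hypothetical helper, and it searches subfolders too in case the files are nested.

```shell
#!/bin/sh
# Hypothetical helper: count .gbk files under a directory (0 if it is missing).
count_gbk() {
    dir="$1"
    if [ -d "$dir" ]; then
        # tr strips the padding that some wc implementations add
        find "$dir" -type f -name '*.gbk' | wc -l | tr -d ' '
    else
        echo 0
    fi
}

final="hifibgc1.out/05_final_output"   # default output location
echo "all BGCs:            $(count_gbk "$final/BGC_all")"
echo "representative BGCs: $(count_gbk "$final/BGC_representative")"
```

Change the final variable if the run used a custom --output directory.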
$ hifibgc --help
Usage: hifibgc [OPTIONS] COMMAND [ARGS]...
Detect Biosynthetic Gene Clusters (BGCs) in HiFi metagenomic data. For
more options, run: hifibgc command --help
Options:
-v, --version Show the version and exit.
-h, --help Show this message and exit.
Commands:
run Run HiFiBGC
install Install required database and tool
test Test HiFiBGC
config Copy the system default config file
citation Print the citation(s) for this tool
$ hifibgc run --help
Usage: hifibgc run [OPTIONS] [SNAKE_ARGS]...
Run HiFiBGC
Options:
--input TEXT                  Input file [required]
--output PATH                 Output directory [default: hifibgc1.out]
--bigscape_cutoff FLOAT       BiG-SCAPE cutoff parameter [default: 0.3]
--configfile TEXT             Custom config file [default: (outputDir)/config.yaml]
--threads INTEGER             Number of threads to use [default: 80]
--use-conda / --no-use-conda  Use conda for Snakemake rules [default: use-conda]
--conda-prefix PATH           Custom conda env directory
--snake-default TEXT          Customise Snakemake runtime args [default: --rerun-incomplete, --printshellcmds, --nolock, --show-failed-logs]
-h, --help                    Show this message and exit.
For detailed usage of other commands, use hifibgc <command> --help.
HiFiBGC uses the following tools:
- hifiasm-meta
- metaFlye
- HiCanu
- Minimap2
- SAMtools
- antiSMASH
- BiG-SCAPE
- complex-upsetplot
- Snaketool
- Snaketool-utils
Yadav, A., Subramanian, S. HiFiBGC: an ensemble approach for improved biosynthetic gene cluster detection in PacBio HiFi-read metagenomes. BMC Genomics 25, 1096 (2024). https://doi.org/10.1186/s12864-024-10950-7