Releases: BigDataBiology/SemiBin
Version 1.2.0
The big change is a new prebuilt chicken caecum model (courtesy of Florian Plaza Oñate), along with better outputs.
Full ChangeLog
- Pretrained model from chicken caecum
- Output table with basic information on bins (including N50 & L50)
- When reclustering is used (default), output the unreclustered bins into a directory called `output_prerecluster_bins`
- Added `--verbose` flag and silenced some of the output when it is not used
- Use coloredlogs (if package is available)
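As a reminder of what the N50 and L50 columns mean, here is a minimal sketch using the usual definitions: N50 is the length of the contig at which the cumulative length (longest contigs first) reaches half of the bin's total, and L50 is the number of contigs needed to get there. The function name `n50_l50` is hypothetical and this is not necessarily how SemiBin computes the table.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)  # longest contigs first
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # `length` is the N50, `count` is the L50
            return length, count


# Hypothetical bin with five contigs (lengths in bp)
example_bin = [100, 80, 60, 40, 20]
n50, l50 = n50_l50(example_bin)
print(f"N50={n50} L50={l50}")
```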
Version 1.1.1
Completely remove use of `atomicwrites` package.
Version 1.1.0
User-visible improvements
- Support .cram format input (#104)
- Support using depth file from Metabat2 (#103)
- More flexible specification of prebuilt models (case insensitive, normalize `-` and `_`)
- Better output message when no bins are produced
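The more flexible model specification can be pictured with a small sketch. It assumes, purely for illustration, that the canonical form is lowercase with underscores; the helper name `normalize_model_name` is hypothetical and not taken from the SemiBin code base.

```python
def normalize_model_name(name):
    """Map a user-supplied model name to an assumed canonical form.

    Assumption: canonical names are lowercase and use underscores.
    """
    return name.strip().lower().replace("-", "_")


# Different spellings of the same model collapse to one key
for spelling in ("human_gut", "Human-Gut", "HUMAN_GUT"):
    print(spelling, "->", normalize_model_name(spelling))
```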
Bugfixes
- Fix bug using `atomicwrite` on certain network filesystems (#97)
Internal improvements
- Remove torch version restriction (and test on Python 3.10)
Version 1.0.3
Bugfix release
- Fix coverage parsing when value is not an integer (#103)
- Fix `multi_easy_bin` with taxonomy file given on the command line
Full Changelog: v1.0.2...v1.0.3
Version 1.0.2
Version 1.0.1
Bugfix release (fixes #93)
Version 1.0.0
Released April 29 2022
This coincides with the publication of the manuscript.
User-visible improvements
- More balanced file split when calling prodigal in parallel, which should take better advantage of multiple threads
- Fix bug when long stretches of Ns are present (#87)
- Better error messages (#90 & #91)
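To illustrate why a balanced split matters when running a gene finder in parallel, here is a sketch of one common approach: greedily assigning the longest contigs first to whichever chunk is currently smallest. This is an illustrative technique only; the release notes do not say which strategy SemiBin uses, and `balanced_split` is a hypothetical name.

```python
import heapq


def balanced_split(lengths, n_chunks):
    """Group item indices into n_chunks with roughly equal total length.

    Greedy heuristic: take items from longest to shortest and always add
    the next one to the chunk with the smallest running total.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [[] for _ in range(n_chunks)]
    heap = [(0, chunk_id) for chunk_id in range(n_chunks)]  # (total, chunk)
    heapq.heapify(heap)
    by_length = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    for idx in by_length:
        total, chunk_id = heapq.heappop(heap)
        chunks[chunk_id].append(idx)
        heapq.heappush(heap, (total + lengths[idx], chunk_id))
    return chunks


# Hypothetical contig lengths split across two worker processes
contig_lengths = [9, 7, 6, 5, 4, 3]
parts = balanced_split(contig_lengths, 2)
totals = [sum(contig_lengths[i] for i in part) for part in parts]
print(parts, totals)
```

With an even split, no single worker is left processing a much larger file while the others sit idle.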
Bugfixes
- Fix bugs in training from multiple samples
- Fix bug in incorporating CAT results
Full Changelog: v0.7.0...v1.0.0
Version 0.7.0
Full support for Mac OS X is the big change.
Full ChangeLog:
- Improve `check_install` command by printing out paths and correctly handling optionality of FragGeneScan/prodigal
- Reuse markers.hmmout to make the training from several samples faster
- Add option `--tmpdir` to set temporary directory
- Substitute FragGeneScan with Prodigal (FragGeneScan can still be used with `--orf-finder` parameter)
- Add 'concatenate_fasta' command to combine fasta files for multi-sample binning
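The idea behind combining FASTA files for multi-sample binning can be sketched as follows: each contig header is prefixed with its sample name so that contigs remain traceable after concatenation. The function below is a hypothetical in-memory illustration, not the actual command, and the `:` separator is an assumption made for this example.

```python
def concatenate_fasta(samples, separator=":"):
    """Combine FASTA texts, prefixing each header with its sample name.

    samples: dict mapping sample name -> FASTA content (as a string).
    separator: assumed delimiter between sample name and contig name.
    """
    out = []
    for sample, text in samples.items():
        for line in text.splitlines():
            if line.startswith(">"):
                out.append(f">{sample}{separator}{line[1:]}")
            elif line:  # skip blank lines, keep sequence lines as-is
                out.append(line)
    return "\n".join(out) + "\n"


combined = concatenate_fasta({
    "S1": ">contig_1\nACGT\n",
    "S2": ">contig_1\nGGCC\n",
})
print(combined, end="")
```

Prefixing keeps contig names unique even when two samples both contain a `contig_1`.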
Version 0.6.0
Version 0.6
Released February 7 2022
User-visible improvements
- Provide pretrained models from soil, cat gut, human oral, pig gut, mouse gut, built environment, wastewater and global (training from all samples).
- Users can now pass in the output of running mmseqs2 directly and SemiBin will use that instead of calling mmseqs itself (use option `--taxonomy-annotation-table`).
- The subcommand to generate cannot links is now called `generate_cannot_links`. The old name (`predict_taxonomy`) is kept as a deprecated alias.
- Similarly, sequence features (k-mer and abundance) are generated using the commands `generate_sequence_features_single` and `generate_sequence_features_multi` (for single- and multi-sample modes, respectively). The old names (`generate_data_single`/`generate_data_multi`) are kept as deprecated aliases.
- Add `check_install` command and run `check_install` before easy command
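For readers unfamiliar with the pattern, deprecated subcommand aliases can be sketched with the standard library's `argparse`, which accepts an `aliases` list per subparser. The `build_parser` and `resolve` helpers and the `tool` program name are hypothetical; this is not SemiBin's actual argument-parsing code.

```python
import argparse

# Old (deprecated) name -> new canonical name, as listed in the notes above
DEPRECATED = {
    "predict_taxonomy": "generate_cannot_links",
    "generate_data_single": "generate_sequence_features_single",
    "generate_data_multi": "generate_sequence_features_multi",
}


def build_parser():
    parser = argparse.ArgumentParser(prog="tool")
    sub = parser.add_subparsers(dest="cmd", required=True)
    for old, new in DEPRECATED.items():
        # The old name remains accepted as an alias of the new one
        sub.add_parser(new, aliases=[old])
    return parser


def resolve(argv):
    """Return (canonical subcommand, whether a deprecated alias was used)."""
    args = build_parser().parse_args(argv)
    return DEPRECATED.get(args.cmd, args.cmd), args.cmd in DEPRECATED


print(resolve(["predict_taxonomy"]))
print(resolve(["generate_cannot_links"]))
```

Detecting that an alias was used makes it possible to print a deprecation warning while still running the new command.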
Bugfixes
- Fix bug with non-standard characters in sample names (#68).
New Contributors
- @SvetlanaUP made their first contribution in #60
Version 0.5.0
Version 0.5
Released January 7 2022
User-visible improvements
- Reclustering is now the default (use `--no-recluster` to disable it; the option `--recluster` is deprecated and ignored) as the computational costs are much lower
- GTDB lazy downloading is now performed even if a non-standard directory is used
- The CACHEDIR.TAG protocol was implemented (this is supported by several tools that perform tasks such as backups).
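As background on CACHEDIR.TAG: the Cache Directory Tagging convention marks a directory as a cache by placing a file named `CACHEDIR.TAG` in it, starting with a fixed signature line. Below is a minimal sketch of writing such a tag; the helper name `tag_cache_dir` and the directory layout are hypothetical, not SemiBin's own code.

```python
import os
import tempfile

# Fixed signature required as the first bytes of the tag file
SIGNATURE = "Signature: 8a477f597d28d172789f06886806bc55"


def tag_cache_dir(path):
    """Create `path` if needed and mark it as a cache directory."""
    os.makedirs(path, exist_ok=True)
    tag = os.path.join(path, "CACHEDIR.TAG")
    if not os.path.exists(tag):
        with open(tag, "w") as handle:
            handle.write(SIGNATURE + "\n")
            handle.write("# This file is a cache directory tag.\n")
    return tag


# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as base:
    tag_path = tag_cache_dir(os.path.join(base, "cache"))
    with open(tag_path) as handle:
        first_line = handle.readline().rstrip("\n")
print(first_line)
```

Backup tools that honor the convention can then skip the directory, which is useful for large downloaded reference data.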
Bugfixes
- Fix bug with `--min-len` (minimal length). Previously, only contigs strictly longer than the given minimal length were used (instead of those greater than or equal to the minimal length).
- GTDB downloading was inconsistent in a few instances, which have been fixed
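The `--min-len` fix comes down to the comparison operator at the boundary. A minimal sketch of the corrected behaviour, with a hypothetical `filter_contigs` helper and made-up contig names:

```python
def filter_contigs(lengths, min_len):
    """Keep contigs whose length is at least min_len (inclusive)."""
    # Using `>` here instead of `>=` would wrongly drop contigs that are
    # exactly min_len long, which is the behaviour the fix removes.
    return {name: n for name, n in lengths.items() if n >= min_len}


contigs = {"c1": 999, "c2": 1000, "c3": 1001}
print(filter_contigs(contigs, 1000))
```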
Internal improvements
- Much more efficient code (including lower memory usage) for binning, especially if a pretrained model is used. As an example, using a deeply-sequenced ocean sample, generating the data (`generate_data_single` step) went down from 14 to 9 minutes, while binning (`bin` step, using `--recluster`) went down from 10m17s (using 20GB of RAM, at peak) to 4m33s (using 4.5GB, at peak). Thus total time from BAM file to bins went down from 25 to 14 minutes (using 4 threads) and peak RAM is now 4.5GB, making it usable on a typical laptop.