Releases: BigDataBiology/SemiBin

Version 1.2.0

19 Oct 06:33

The big change is a new prebuilt model for chicken caecum (courtesy of Florian Plaza Oñate), along with more informative outputs.

Full ChangeLog

  • Pretrained model from chicken caecum
  • Output table with basic information on bins (including N50 & L50)
  • When reclustering is used (the default), the bins from before reclustering are written to a directory called output_prerecluster_bins
  • Added --verbose flag and silenced some of the output when it is not used
  • Use coloredlogs (if package is available)
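
For readers unfamiliar with the two assembly statistics named above, here is a minimal sketch of how N50 and L50 are conventionally defined. The function name n50_l50 is hypothetical, and these notes do not say how SemiBin itself computes the values, so treat this as an illustration of the standard definitions only.

```python
def n50_l50(lengths):
    """Return (N50, L50) for a collection of contig lengths.

    N50 is the length of the contig at which the running total, taken
    from the longest contig downwards, first reaches half of the total
    length. L50 is the number of contigs needed to get there.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            return length, count


if __name__ == "__main__":
    # Total is 280, so half is 140; the two longest contigs sum to 180.
    print(n50_l50([100, 80, 50, 30, 20]))
```

A larger N50 together with a smaller L50 generally indicates that a bin is made up of fewer, longer contigs.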

Version 1.1.1

27 Sep 07:50

Completely remove use of atomicwrites package.

Version 1.1.0

26 Sep 11:10

User-visible improvements

  • Support .cram format input (#104)
  • Support using depth file from Metabat2 (#103)
  • More flexible specification of prebuilt models (case insensitive, normalize - and _)
  • Better output message when no bins are produced
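
The more flexible model specification can be pictured as a small normalization step applied before lookup. The sketch below is an assumption about the general approach, not SemiBin's code: the helper names and the KNOWN_MODELS set are hypothetical, with entries chosen only for illustration.

```python
# Hypothetical identifiers, used only to illustrate the lookup.
KNOWN_MODELS = {"human_oral", "pig_gut", "chicken_caecum"}


def normalize_model_name(name):
    """Lower-case the name and treat '-' and '_' as the same character."""
    return name.strip().lower().replace("-", "_")


def resolve_model(name):
    """Return the canonical identifier, or raise if the name is unknown."""
    key = normalize_model_name(name)
    if key not in KNOWN_MODELS:
        raise ValueError(f"unknown prebuilt model: {name!r}")
    return key


if __name__ == "__main__":
    print(resolve_model("Pig-Gut"))
```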

Bugfixes

  • Fix bug using atomicwrites on certain network filesystems (#97)

Internal improvements

  • Remove torch version restriction (and test on Python 3.10)

Version 1.0.3

03 Aug 15:07

Bugfix release

  • Fix coverage parsing when value is not an integer (#103)
  • Fix multi_easy_bin with taxonomy file given on the command line

Full Changelog: v1.0.2...v1.0.3

Version 1.0.2

08 Jul 11:29

Bugfix release

Completely fixes #93 (see also #101)

Version 1.0.1

09 May 12:08

Bugfix release (fixes #93)

Version 1.0.0

29 Apr 15:26
v1.0.0

Released April 29 2022

This coincides with the publication of the manuscript.

User-visible improvements

  • More balanced file split when calling prodigal in parallel, which should take better advantage of multiple threads
  • Fix bug when long stretches of Ns are present (#87)
  • Better error messages (#90 & #91)
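
These notes do not describe how the file split is balanced. As one possible illustration, the sketch below uses a common greedy heuristic (longest item first, always into the currently lightest chunk) so that each parallel worker receives a similar number of bases. The function name balanced_split and its inputs are hypothetical, and this should not be read as SemiBin's actual strategy.

```python
import heapq


def balanced_split(lengths, n_chunks):
    """Split contigs into n_chunks groups with similar total length.

    lengths maps contig name to contig length. Returns a list of lists
    of contig names, one list per chunk.
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [[] for _ in range(n_chunks)]
    # Heap of (current total length, chunk index); lightest chunk on top.
    heap = [(0, i) for i in range(n_chunks)]
    by_length = sorted(lengths.items(), key=lambda kv: kv[1], reverse=True)
    for name, length in by_length:
        total, idx = heapq.heappop(heap)
        chunks[idx].append(name)
        heapq.heappush(heap, (total + length, idx))
    return chunks


if __name__ == "__main__":
    contigs = {"a": 9, "b": 7, "c": 6, "d": 5, "e": 3}
    for chunk in balanced_split(contigs, 2):
        print(chunk, sum(contigs[name] for name in chunk))
```

Splitting by total sequence length rather than by number of contigs matters because gene prediction time grows with the amount of sequence, so a chunk holding a few very long contigs can otherwise keep one thread busy long after the others finish.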

Bugfixes

  • Fix bugs in training from multiple samples
  • Fix bug in incorporating CAT results

Full Changelog: v0.7.0...v1.0.0

Version 0.7.0

02 Mar 23:40

Full support for Mac OS X is the big change.

Full ChangeLog:

  • Improve check_install command by printing out paths and correctly handling optionality of FragGeneScan/prodigal
  • Reuse markers.hmmout to make the training from several samples faster
  • Add option --tmpdir to set temporary directory
  • Substitute FragGeneScan with Prodigal (FragGeneScan can still be used with --orf-finder parameter)
  • Add 'concatenate_fasta' command to combine fasta files for multi-sample binning

Version 0.6.0

08 Feb 09:24
v0.6.0

Version 0.6

Released February 7 2022

User-visible improvements

  • Provide pretrained models from soil, cat gut, human oral, pig gut, mouse gut, built environment, wastewater and global (training from all samples).
  • Users can now pass in the output of running mmseqs2 directly and SemiBin will use that instead of calling mmseqs itself (use option
    --taxonomy-annotation-table).
  • The subcommand to generate cannot links is now called generate_cannot_links. The old name (predict_taxonomy) is kept as a
    deprecated alias.
  • Similarly, sequence features (k-mer and abundance) are generated using the commands generate_sequence_features_single and
    generate_sequence_features_multi (for single- and multi-sample modes, respectively). The old names (generate_data_single and generate_data_multi) are kept as deprecated aliases.
  • Add check_install command and run check_install before easy command

Bugfixes

  • Fix bug with non-standard characters in sample names (#68).

Version 0.5.0

07 Jan 17:08
v0.5.0

Version 0.5

Released January 7 2022

User-visible improvements

  • Reclustering is now the default (use --no-recluster to disable it; the
    option --recluster is deprecated and ignored) as the computational costs
    are much lower
  • GTDB lazy downloading is now performed even if a non-standard directory is
    used
  • The CACHEDIR.TAG protocol was implemented
    (this is supported by several tools that perform tasks such as backups).
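
For context, the CACHEDIR.TAG convention marks a directory as a cache by placing a file named CACHEDIR.TAG in it, starting with a fixed signature line, so that backup tools can skip it. The sketch below shows the general idea; the helper names are hypothetical and this is not taken from SemiBin's source.

```python
import os

# Fixed header required by the Cache Directory Tagging convention.
SIGNATURE = "Signature: 8a477f597d28d172789f06886806bc55"


def tag_cache_dir(path):
    """Create path if needed and mark it as a cache directory."""
    os.makedirs(path, exist_ok=True)
    tag = os.path.join(path, "CACHEDIR.TAG")
    if not os.path.exists(tag):
        with open(tag, "w", encoding="ascii") as out:
            out.write(SIGNATURE + "\n")
            out.write("# This directory holds cached data and is safe to skip in backups.\n")
    return tag


def is_cache_dir(path):
    """Return True if path contains a valid CACHEDIR.TAG file."""
    tag = os.path.join(path, "CACHEDIR.TAG")
    try:
        with open(tag, "rb") as handle:
            return handle.read(len(SIGNATURE)) == SIGNATURE.encode("ascii")
    except OSError:
        return False
```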

Bugfixes

  • Fix bug with --min-len (minimal length). Previously, only contigs strictly
    longer than the given minimal length were used (instead of contigs at
    least as long as the minimal length).
  • GTDB downloading was inconsistent in a few instances which have been fixed
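
The --min-len fix amounts to an off-by-one change in a comparison. A minimal sketch of the corrected behaviour follows; the function name filter_contigs and the example lengths are hypothetical.

```python
def filter_contigs(lengths, min_len):
    """Keep contigs whose length is at least min_len.

    The comparison is >= (inclusive). Using > here would reproduce the
    old behaviour, which dropped contigs of exactly min_len bases.
    """
    return {name: n for name, n in lengths.items() if n >= min_len}


if __name__ == "__main__":
    contigs = {"short": 999, "exact": 1000, "long": 1001}
    print(sorted(filter_contigs(contigs, 1000)))
```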

Internal improvements

  • Much more efficient code (including lower memory usage) for binning,
    especially if a pretrained model is used. As an example, using a
    deeply-sequenced ocean sample, generating the data (generate_data_single
    step) goes down from 14 to 9 minutes; while binning (bin step, using
    --recluster) goes down from 10m17s (using 20 GB of RAM, at peak) to 4m33s
    (using 4.5 GB, at peak). Thus total time from BAM file to bins went down from
    25 to 14 minutes (using 4 threads) and peak RAM is now 4.5 GB, making it
    usable on a typical laptop.