Releases · BigDataBiology/SemiBin

19 Oct 06:33

luispedro

v1.2.0

6905b8f

Version 1.2.0

The big change is a new prebuilt model for chicken caecum (courtesy of Florian Plaza Oñate), along with better outputs.

Full ChangeLog

Pretrained model from chicken caecum
Output table with basic information on bins (including N50 & L50)
When reclustering is used (the default), output the unreclustered bins into a directory called output_prerecluster_bins
Added --verbose flag and silenced some of the output when it is not used
Use coloredlogs (if the package is available)
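N50 and L50, reported in the new bin table, are standard assembly statistics. As a minimal sketch (not SemiBin's own code, which may treat edge cases differently), they can be computed from a bin's contig lengths like this; the helper name `n50_l50` is hypothetical:

```python
def n50_l50(lengths):
    """Return (N50, L50) for a non-empty list of contig lengths.

    Contigs are sorted longest first and their lengths accumulated.
    N50 is the length of the contig at which the running total first
    reaches at least half of the total length; L50 is how many contigs
    were needed to get there.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if 2 * running >= total:  # integer-safe form of running >= total / 2
            return length, count
    raise AssertionError("unreachable: the full sum always reaches half")


# A hypothetical bin with five contigs (11,500 bp in total).
print(n50_l50([5000, 3000, 2000, 1000, 500]))  # (3000, 2)
```

Here the two longest contigs (5,000 + 3,000 = 8,000 bp) are the first to cover at least half of the 11,500 bp, so N50 is 3,000 and L50 is 2.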

27 Sep 07:50

luispedro

v1.1.1

498224a

Version 1.1.1

Completely remove use of atomicwrites package.

26 Sep 11:10

luispedro

v1.1.0

b92180d

Version 1.1.0

User-visible improvements

Support .cram format input (#104)
Support using depth files from MetaBAT2 (#103)
More flexible specification of prebuilt models (case-insensitive, with - and _ treated as equivalent)
Better output message when no bins are produced
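The more flexible model specification amounts to normalizing the user's input before looking it up. A minimal sketch, assuming normalization means lower-casing and mapping "-" to "_"; the function names and the set of model names below are hypothetical and only illustrative:

```python
# Hypothetical set of model names, for illustration only.
KNOWN_MODELS = {"human_gut", "human_oral", "cat_gut", "soil", "global"}


def normalize_model_name(name: str) -> str:
    """Ignore case and treat '-' and '_' as the same character."""
    return name.lower().replace("-", "_")


def resolve_model(name: str) -> str:
    """Map user input to a known model name, or raise ValueError."""
    normalized = normalize_model_name(name)
    if normalized not in KNOWN_MODELS:
        raise ValueError(
            f"unknown model {name!r}; expected one of {sorted(KNOWN_MODELS)}"
        )
    return normalized


print(resolve_model("Human-Gut"))  # human_gut
print(resolve_model("CAT_GUT"))    # cat_gut
```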

Bugfixes

Fix bug when using atomicwrites on certain network filesystems (#97)

Internal improvements

Remove torch version restriction (and test on Python 3.10)

03 Aug 15:07

luispedro

v1.0.3

4d1d769

Version 1.0.3

Bugfix release

Fix coverage parsing when value is not an integer (#103)
Fix multi_easy_bin with taxonomy file given on the command line
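The notes do not say how coverage parsing was fixed. The sketch below only illustrates the general failure mode described (a parser that expects whole numbers rejecting a fractional coverage value); both function names are hypothetical:

```python
def parse_depth_int_only(value: str) -> int:
    # Assumes coverage is always a whole number: fails on '12.5'.
    return int(value)


def parse_depth(value: str) -> float:
    # Accepts both whole ('12') and fractional ('12.5') values.
    return float(value)


for raw in ["12", "12.5"]:
    try:
        print(raw, "int-only:", parse_depth_int_only(raw))
    except ValueError as err:
        print(raw, "int-only failed:", err)
    print(raw, "float:", parse_depth(raw))
```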

Full Changelog: v1.0.2...v1.0.3

08 Jul 11:29

luispedro

v1.0.2

ed39a65

Version 1.0.2

Bugfix release

Completely fixes #93 (see also #101)

09 May 12:08

luispedro

v1.0.1

f7f04d2

Version 1.0.1

Bugfix release (fixes #93)

29 Apr 15:26

luispedro

v1.0.0

cb38934

Version 1.0.0

Released April 29 2022

This coincides with the publication of the manuscript.

User-visible improvements

More balanced file splitting when calling Prodigal in parallel, which should make better use of multiple threads
Fix bug when long stretches of Ns are present (#87)
Better error messages (#90 & #91)
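The notes do not describe how the input is now split for parallel Prodigal runs. One common way to balance work across threads is a greedy longest-first assignment by sequence length; the hypothetical helper below shows that technique, not SemiBin's actual code. Each resulting chunk could then be written to its own FASTA file and given to a separate Prodigal process.

```python
import heapq


def balanced_split(seq_lengths, n_chunks):
    """Split sequences into n_chunks groups with similar total length.

    seq_lengths maps a sequence ID to its length. Sequences are placed
    longest first, each into the chunk whose running total is smallest
    (a greedy "longest processing time first" heuristic).
    """
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    chunks = [[] for _ in range(n_chunks)]
    heap = [(0, i) for i in range(n_chunks)]  # (total length so far, chunk index)
    heapq.heapify(heap)
    by_length = sorted(seq_lengths.items(), key=lambda item: item[1], reverse=True)
    for seq_id, length in by_length:
        total, i = heapq.heappop(heap)
        chunks[i].append(seq_id)
        heapq.heappush(heap, (total + length, i))
    return chunks


contigs = {"c1": 900, "c2": 500, "c3": 400, "c4": 300, "c5": 100}
print(balanced_split(contigs, 2))  # [['c1', 'c4'], ['c2', 'c3', 'c5']]
```

The two chunks hold 1,200 and 1,000 bp. Splitting by number of sequences alone could instead give one thread most of the bases.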

Bugfixes

Fix bugs in training from multiple samples
Fix bug in incorporating CAT results

Full Changelog: v0.7.0...v1.0.0

02 Mar 23:40

luispedro

v0.7.0

d836ec2

Version 0.7.0

Full support for Mac OS X is the big change.

Full ChangeLog:

Improve check_install command by printing out paths and correctly handling FragGeneScan/Prodigal as optional dependencies
Reuse markers.hmmout to make training from several samples faster
Add option --tmpdir to set the temporary directory
Replace FragGeneScan with Prodigal as the default ORF finder (FragGeneScan can still be used with the --orf-finder parameter)
Add 'concatenate_fasta' command to combine fasta files for multi-sample binning
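Combining FASTA files for multi-sample binning requires contig names that stay unique across samples. A minimal sketch, assuming each header is prefixed with the sample name and a separator (":" here is an assumption, as is the function name `concatenate_samples`); this is not the implementation behind the concatenate_fasta command:

```python
import tempfile
from pathlib import Path


def concatenate_samples(inputs, output, separator=":"):
    """Combine per-sample FASTA files into a single FASTA file.

    inputs maps a sample name to the path of that sample's FASTA file.
    Each header '>contig_1' becomes '>SAMPLE<separator>contig_1' so that
    contigs from different samples remain distinguishable.
    """
    with open(output, "w") as out:
        for sample, path in inputs.items():
            if separator in sample:
                raise ValueError(f"sample name {sample!r} contains {separator!r}")
            with open(path) as fasta:
                for line in fasta:
                    line = line.rstrip("\r\n")
                    if line.startswith(">"):
                        out.write(f">{sample}{separator}{line[1:]}\n")
                    elif line:
                        out.write(line + "\n")  # sequence line; blank lines dropped


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    (tmp / "S1.fa").write_text(">contig_1\nACGT\n")
    (tmp / "S2.fa").write_text(">contig_1\nGGCC")  # no trailing newline
    concatenate_samples({"S1": tmp / "S1.fa", "S2": tmp / "S2.fa"}, tmp / "all.fa")
    print((tmp / "all.fa").read_text())
```

Rejecting sample names that contain the separator keeps the prefix unambiguous when contig names are later split back into sample and contig.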

08 Feb 09:24

luispedro

v0.6.0

9d5d5b7

Version 0.6.0

Version 0.6

Released February 7 2022

User-visible improvements

Provide pretrained models from soil, cat gut, human oral, pig gut, mouse gut, built environment, wastewater and global (trained on all samples).
Users can now pass in the output of mmseqs2 directly (option --taxonomy-annotation-table), and SemiBin will use it instead of calling mmseqs itself.
The subcommand to generate cannot-links is now called generate_cannot_links. The old name (predict_taxonomy) is kept as a deprecated alias.
Similarly, sequence features (k-mer and abundance) are generated using the commands generate_sequence_features_single and generate_sequence_features_multi (for single- and multi-sample modes, respectively). The old names (generate_data_single/generate_data_multi) are kept as deprecated aliases.
Add check_install command and run it before the easy commands
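Keeping old subcommand names as deprecated aliases can be done with the standard library's `argparse`, whose `add_parser` accepts an `aliases` list. The sketch below maps the old names listed above to the new ones and emits a `DeprecationWarning`; it is a hypothetical illustration (including the program name `semibin-sketch`), not SemiBin's actual command-line code:

```python
import argparse
import warnings

# Old subcommand names kept as deprecated aliases (taken from the notes above).
DEPRECATED_ALIASES = {
    "predict_taxonomy": "generate_cannot_links",
    "generate_data_single": "generate_sequence_features_single",
    "generate_data_multi": "generate_sequence_features_multi",
}


def build_parser():
    parser = argparse.ArgumentParser(prog="semibin-sketch")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for new_name in sorted(set(DEPRECATED_ALIASES.values())):
        old_names = [old for old, new in DEPRECATED_ALIASES.items() if new == new_name]
        subparsers.add_parser(new_name, aliases=old_names)
    return parser


def parse_command(argv):
    """Return the canonical subcommand name, warning if an old alias was used."""
    args = build_parser().parse_args(argv)
    # argparse stores the name exactly as typed, so aliases can be detected here.
    if args.command in DEPRECATED_ALIASES:
        new_name = DEPRECATED_ALIASES[args.command]
        warnings.warn(
            f"{args.command!r} is deprecated; use {new_name!r} instead",
            DeprecationWarning,
        )
        args.command = new_name
    return args.command


print(parse_command(["generate_data_single"]))  # generate_sequence_features_single
```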

Bugfixes

Fix bug with non-standard characters in sample names (#68).

New Contributors

@SvetlanaUP made their first contribution in #60

Contributors

SvetlanaUP

07 Jan 17:08

luispedro

v0.5.0

cce7c9a

Version 0.5.0

Version 0.5

Released January 7 2022

User-visible improvements

Reclustering is now the default (use --no-recluster to disable it; the option --recluster is deprecated and ignored), as its computational cost is now much lower
GTDB lazy downloading is now performed even if a non-standard directory is used
The CACHEDIR.TAG protocol was implemented (this is supported by several tools that perform tasks such as backups).
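The CACHEDIR.TAG protocol (the Cache Directory Tagging Specification) marks a directory as a regenerable cache by placing in it a file named CACHEDIR.TAG that begins with a fixed signature line. Tag-aware backup tools can then skip that directory. A minimal sketch; the function name, the `creator` text, and the example directory name `gtdb` are assumptions, and the notes do not say which directories SemiBin tags:

```python
import tempfile
from pathlib import Path

# Per the Cache Directory Tagging Specification, a file named CACHEDIR.TAG
# must begin with exactly this 43-byte signature line.
CACHEDIR_SIGNATURE = "Signature: 8a477f597d28d172789f06886806bc55"


def tag_cache_dir(directory, creator):
    """Mark `directory` as a cache so that tag-aware backup tools skip it."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    tag = directory / "CACHEDIR.TAG"
    if not tag.exists():  # leave an existing tag untouched
        tag.write_text(
            CACHEDIR_SIGNATURE + "\n"
            + f"# This file is a cache directory tag created by {creator}.\n"
        )
    return tag


with tempfile.TemporaryDirectory() as tmp:
    tag = tag_cache_dir(Path(tmp) / "gtdb", creator="example-tool")
    print(tag.read_text().splitlines()[0])
```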

Bugfixes

Fix bug with --min-len (minimal length). Previously, only contigs strictly longer than the given minimal length were used (instead of those with length greater than or equal to it).
Fix inconsistent GTDB downloading in a few instances
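The --min-len fix is a change from a strict to an inclusive comparison. A minimal sketch with a hypothetical helper and illustrative lengths (the value 1000 is just an example, not a claim about SemiBin's default):

```python
def filter_contigs(contig_lengths, min_len):
    """Keep contigs whose length is at least min_len (inclusive)."""
    return {name: n for name, n in contig_lengths.items() if n >= min_len}


contigs = {"a": 999, "b": 1000, "c": 2500}
print(sorted(filter_contigs(contigs, 1000)))  # ['b', 'c']
```

With the old behavior (`n > min_len`), contig "b", whose length equals the threshold exactly, would have been dropped.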

Internal improvements

Much more efficient code (including lower memory usage) for binning, especially if a pretrained model is used. As an example, using a deeply-sequenced ocean sample, generating the data (generate_data_single step) went down from 14 to 9 minutes, while binning (bin step, using --recluster) went down from 10m17s (using 20 GB of RAM at peak) to 4m33s (using 4.5 GB at peak). Thus, total time from BAM file to bins went down from 25 to 14 minutes (using 4 threads), and peak RAM is now 4.5 GB, making it usable on a typical laptop.
