RLS Version 1.5.0 SemiBin2 beta

luispedro · luispedro · commit fc0d22c8bf3e · 2023-01-16T23:39:29.000+01:00
Big change is the addition of a `SemiBin2` script, which is still experimental, but should be a slightly nicer interface.

USER-VISIBLE IMPROVEMENTS SINCE v1.4.0

- Added a new option for ORF finding, called `fast-naive`, which is a very fast internal implementation.
- Added the possibility of bypassing ORF finding altogether by providing prodigal outputs directly (or any other gene prediction in the right format)
- Command line argument checking is more exhaustive instead of exiting at first error
- Added `--quiet` flag to reduce the amount of output printed
- Better `--help` (group required arguments separately)
- Add `--output-compression` option to compress outputs
- Add `--tag-output` option which allows for control of the output filenames (and also makes them anvi'o compatible; see discussion at [#123](#123))
- Add contig->bin mapping table ([#123](#123))
- `SemiBin.main.main1` and `SemiBin.main.main2` can now be called as a function with command line arguments (`main1` corresponds to _SemiBin1_ and `main2` corresponds to _SemiBin2_)

```python
import SemiBin.main
...
SemiBin.main.main2(['single_easy_bin', '--input-fasta', ...])
```
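The `docs/semibin2.md` change in this commit notes that _SemiBin2_ always writes bins to a directory called `output_bins`, named `SemiBin_{label}.fa.gz` by default, and that wrapper scripts may need minor changes as a result. A minimal sketch of such a wrapper adjustment follows. It assumes `output_bins` sits directly inside the output directory given to SemiBin2; `list_bins`, `count_contigs`, and the `output_dir` and `tag` parameters are hypothetical names used only for illustration and are not part of SemiBin.

```python
import gzip
from pathlib import Path


def list_bins(output_dir, tag="SemiBin"):
    """Map bin label -> path for files named {tag}_{label}.fa.gz.

    Assumption: bins live in <output_dir>/output_bins, as described in
    docs/semibin2.md; `tag` mirrors the default value of `--tag-output`.
    """
    prefix = f"{tag}_"
    suffix = ".fa.gz"
    bins = {}
    for path in sorted(Path(output_dir, "output_bins").glob(f"{prefix}*{suffix}")):
        # Strip the tag prefix and the compressed FASTA suffix to get the label
        label = path.name[len(prefix):-len(suffix)]
        bins[label] = path
    return bins


def count_contigs(bin_path):
    """Count FASTA header lines in a gzip-compressed bin file."""
    with gzip.open(bin_path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

A wrapper that previously read uncompressed bins could iterate over `list_bins(output_dir).items()` and open each path with `gzip.open` instead of `open`.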
diff --git a/ChangeLog b/ChangeLog
@@ -1,17 +1,17 @@
-Unreleased
+Version 1.5.0 (SemiBin2 beta) Jan 17 2023 by BigDataBiology
 	* Add `SemiBin2` script
 	* Added naive ORF finder
-	* Make command line arguments more flexible for --sequencing-type argument
 	* Add `--prodigal-output-faa` argument (#113)
+	* Make command line arguments more flexible for --sequencing-type argument
 	* Argument checking is more exhaustive instead of exiting at first error
 	* Add `--quiet` argument
-	* Better `--help` (group required arguments separately)
 	* Add `--compression` option
-	* Make SemiBin.main.main callable with a list of arguments
 	* Add `--tag-output` option
-	* Add contig->bin mapping table (#123)
+	* Better `--help` (group required arguments separately)
+	* Make SemiBin.main.main2 callable with a list of arguments
+	* Add contig -> bin mapping table (#123)
 
-Version 1.4.0 Dec  2022 by BigDataBiology
+Version 1.4.0 Dec 15 2022 by BigDataBiology
 	* Provide binning algorithm for assemblies from long read
 	* Add `--allow-missing-mmseqs2` flag to `check_install` subcommand
 	* Run Prodigal in multiple jobs without multiprocessing (#106)
diff --git a/SemiBin/semibin_version.py b/SemiBin/semibin_version.py
@@ -1 +1 @@
-__version__ = '1.4.0'
+__version__ = '1.5.0'
diff --git a/docs/semibin2.md b/docs/semibin2.md
@@ -7,13 +7,25 @@ They have the same functionality, but slightly different interfaces. The exact
 interface to `SemiBin2` should be considered as unstable (while we will strive
 to maintain backwards compatibility if you call the `SemiBin` script).
 
-# Differences between SemiBin2 and SemiBin1
+## Upgrading to SemiBin2
+
+1. If you are using the `easy_*` workflows, then they will probably continue to
+   work exactly the same (except that you will get better results faster).
+2. Outputs are now **always** in a directory called `output_bins`.
+3. By default, bins are in file named as `SemiBin_{label}.fa.gz` (and
+   compressed with _gzip_ as the name indicates).
+
+Points `2` and `3` may require some minor modifications to wrapper scripts.
+
+## Longer list of differences between SemiBin2 and SemiBin1
 
 The biggest different is that the default training mode is self-supervised mode.
 
 - Output bins are now **always** in a directory called `output_bins` (in
-- Output filenames are now anvi'o compatible (effectively, the default value of `--tag-output` is `SemiBin`) (see discussion in [#123](https://github.com/BigDataBiology/SemiBin/issues/123))
   _SemiBin1_, it actually depended on which parameters were used)
+- Output filenames are now anvi'o compatible (effectively, the default value of
+  `--tag-output` is `SemiBin`), see discussion at
+  [#123](https://github.com/BigDataBiology/SemiBin/issues/123).
 - `--compression` defaults to `gz` (instead of `none`)
 - ORF finder defaults to the `fast-naive` internal ORF finder
 - `--write-pre-reclustering-bins` is `False` by default
@@ -24,5 +36,6 @@ The biggest different is that the default training mode is self-supervised mode.
 A few arguments that were deprecated before are completely removed:
 - `--recluster`: it did nothing already as reclustering is default
 - `--mode`: Use `--train-from-many`
-- `--training-type`: Use `--semi-supervised` to use semi-supervised learning (although that is also deprecated)
+- `--training-type`: Use `--semi-supervised` to use semi-supervised learning
+  (although that is also deprecated)
 
diff --git a/docs/whatsnew.md b/docs/whatsnew.md
@@ -1,6 +1,11 @@
 # What's New
 
-## Unreleased github version
+## Version 1.5.0 (SemiBin2 beta)
+
+*Released Jan 17, 2023*
+
+Big change is the addition of a `SemiBin2` script, which is still experimental, but should be a slightly nicer interface.
+See [[upgrading to SemiBin2](semibin2)]
 
 ### User-visible improvements
 
@@ -10,7 +15,7 @@
 - Added `--quiet` flag to reduce the amount of output printed
 - Better `--help` (group required arguments separately)
 - Add `--output-compression` option to compress outputs
-- Add `--tag-output` option which allows for control of the output filenames (and also makes the anvi'o compatible)
+- Add `--tag-output` option which allows for control of the output filenames (and also makes the anvi'o compatible — see discussion at [#123](https://github.com/BigDataBiology/SemiBin/issues/123).
 - Add contig->bin mapping table ([#123](https://github.com/BigDataBiology/SemiBin/issues/123))
 - `SemiBin.main.main1` and `SemiBin.main.main2` can now be called as a function with command line arguments (`main1` corresponds to _SemiBin1_ and `main2` corresponds to _SemiBin2_)
 
