Commit 678fa50

ENH Rename "whole" -> "global"
This matches the manuscript
1 parent 0b02ae6 commit 678fa50

8 files changed: +32 -48 lines

ChangeLog

Lines changed: 2 additions & 1 deletion
@@ -4,7 +4,8 @@ Unreleased
 mmseqs format
 * Fix bug with non-standard characters in sample names (#68)
 * Provide pretrained models from soil, cat gut, human oral,
-pig gut, mouse gut, built environment, wastewater and whole(training from all samples).
+pig gut, mouse gut, built environment, wastewater and global (training
+from all environments).
 
 Version 0.5.0 Fri Jan 7 2022 by BigDataBiology
 * Faster `SemiBin --version`

README.md

Lines changed: 7 additions & 4 deletions
@@ -107,7 +107,9 @@ In this example, SemiBin will download GTDB to
 `$HOME/.cache/SemiBin/mmseqs2-GTDB/GTDB`. You can change this default using the
 `-r` argument.
 
-You can use `--environment` with (`human_gut`, `dog_gut`, `ocean`, `soil`, `cat_gut`, `human_oral`, `mouse_gut`, `pig_gut`, `built_environment`, `wastewater` and `whole`) to use one of our built-in models. (_NOTE_: This is a recommended way, which will save much time for contig annotations and model training, and also get very good results.)
+You can use `--environment` with `human_gut`, `dog_gut`, `ocean`, `soil`,
+`cat_gut`, `human_oral`, `mouse_gut`, `pig_gut`, `built_environment`,
+`wastewater`, or `global` to use one of our built-in models.
 
 ```bash
 SemiBin single_easy_bin -i contig_S1.fna -b S1.bam -o output --environment human_gut
@@ -183,11 +185,12 @@ includes the following steps: `generate_cannot_links`, `generate_data_multi` and
 
 In advanced mode, you can also use our built-in pre-trained model in
 single-sample binning mode. Here we provide pre-trained models for human gut,
-dog gut, marine, soil, cat gut, human oral, mouse gut, pig gut, built_environment, wastewater and whole (train from whole environments) environment. You can just use these models for single-sample
-binning and it will save much time for contig annotations and model training.
+dog gut, marine, soil, cat gut, human oral, mouse gut, pig gut,
+built_environment, wastewater and global (train from all environments listed)
+environment.
 
 A very easy way to run SemiBin with a built-in model
-(`human_gut`/`dog_gut`/`ocean`/`soil`/`cat_gut`/`human_oral`/`mouse_gut`/`pig_gut`/`built_environment`/`wastewater`/`whole` for single-sample binning):
+(`human_gut`/`dog_gut`/`ocean`/`soil`/`cat_gut`/`human_oral`/`mouse_gut`/`pig_gut`/`built_environment`/`wastewater`/`global` for single-sample binning):
 
 ```bash
 SemiBin single_easy_bin -i contig_S1.fna -b S1.bam -o output --environment human_gut
File renamed without changes.

SemiBin/main.py

Lines changed: 1 addition & 1 deletion
@@ -175,7 +175,7 @@ def parse_args(args):
     for p in [single_easy_bin, binning]:
         p.add_argument('--environment',
                        required=False,
-                       help='Environment for the built-in model (available choices: human_gut/dog_gut/ocean/soil/cat_gut/human_oral/mouse_gut/pig_gut/built_environment/wastewater/whole).',
+                       help='Environment for the built-in model (available choices: human_gut/dog_gut/ocean/soil/cat_gut/human_oral/mouse_gut/pig_gut/built_environment/wastewater/global).',
                        dest='environment',
                        default=None,
                        )
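In this hunk `--environment` accepts any string, and the list of valid names appears only in the help text; an unknown name is caught later, in `get_model_path`. As a hedged sketch, argparse's `choices=` parameter could reject a stale name such as the pre-rename `whole` at parse time. The `BUILTIN_ENVIRONMENTS` tuple and the `build_parser` function below are hypothetical illustrations, not SemiBin's code.

```python
import argparse

# Hypothetical constant: the environment names listed in this commit.
BUILTIN_ENVIRONMENTS = (
    'human_gut', 'dog_gut', 'ocean', 'soil', 'cat_gut', 'human_oral',
    'mouse_gut', 'pig_gut', 'built_environment', 'wastewater', 'global',
)


def build_parser():
    """Hypothetical standalone parser carrying only the --environment option."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--environment',
                        required=False,
                        # choices= makes argparse reject unknown names
                        # (e.g. 'whole') and exit with status 2.
                        choices=BUILTIN_ENVIRONMENTS,
                        help='Environment for the built-in model '
                             f'(available choices: {"/".join(BUILTIN_ENVIRONMENTS)}).',
                        dest='environment',
                        default=None,
                        )
    return parser


if __name__ == '__main__':
    args = build_parser().parse_args(['--environment', 'global'])
    print(args.environment)
```

With `choices=`, argparse also prints the valid names in its usage message, so the list would live in a single constant instead of being repeated by hand in the help string.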

SemiBin/utils.py

Lines changed: 14 additions & 33 deletions
@@ -368,39 +368,20 @@ def split_data(data, sample, separator, is_combined = True):
     return part_data
 
 def get_model_path(env):
-    if env == 'human_gut':
-        model_path = os.path.join(os.path.split(__file__)[0], 'human_gut_model.h5')
-        return model_path
-    elif env == 'dog_gut':
-        model_path = os.path.join(os.path.split(__file__)[0], 'dog_gut_model.h5')
-        return model_path
-    elif env == 'ocean':
-        model_path = os.path.join(os.path.split(__file__)[0], 'ocean_model.h5')
-        return model_path
-    elif env == 'soil':
-        model_path = os.path.join(os.path.split(__file__)[0], 'soil_model.h5')
-        return model_path
-    elif env == 'cat_gut':
-        model_path = os.path.join(os.path.split(__file__)[0], 'cat_gut_model.h5')
-        return model_path
-    elif env == 'human_oral':
-        model_path = os.path.join(os.path.split(__file__)[0], 'human_oral_model.h5')
-        return model_path
-    elif env == 'mouse_gut':
-        model_path = os.path.join(os.path.split(__file__)[0], 'mouse_gut_model.h5')
-        return model_path
-    elif env == 'pig_gut':
-        model_path = os.path.join(os.path.split(__file__)[0], 'pig_gut_model.h5')
-        return model_path
-    elif env == 'built_environment':
-        model_path = os.path.join(os.path.split(__file__)[0], 'built_environment_model.h5')
-        return model_path
-    elif env == 'wastewater':
-        model_path = os.path.join(os.path.split(__file__)[0], 'wastewater_model.h5')
-        return model_path
-    elif env == 'whole':
-        model_path = os.path.join(os.path.split(__file__)[0], 'whole_model.h5')
-        return model_path
+    if env in [
+            'human_gut',
+            'dog_gut',
+            'ocean',
+            'soil',
+            'cat_gut',
+            'human_oral',
+            'mouse_gut',
+            'pig_gut',
+            'built_environment',
+            'wastewater',
+            'global',
+            ]:
+        return os.path.join(os.path.split(__file__)[0], f'{env}_model.h5')
     else:
         sys.stderr.write(
             f"Error: Expected environment '{env}' does not exist\n")
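The refactor replaces eleven near-identical branches with one membership test and one f-string, which works because every model file follows the `{env}_model.h5` naming pattern. Below is a minimal standalone sketch of the same lookup. Two choices are assumptions made so the example can be tested: the model directory is passed in as `model_dir` (standing in for `os.path.split(__file__)[0]`), and an unknown name raises `ValueError` rather than writing to stderr, since the rest of SemiBin's error path is not shown in this hunk.

```python
import os

# Names taken from the list in this commit; 'whole' is deliberately absent.
BUILTIN_ENVIRONMENTS = frozenset([
    'human_gut', 'dog_gut', 'ocean', 'soil', 'cat_gut', 'human_oral',
    'mouse_gut', 'pig_gut', 'built_environment', 'wastewater', 'global',
])


def get_model_path(env, model_dir):
    """Return the path of the built-in model for `env`.

    Sketch only: `model_dir` replaces os.path.split(__file__)[0], and
    unknown environments raise ValueError instead of printing an error.
    """
    if env not in BUILTIN_ENVIRONMENTS:
        raise ValueError(f"Expected environment '{env}' does not exist")
    # Every model file follows the '{env}_model.h5' naming pattern.
    return os.path.join(model_dir, f'{env}_model.h5')
```

A `frozenset` gives constant-time membership tests, although with eleven names a plain list, as in the commit, works just as well.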

docs/index.md

Lines changed: 2 additions & 2 deletions
@@ -43,15 +43,15 @@ Reconstruct bins with single or co-assembly binning using one line command.
 * `-o/--output`: Output directory (will be created if non-existent).
 * `--cannot-name:` Name for the cannot-link file (Default: cannot).
 * `-r/--reference-db-data-dir`: GTDB reference directory (Default: $HOME/.cache/SemiBin/mmseqs2-GTDB). SemiBin will lazily download GTDB if it is not found there.
-* `-p/--processes/-t/--threads`: Number of CPUs used(0: use whole).
+* `-p/--processes/-t/--threads`: Number of CPUs used (0: use all).
 * `--minfasta-kbs`: minimum bin size in kilo-basepairs (Default: 200).
 * `--recluster` : [Deprecated] Does nothing (current default is to perform clustering).
 * `--epoches`: Number of epoches used in the training process(Default: 20).
 * `--batch-size`: Batch size used in the training process(Default: 2048).
 * `--max-node`: Percentage of contigs that considered to be binned(Default: 1).
 * `--max-edges`: The maximum number of edges that can be connected to one contig(Default: 200).
 * `--random-seed`: Random seed to reproduce results.
-* `--environment`: Environment for the built-in model(human_gut/dog_gut/ocean/soil/cat_gut/human_oral/mouse_gut/pig_gut/built_environment/wastewater/whole).
+* `--environment`: Environment for the built-in model (`human_gut`/`dog_gut`/`ocean`/`soil`/`cat_gut`/`human_oral`/`mouse_gut`/`pig_gut`/`built_environment`/`wastewater`/`global`).
 * `--ratio` : If the ratio of the number of base pairs of contigs between 1000-2500 bp smaller than this value, the minimal length will be set as 1000bp, otherwise2500bp. If you set -m parameter, you do not need to use this parameter. If you use SemiBin with multi steps and you use this parameter, please use this parameter consistently with all subcommands(Default: 0.05).
 * `-m/--min-len` : Minimal length for contigs in binning. If you use SemiBin with multi steps and you use this parameter, please use this parameter consistently with all subcommands.(Default: SemiBin chooses 1000bp or 2500bp according the ratio of the number of base pairs of contigs between 1000-2500 bp).
 * `--ml-threshold` : Length threshold for generating must-link constraints.(By default, the threshold is calculated from the contig, and the default minimum value is 4,000 bp)
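The `--ratio` and `-m/--min-len` entries describe a rule for choosing the minimal contig length: if the share of base pairs in contigs of 1000 to 2500 bp is below `ratio` (default 0.05), use 1000 bp, otherwise 2500 bp, and an explicit `-m` overrides the rule. The sketch below is one literal reading of that text, not SemiBin's implementation. `choose_min_len` is a hypothetical name, and the denominator (total bp of contigs of at least 1000 bp), the inclusive bounds, and the empty-input fallback are all assumptions.

```python
from typing import Iterable, Optional


def choose_min_len(contig_lengths: Iterable[int],
                   ratio: float = 0.05,
                   min_len: Optional[int] = None) -> int:
    """Pick the minimal contig length as the --ratio option describes.

    Hypothetical helper. Assumptions: the ratio is computed over contigs
    of at least 1000 bp, and 1000-2500 bp is an inclusive range.
    """
    if min_len is not None:
        # An explicit -m/--min-len takes precedence over --ratio.
        return min_len
    total_bp = 0
    short_bp = 0
    for length in contig_lengths:
        if length >= 1000:
            total_bp += length
            if length <= 2500:
                short_bp += length
    if total_bp == 0:
        # No contig of at least 1000 bp; the docs do not cover this case,
        # so this sketch falls back to the lower threshold.
        return 1000
    return 1000 if short_bp / total_bp < ratio else 2500
```

For example, contigs of 2000, 2000 and 10000 bp give 4000 / 14000 ≈ 0.29, which is not below 0.05, so this reading selects 2500 bp.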

docs/usage.md

Lines changed: 2 additions & 2 deletions
@@ -9,7 +9,7 @@ Input: S1.fna and S1.bam
 ```bash
 SemiBin single_easy_bin -i S1.fna -b S1.bam -o output
 ```
-Or with our built-in model(`human_gut`/`dog_gut`/`ocean`/`soil`/`cat_gut`/`human_oral`/`mouse_gut`/`pig_gut`/`built_environment`/`wastewater`/`whole`)
+Or with our built-in model(`human_gut`/`dog_gut`/`ocean`/`soil`/`cat_gut`/`human_oral`/`mouse_gut`/`pig_gut`/`built_environment`/`wastewater`/`global`)
 ```bash
 SemiBin single_easy_bin -i S1.fna -b S1.bam -o output --environment human_gut
 ```
@@ -31,7 +31,7 @@ SemiBin train -i S1.fna --data S1_output/train.csv --data-split S1_output/train_
 ```bash
 SemiBin bin -i S1.fna --model S1_output/model.h5 --data S1_output/data.csv -o output
 ```
-or with our built-in model(`human_gut`/`dog_gut`/`ocean`/`soil`/`cat_gut`/`human_oral`/`mouse_gut`/`pig_gut`/`built_environment`/`wastewater`/`whole`)
+or with our built-in model(`human_gut`/`dog_gut`/`ocean`/`soil`/`cat_gut`/`human_oral`/`mouse_gut`/`pig_gut`/`built_environment`/`wastewater`/`global`)
 ```bash
 SemiBin bin -i S1.fna --data S1_output/data.csv -o output --environment human_gut
 ```
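The rename had to be repeated by hand in the code, the README, and several documentation pages. Below is a hedged sketch of a check that could catch a stale name left behind. It assumes the documentation text is available as a string, extracts every slash-separated list of backticked names (the form used in `docs/usage.md` and `docs/index.md`), and compares each list to a canonical set. The regex, `EXPECTED`, and the function names are illustrative only and are not part of SemiBin's test suite. Comma-separated lists, like the one in the README, are not matched.

```python
import re

# Canonical environment names after this commit (illustrative constant).
EXPECTED = {
    'human_gut', 'dog_gut', 'ocean', 'soil', 'cat_gut', 'human_oral',
    'mouse_gut', 'pig_gut', 'built_environment', 'wastewater', 'global',
}

# Matches a run like `human_gut`/`dog_gut`/.../`global` (at least two names).
SLASH_LIST = re.compile(r'`\w+`(?:/`\w+`)+')


def environments_in(text):
    """Return one set of names per slash-separated backtick list in text."""
    return [set(re.findall(r'`(\w+)`', m.group(0)))
            for m in SLASH_LIST.finditer(text)]


def check_docs(text):
    """Return (unexpected, missing) pairs for every list that differs."""
    return [(names - EXPECTED, EXPECTED - names)
            for names in environments_in(text)
            if names != EXPECTED]
```

Run against the old `docs/usage.md` line, the check would report `whole` as unexpected and `global` as missing; against the new line it returns an empty list.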

docs/whatsnew.md

Lines changed: 4 additions & 5 deletions
@@ -8,14 +8,13 @@
 - The subcommand to generate cannot links is now called
 `generate_cannot_links`. The old name (`predict_taxonomy`) is kept as a
 deprecated alias.
-
+- Provide pretrained models from soil, cat gut, human oral,pig gut, mouse gut,
+built environment, wastewater and global (training from all samples).
+- Add `check_install` command and run `check_install` before easy command
+
 ### Bugfixes
 - Fix bug with non-standard characters in sample names (#68).
 
-### Internal improvements
-- Add `check_install` command and run `check_install` before easy command
-- Provide pretrained models from soil, cat gut, human oral,pig gut, mouse gut, built environment, wastewater and whole(training from all samples).
-
 ## Version 0.5
 
 *Released January 7 2022*
