Commit 3c4f79a

ENH update read the docs
1 parent d2e8efb commit 3c4f79a

2 files changed: +30 -0 lines changed

docs/subcommands.md

Lines changed: 4 additions & 0 deletions
@@ -19,6 +19,7 @@ Reconstruct bins with single or co-assembly binning using one command.
 * `-i/--input-fasta`: Path to the input contig FASTA file (`gzip` and `bzip2` compression are accepted).
 * `-b/--input-bam`: Path to the input BAM (`.bam` extension) or CRAM (`.cram`) files. You can pass multiple BAM files, one per sample.
 * `-o/--output`: Output directory (will be created if non-existent).
+* `-a/--abundance`: Path to the abundance file from strobealign-aemb. This can only be used when at least 5 samples are used in binning.

 #### Recommended arguments

@@ -126,6 +127,7 @@ These are the same as for `single_easy_bin`.
 * `--ml-threshold`
 * `--taxonomy-annotation-table`
 * `--tmpdir`
+* `-a/--abundance`

 These are the same as for `single_easy_bin`.

@@ -138,6 +140,7 @@ The subcommand `generate_sequence_features_single` requires the contig file and
 * `-i/--input-fasta`
 * `-b/--input-bam`
 * `-o/--output`
+* `-a/--abundance`

 These are the same as for `single_easy_bin`.

@@ -161,6 +164,7 @@ The subcommand `generate_sequence_features_multi` requires the combined contig f
 * `-i/--input-fasta`
 * `-o/--output`
 * `-b/--input-bam`
+* `-a/--abundance`

 These are the same as for `multi_easy_bin`.

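The `-a/--abundance` option is documented above as usable only when at least 5 samples are used in binning. A minimal sketch of a pre-flight check for that rule, assuming one abundance file per sample; the helper name `check_abundance_count` is hypothetical and is not part of SemiBin2:

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of SemiBin2): succeed only when at least
# 5 abundance files are passed, assuming one file per sample.
check_abundance_count() {
    local min_samples=5
    if (( $# < min_samples )); then
        echo "Need at least ${min_samples} abundance files, got $#" >&2
        return 1
    fi
}

# Example: report whether five per-sample files satisfy the rule.
if check_abundance_count sample1.txt sample2.txt sample3.txt sample4.txt sample5.txt; then
    echo "sample count OK"
fi
```

A check like this could run before any `SemiBin2` call that uses `-a`, so that a run with too few samples stops early with a clear message.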
docs/usage.md

Lines changed: 26 additions & 0 deletions
@@ -419,3 +419,29 @@ SemiBin2 generate_cannot_links -i S5.fa -o S5_output

 See the comment above about how you can bypass most of the computation if you have run `mmseqs2` to annotate your contigs against GTDB already.

+
+## Running SemiBin with strobealign-aemb
+
+Strobealign-aemb is a fast abundance estimation method for metagenomic binning.
+Because strobealign-aemb does not provide mapping information for every position of a contig, SemiBin2 cannot be run with strobealign-aemb in binning modes that use fewer than 5 samples, and the contigs must be split first to generate the must-link constraints.
+
+
+1. Split the FASTA file
+```bash
+python script/generate_split.py -c contig.fa -o output
+```
+2. Map reads using [strobealign-aemb](https://github.com/ksahlin/strobealign) to generate the abundance information
+```bash
+strobealign --aemb output/split.fa read1_1.fq read1_2.fq -R 6 > sample1.txt
+strobealign --aemb output/split.fa read2_1.fq read2_2.fq -R 6 > sample2.txt
+strobealign --aemb output/split.fa read3_1.fq read3_2.fq -R 6 > sample3.txt
+strobealign --aemb output/split.fa read4_1.fq read4_2.fq -R 6 > sample4.txt
+strobealign --aemb output/split.fa read5_1.fq read5_2.fq -R 6 > sample5.txt
+```
+3. Run SemiBin2 (as you would with BAM files)
+```bash
+SemiBin2 generate_sequence_features_single -i contig.fa -a *.txt -o output
+SemiBin2 generate_sequence_features_multi -i contig.fa -a *.txt -s : -o output
+SemiBin2 single_easy_bin -i contig.fa -a *.txt -o output
+SemiBin2 multi_easy_bin -i contig.fa -a *.txt -s : -o output
+```

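Step 2 above repeats the same `strobealign --aemb` call once per sample. A sketch of generating those commands in a loop, assuming the read files follow the `readN_1.fq` / `readN_2.fq` naming used above; `aemb_commands` is a hypothetical helper that only prints the commands and does not run them:

```bash
#!/usr/bin/env bash

# Hypothetical helper: print one strobealign --aemb command per sample.
# Arguments: path to the split FASTA, number of samples.
# Assumes paired reads named read<N>_1.fq and read<N>_2.fq, as in the steps above.
aemb_commands() {
    local split_fasta=$1
    local n_samples=$2
    local i
    for (( i = 1; i <= n_samples; i++ )); do
        printf 'strobealign --aemb %s read%d_1.fq read%d_2.fq -R 6 > sample%d.txt\n' \
            "$split_fasta" "$i" "$i" "$i"
    done
}

# Dry run: show the five commands from step 2 without executing them.
aemb_commands output/split.fa 5
```

Printing first makes it easy to review the commands; piping the output to `bash` would then execute them one after another.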