Commit 656fe1d

Update README.md
1 parent 47fbe62 commit 656fe1d

3 files changed

+65
-18
lines changed

README.md

Lines changed: 53 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -44,6 +44,17 @@ mkdir $OUTPUT
4444
# establish number of common 25-mers between single sequence and the database
4545
# (minhash filtering that retains 10% of MT159713 k-mers is done automatically prior to the comparison)
4646
./kmer-db one2all $OUTPUT/k25.db $INPUT/data/MT159713.fasta $OUTPUT/MT159713.csv
47+
48+
# build two partial databases
49+
./kmer-db build $INPUT/seqs.part1.list $OUTPUT/k18.parts1.db
50+
./kmer-db build $INPUT/seqs.part2.list $OUTPUT/k18.parts2.db
51+
52+
# establish numbers of common k-mers between all sequences in the databases,
53+
# computations are done in the sparse mode, the output matrix is also sparse
54+
echo $OUTPUT/k18.parts1.db > $OUTPUT/db.list
55+
echo $OUTPUT/k18.parts2.db >> $OUTPUT/db.list
56+
./kmer-db all2all-parts $OUTPUT/db.list $OUTPUT/k18.parts.csv
57+
4758
```
4859

4960

@@ -68,18 +79,25 @@ conda install -c bioconda kmer-db
6879
For detailed instructions on how to set up Bioconda, please refer to the [Bioconda manual](https://bioconda.github.io/user/install.html#install-conda).
6980
Kmer-db can also be built from the sources distributed as:
7081

71-
* MAKE project (G++ 5.5.0 tested) for Linux and OS X,
82+
* MAKE project (G++ 11 tested) for Linux and OS X,
7283
* Visual Studio 2015 solution for Windows.
7384

7485

86+
## Vector extensions
7587

76-
## *zlib* linking
88+
Kmer-db can be built for x86-64 and ARM64 8 architectures (including Apple Mx based on ARM64 8.4 core) and takes advantage of AVX2 (x86-64) and NEON (ARM) CPU extensions. The default target platform is x86-64 with AVX2 extensions. This, however, can be changed by setting the `PLATFORM` variable for `make`:
7789

78-
Kmer-db uses *zlib* for handling gzipped inputs. Under Linux, the software is by default linked against system-installed *zlib*. Due to issues with some library versions, precompiled *zlib* is also present the repository. In order to use it, one needs to modify variable INTERNAL_ZLIB at the top of the makefile. Under Windows, the repository library is always used.
79-
80-
## AVX and AVX2 support
90+
```bash
91+
make PLATFORM=none # unspecified platform, no extensions
92+
make PLATFORM=sse2 # x86-64 with SSE2
93+
make PLATFORM=avx # x86-64 with AVX
94+
make PLATFORM=avx2 # x86-64 with AVX2 (default)
95+
make PLATFORM=native # x86-64 with AVX2 and native architecture
96+
make PLATFORM=arm8 # ARM64 8 with NEON
97+
make PLATFORM=m1 # ARM64 8.4 (especially Apple M1) with NEON
98+
```
8199

82-
Kmer-db, by default, takes advantage of AVX (required) and AVX2 (optional) CPU extensions. The pre-built binary determines supported instructions at runtime, thus it is multiplatform. When compiling the sources under Linux and OS X, the support of AVX2 is also established automatically. Under Windows, the program is by default built with AVX2 instructions. To prevent this, Kmer-db must be compiled with NO_AVX2 symbolic constant defined.
100+
Note that x86-64 binaries determine the supported extensions at runtime, which makes them backwards-compatible. For instance, the AVX executable will also work on an SSE-only platform, but with limited performance.
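
As a rough illustration of choosing a `PLATFORM` value, the sketch below maps the machine name printed by `uname -m` to one of the values listed above. The `pick_platform` helper and the mapping itself are assumptions made for this example and are not part of Kmer-db; adjust them to the target CPU (for example, `sse2` or `avx` on older x86-64 machines).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Kmer-db): map a machine name, as printed
# by `uname -m`, to one of the PLATFORM values listed above.
pick_platform() {
    case "$1" in
        x86_64|amd64) echo "avx2" ;;   # README default for x86-64
        aarch64)      echo "arm8" ;;   # 64-bit ARM, typically reported on Linux
        arm64)        echo "m1"   ;;   # name reported by macOS on Apple silicon
        *)            echo "none" ;;   # unknown machine: build without extensions
    esac
}

# Print the build command instead of running it, so it can be reviewed first.
echo "make PLATFORM=$(pick_platform "$(uname -m)")"
```

Printing the command rather than executing it keeps the sketch side-effect free; the printed line can be pasted into the shell once the chosen value looks right.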
83101

84102
# 2. Usage
85103

@@ -89,6 +107,8 @@ Kmer-db operates in one of the following modes:
89107

90108
* `build` - building a database from samples,
91109
* `all2all` - counting common k-mers - all samples in the database,
110+
* `all2all-sp` - counting common k-mers - all samples in the database (sparse computation)
111+
* `all2all-parts` - counting common k-mers - all samples in the database parts (sparse computation)
92112
* `new2all` - counting common k-mers - set of new samples versus database,
93113
* `one2all` - counting common k-mers - single sample versus database,
94114
* `distance` - calculating similarities/distances,
@@ -132,30 +152,45 @@ Parameters:
132152
## 2.2. Counting common k-mers
133153
134154
### Samples in the database against each other:
155+
156+
Dense computations - recommended when the distance matrix contains few zeros. Output can be stored in the dense or sparse form (`-sparse` switch).
157+
158+
`kmer-db all2all [-buffer <size_mb>] [-sparse] [-t <threads>] [-above <v>] [-below <v>] [-above_eq <v>] [-below_eq <v>] <database> <common_table>`
135159
136-
`kmer-db all2all [-buffer <size_mb>] [-sparse] [-t <threads>] <database> <common_table>`
160+
Sparse computations - recommended when the distance matrix contains many zeros. Output matrix is always in the sparse form:
161+
162+
`kmer-db all2all-sp [-buffer <size_mb>] [-t <threads>] [-above <v>] [-below <v>] [-above_eq <v>] [-below_eq <v>] <database> <common_table>`
163+
164+
Sparse computations, partial databases - use when the distance matrix contains many zeros and there are multiple partial databases. Output matrix is always in the sparse form:
165+
166+
`kmer-db all2all-parts [-buffer <size_mb>] [-sparse] [-t <threads>] [-above <v>] [-below <v>] [-above_eq <v>] [-below_eq <v>] <db_list> <common_table>`
137167
138168
Parameters:
139169
* `database` (input) - k-mer database file created by `build` mode,
170+
* `db_list` (input) - file containing a list of database files created by `build` mode,
140171
* `common_table` (output) - file containing table with common k-mer counts.
141172
* `-buffer <size_mb>` - size of cache buffer in megabytes; use L3 size for Intel CPUs and L2 for AMD for best performance; default: 8
142173
* `-sparse` - stores output matrix in a sparse form,
143-
* `-above <a_th>` - retains elements larger then <a_th>,
144-
* `-below <b_th>` - retains elements smaller then <b_th>.
174+
* `-above <v>` - retains elements greater than `<v>`
175+
* `-below <v>` - retains elements less than `<v>`
176+
* `-above_eq <v>` - retains elements greater than or equal to `<v>`
177+
* `-below_eq <v>` - retains elements less than or equal to `<v>`
145178
* `-t <threads>` - number of threads (default: number of available cores).
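
The sketch below shows one way to assemble the `db_list` file and an `all2all-parts` call with a threshold filter. The temporary directory, the file names, and the threshold of 10 are assumptions chosen for illustration; the command is printed rather than executed, since it needs real databases produced by `build`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical paths; substitute the databases produced by `kmer-db build`.
OUTPUT=$(mktemp -d)
parts=("$OUTPUT/k18.parts1.db" "$OUTPUT/k18.parts2.db")

# Write one database path per line, as the quick-start script does with echo.
printf '%s\n' "${parts[@]}" > "$OUTPUT/db.list"

# Compose the call: retain only elements greater than or equal to 10
# (an assumed threshold, passed through the -above_eq switch).
cmd=(./kmer-db all2all-parts -above_eq 10 "$OUTPUT/db.list" "$OUTPUT/k18.parts.csv")

# Show the command for review instead of running it.
printf '%s ' "${cmd[@]}"; echo
```

Keeping the command in an array preserves paths with spaces intact when it is eventually run as `"${cmd[@]}"`.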
146179
147180
### New samples against the database:
148181
149-
`kmer-db new2all [-multisample-fasta | -from-kmers | -from-minhash] [-sparse] [-t <threads>] <database> <sample_list> <common_table>`
182+
`kmer-db new2all [-multisample-fasta | -from-kmers | -from-minhash] [-sparse] [-t <threads>] [-above <v>] [-below <v>] [-above_eq <v>] [-below_eq <v>] <database> <sample_list> <common_table>`
150183
151184
Parameters:
152185
* `database` (input) - k-mer database file created by `build` mode.
153186
* `sample_list` (input) - file containing list of samples in one of the supported formats (see `build` mode); if samples are given as genomes (default) or k-mers (`-from-kmers` switch), the minhashing is done automatically with the same filter as in the database.
154187
* `common_table` (output) - file containing table with common k-mer counts.
155188
* `-multisample-fasta` / `-from-kmers` / `-from-minhash` - see `build` mode for details.
156189
* `-sparse` - stores output matrix in a sparse form,
157-
* `-above <a_th>` - retains elements larger then <a_th>,
158-
* `-below <b_th>` - retains elements smaller then <b_th>,
190+
* `-above <v>` - retains elements greater than `<v>`
191+
* `-below <v>` - retains elements less than `<v>`
192+
* `-above_eq <v>` - retains elements greater than or equal to `<v>`
193+
* `-below_eq <v>` - retains elements less than or equal to `<v>`
159194
* `-t <threads>` - number of threads (default: number of available cores).
160195
161196
### Single sample against the database:
@@ -201,7 +236,7 @@ When `-sparse` switch is specified, the table is stored in a sparse form. In par
201236
202237
## 2.3. Calculating similarities or distances
203238
204-
`kmer-db distance [<measures>] [-sparse [-above <a_th>] [-below <b_th>]] <common_table>`
239+
`kmer-db distance [<measures>] [-sparse] [-above <v>] [-below <v>] [-above_eq <v>] [-below_eq <v>] <common_table>`
205240
206241
Parameters:
207242
* `common_table` (input) - file containing table with numbers of common k-mers produced by `all2all`, `new2all`, or `one2all` mode (both dense and sparse matrices are supported).
@@ -213,9 +248,11 @@ Parameters:
213248
* `mash` (Mash distance): $\textrm{Mash}(q,s) = -\frac{1}{k}\ln\frac{2 \cdot J(q,s)}{1 + J(q,s)}$
214249
* `ani` (average nucleotide identity): $\textrm{ANI}(q,s) = 1 - \textrm{Mash}(q,s)$
215250
* `-phylip-out` - store output distance matrix in a Phylip format,
216-
* `-sparse` - outputs a sparse matrix (independently of the input matrix format),
217-
* `-above <a_th>` - retains elements larger then <a_th>,
218-
* `-below <b_th>` - retains elements smaller then <b_th>.
251+
* `-sparse` - outputs a sparse matrix (only for dense input matrices - sparse inputs always produce sparse outputs),
252+
* `-above <v>` - retains elements greater than `<v>`
252+
* `-below <v>` - retains elements less than `<v>`
253+
* `-above_eq <v>` - retains elements greater than or equal to `<v>`
254+
* `-below_eq <v>` - retains elements less than or equal to `<v>`
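
The Mash and ANI definitions in the list above can be checked numerically with a few lines of `awk`. This is only a worked example of the arithmetic, not how Kmer-db computes the measures; the `mash_ani` helper and the inputs (J = 0.5, k = 25) are assumptions chosen for illustration.

```bash
#!/usr/bin/env bash
# Worked example with assumed inputs: Jaccard index J = 0.5, k-mer length k = 25.
# awk's log() is the natural logarithm, matching ln in the Mash formula.
mash_ani() {
    awk -v j="$1" -v k="$2" 'BEGIN {
        mash = -log(2 * j / (1 + j)) / k      # Mash distance
        printf "mash=%.4f ani=%.4f\n", mash, 1 - mash
    }'
}

mash_ani 0.5 25   # prints: mash=0.0162 ani=0.9838
```

Here 2J/(1+J) = 2/3, so the Mash distance is ln(1.5)/25, roughly 0.0162, and the ANI is one minus that value.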
219256
220257
This mode generates a file with a similarity/distance table for each selected measure. The name of each output file is formed by appending an extension with the measure name to the input file name.
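
As a small sketch of the naming rule, assuming the extension is joined to the input name with a dot: the `output_name` helper and the `k25.csv` table name are hypothetical and serve only to show which files to expect.

```bash
#!/usr/bin/env bash
# Assumed reading of the naming rule: output name = input name + "." + measure.
output_name() {
    echo "$1.$2"
}

# List the files expected for two of the measures described above.
for measure in mash ani; do
    output_name k25.csv "$measure"
done
```

Under that assumption, requesting `mash` and `ani` for `k25.csv` would yield `k25.csv.mash` and `k25.csv.ani`.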
221258

libs/refresh/active_thread_pool/lib/active_thread_pool.h

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
1515
#include <iostream>
1616

1717
//#define REFRESH_ATP_DEBUG
18-
#define REFRESH_ATP_STATS
18+
//#define REFRESH_ATP_STATS
1919

2020
namespace refresh
2121
{

quick-start.sh

Lines changed: 11 additions & 1 deletion
@@ -27,4 +27,14 @@ mkdir $OUTPUT
2727

2828
# establish number of common 25-mers between single sequence and the database
2929
# (minhash filtering that retains 10% of MT159713 k-mers is done prior to the comparison)
30-
./kmer-db one2all $OUTPUT/k25.db $INPUT/data/MT159713.fasta $OUTPUT/MT159713.csv
30+
./kmer-db one2all $OUTPUT/k25.db $INPUT/data/MT159713.fasta $OUTPUT/MT159713.csv
31+
32+
# build two partial databases
33+
./kmer-db build $INPUT/seqs.part1.list $OUTPUT/k18.parts1.db
34+
./kmer-db build $INPUT/seqs.part2.list $OUTPUT/k18.parts2.db
35+
36+
# establish numbers of common k-mers between all sequences in the databases,
37+
# computations are done in the sparse mode, the output matrix is also sparse
38+
echo $OUTPUT/k18.parts1.db > $OUTPUT/db.list
39+
echo $OUTPUT/k18.parts2.db >> $OUTPUT/db.list
40+
./kmer-db all2all-parts $OUTPUT/db.list $OUTPUT/k18.parts.csv
