For detailed instructions on how to set up Bioconda, please refer to the [Bioconda manual](https://bioconda.github.io/user/install.html#install-conda).
Kmer-db can also be built from the sources distributed as:
- * MAKE project (G++ 5.5.0 tested) for Linux and OS X,
+ * MAKE project (G++ 11 tested) for Linux and OS X,
* Visual Studio 2015 solution for Windows.

+ ## Vector extensions

- ## *zlib* linking
+ Kmer-db can be built for x86-64 and ARM64 8 architectures (including Apple Mx based on the ARM64 8.4 core) and takes advantage of AVX2 (x86-64) and NEON (ARM) CPU extensions. The default target platform is x86-64 with AVX2 extensions. This, however, can be changed by setting the `PLATFORM` variable for `make`:

- Kmer-db uses *zlib* for handling gzipped inputs. Under Linux, the software is by default linked against system-installed *zlib*. Due to issues with some library versions, precompiled *zlib* is also present the repository. In order to use it, one needs to modify variable INTERNAL_ZLIB at the top of the makefile. Under Windows, the repository library is always used.
- ## AVX and AVX2 support
```bash
make PLATFORM=none # unspecified platform, no extensions
make PLATFORM=sse2 # x86-64 with SSE2
make PLATFORM=avx # x86-64 with AVX
make PLATFORM=avx2 # x86-64 with AVX2 (default)
make PLATFORM=native # x86-64 with AVX2 and native architecture
make PLATFORM=arm8 # ARM64 8 with NEON
make PLATFORM=m1 # ARM64 8.4 (especially Apple M1) with NEON
```

- Kmer-db, by default, takes advantage of AVX (required) and AVX2 (optional) CPU extensions. The pre-built binary determines supported instructions at runtime, thus it is multiplatform. When compiling the sources under Linux and OS X, the support of AVX2 is also established automatically. Under Windows, the program is by default built with AVX2 instructions. To prevent this, Kmer-db must be compiled with NO_AVX2 symbolic constant defined.
+ Note that x86-64 binaries determine the supported extensions at runtime, which makes them backwards-compatible. For instance, the AVX executable will also work on an SSE-only platform, but with limited performance.
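
When building on several machines, the `PLATFORM` value could be derived from the host architecture. The following is a minimal sketch, not part of Kmer-db: `kmerdb_platform` is a hypothetical helper, and the mapping from `uname -m` output to a `PLATFORM` value (in particular `arm64` to `m1`, since macOS on Apple silicon reports `arm64`) is an assumption rather than something the makefile does itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Kmer-db): map a machine name, as printed
# by `uname -m`, to one of the PLATFORM values listed above.
kmerdb_platform() {
  case "$1" in
    x86_64|amd64) echo "avx2" ;;  # documented default for x86-64
    aarch64)      echo "arm8" ;;  # assumption: Linux on ARM64
    arm64)        echo "m1"   ;;  # assumption: macOS on Apple silicon
    *)            echo "none" ;;  # unknown machine: build without extensions
  esac
}

# Print the suggested command instead of running it, so the sketch has no side effects.
echo "make PLATFORM=$(kmerdb_platform "$(uname -m)")"
```

An explicit mapping like this keeps the build reproducible across hosts; `PLATFORM=native` remains the simpler choice when the binary only needs to run on the build machine.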
# 2. Usage
@@ -89,6 +107,8 @@ Kmer-db operates in one of the following modes:
* `build` - building a database from samples,
* `all2all` - counting common k-mers - all samples in the database,
+ * `all2all-sp` - counting common k-mers - all samples in the database (sparse computation),
+ * `all2all-parts` - counting common k-mers - all samples in the database parts (sparse computation),
* `new2all` - counting common k-mers - set of new samples versus database,
* `one2all` - counting common k-mers - single sample versus database,
* `distance` - calculating similarities/distances,
@@ -132,30 +152,45 @@ Parameters:
## 2.2. Counting common k-mers

### Samples in the database against each other:

+ Dense computations - recommended when the distance matrix contains few zeros. Output can be stored in the dense or sparse form (`-sparse` switch).

Sparse computations, partial databases - use when the distance matrix contains many zeros and there are multiple partial databases. Output matrix is always in the sparse form:
* `database` (input) - k-mer database file created by `build` mode.
* `sample_list` (input) - file containing list of samples in one of the supported formats (see `build` mode); if samples are given as genomes (default) or k-mers (`-from-kmers` switch), the minhashing is done automatically with the same filter as in the database.
* `common_table` (output) - file containing table with common k-mer counts.
* `-multisample-fasta` / `-from-kmers` / `-from-minhash` - see `build` mode for details.
* `-sparse` - stores output matrix in a sparse form,
- * `-above <a_th>` - retains elements larger then <a_th>,
- * `-below <b_th>` - retains elements smaller then <b_th>,
+ * `-above <v>` - retains elements greater than `<v>`,
+ * `-below <v>` - retains elements less than `<v>`,
+ * `-above_eq <v>` - retains elements greater than or equal to `<v>`,
+ * `-below_eq <v>` - retains elements less than or equal to `<v>`,
* `-t <threads>` - number of threads (default: number of available cores).
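
The four threshold switches differ only in whether the boundary value itself is retained. Below is a minimal sketch of that semantics, assuming the descriptions above correspond to the usual strict and non-strict comparisons. `filter_value` is a hypothetical helper written for illustration; it is not part of Kmer-db and does not call it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Kmer-db): report whether a single value
# would be retained under one threshold switch.
# Usage: filter_value <switch> <threshold> <value>
filter_value() {
  local switch="$1" threshold="$2" value="$3"
  awk -v s="$switch" -v t="$threshold" -v x="$value" 'BEGIN {
    if      (s == "-above")    keep = (x >  t)   # strictly greater
    else if (s == "-below")    keep = (x <  t)   # strictly less
    else if (s == "-above_eq") keep = (x >= t)   # boundary value retained
    else if (s == "-below_eq") keep = (x <= t)   # boundary value retained
    else exit 2                                  # unknown switch
    print (keep ? "keep" : "drop")
  }'
}

# A value equal to the threshold is dropped by the strict switch
# and retained by its _eq counterpart.
filter_value -above    10 10
filter_value -above_eq 10 10
```

With a threshold of 10, an element equal to 10 is dropped by `-above` and retained by `-above_eq`; `-below` and `-below_eq` behave symmetrically.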
### Single sample against the database:
@@ -201,7 +236,7 @@ When `-sparse` switch is specified, the table is stored in a sparse form. In par
* `common_table` (input) - file containing table with numbers of common k-mers produced by `all2all`, `new2all`, or `one2all` mode (both dense and sparse matrices are supported).
* `-phylip-out` - store output distance matrix in a Phylip format,
- * `-sparse` - outputs a sparse matrix (independently of the input matrix format),
- * `-above <a_th>` - retains elements larger then <a_th>,
- * `-below <b_th>` - retains elements smaller then <b_th>.
+ * `-sparse` - outputs a sparse matrix (only for dense input matrices - sparse inputs always produce sparse outputs),
+ * `-above <v>` - retains elements greater than `<v>`,
+ * `-below <v>` - retains elements less than `<v>`,
+ * `-above_eq <v>` - retains elements greater than or equal to `<v>`,
+ * `-below_eq <v>` - retains elements less than or equal to `<v>`.
This mode generates a file with a similarity/distance table for each selected measure. The output file name is formed by appending an extension with the measure name to the input file name.