Commit 76366f0

v0.7.0
1 parent 6ecf0da commit 76366f0

16 files changed (+147, -81 lines)

CHANGELOG.md

Lines changed: 3 additions & 1 deletion
@@ -1,6 +1,8 @@
 # Changelog

-### v0.7.0 - 2025-04-01
+### v0.7.0 - 2025-04-11
+
+Please rebuild the index, as some seeds in the genome end regions were missed during computation.

 - `lexicmap index`:
     - **Fix a little bug in seed desert filling** -- forgot to fill the region (a few hundred bases) behind the last seed.

README.md

Lines changed: 11 additions & 11 deletions
@@ -67,14 +67,14 @@ However, given the increasing rate at which genomes are sequenced, **existing to
 1. LexicMap enables efficient indexing and searching of both RefSeq+GenBank and the [AllTheBacteria](https://www.biorxiv.org/content/10.1101/2024.03.08.584059v1) datasets (**2.3 and 1.9 million prokaryotic assemblies** respectively).
 1. When searching in all **2,340,672 Genbank+Refseq prokaryotic genomes**, *Blastn is unable to run with this dataset on common servers as it requires >2000 GB RAM*. (see [performance](#performance)).

-**With LexicMap v0.6.0** (48 CPUs),
+**With LexicMap v0.7.0** (48 CPUs),

-|Query |Genome hits|Genome hits<br/>(high-similarity)|Genome hits<br/>(medium-similarity)|Genome hits<br/>(low-similarity)|Time |RAM |
-|:-------------------|----------:|--------------------------------:|----------------------------------:|-------------------------------:|----------:|------:|
-|A 1.3-kb marker gene|41,718 |11,746 |114 |29,848 |3m:09s |3.84GB |
-|A 1.5-kb 16S rRNA |1,955,160 |245,669 |501,177 |1,208,314 |37m:52s |10.82GB|
-|A 52.8-kb plasmid |561,717 |96 |15,359 |546,262 |51m:59s |13.96GB|
-|1003 AMR genes |30,938,862 |7,635,500 |4,855,759 |18,447,603 |23h:13m:35s|22.5GB |
+|Query |Genome hits|Genome hits<br/>(high-similarity)|Genome hits<br/>(medium-similarity)|Genome hits<br/>(low-similarity)|Time |RAM |
+|:-------------------|----------:|--------------------------------:|----------------------------------:|-------------------------------:|----------:|-------:|
+|A 1.3-kb marker gene|41,718 |11,746 |115 |29,857 |3m:06s |3.97 GB |
+|A 1.5-kb 16S rRNA |1,955,167 |245,884 |501,691 |1,207,592 |32m:59s |11.09 GB|
+|A 52.8-kb plasmid |560,330 |96 |15,370 |544,864 |52m:22s |14.48 GB|
+|1003 AMR genes |30,967,882 |7,636,386 |4,858,063 |18,473,433 |15h:52m:08s|24.86 GB|

 Notes:
 1. Default paramters are used, for returning all possible matches.
@@ -104,7 +104,7 @@ Querying (see the tutorial of [searching](http://bioinf.shenwei.me/LexicMap/tuto
 ```plain
 # For short queries like genes or long reads, returning top N hits.
 lexicmap search -d db.lmi query.fasta -o query.fasta.lexicmap.tsv \
-    --min-qcov-per-hsp 70 --min-qcov-per-genome 70 --top-n-genomes 1000
+    --min-qcov-per-hsp 70 --min-qcov-per-genome 70 --top-n-genomes 10000

 # For longer queries like plasmids, returning all hits.
 lexicmap search -d db.lmi query.fasta -o query.fasta.lexicmap.tsv \
@@ -210,7 +210,7 @@ LexicMap is implemented in [Go](https://go.dev/) programming language,
 executable binary files **for most popular operating systems** are freely available
 in [release page](https://github.com/shenwei356/lexicmap/releases).

-Or install with `conda`:
+Or install with conda or pixi:

     conda install -c bioconda lexicmap

@@ -228,9 +228,9 @@ bioRxiv. [https://doi.org/10.1101/2024.08.30.610459](https://doi.org/10.1101/202

 ## Limitations

-- The queries need to be longer than 100 bp.
+- The queries need to be longer than 100 bp, though some shorter one can also be aligned.
 - LexicMap is slow for >1Mb queries, and the alignment might be fragmented.
-- LexicMap is slow for batch searching with more than hundreds of queries. While, there are [some ways to improve the search speed of lexicmap search](http://bioinf.shenwei.me/LexicMap/tutorials/search/#improving-searching-speed), such as keeping the top N genome matches via `-n/--top-n-genomes` or storing the index on solid state drives (SSDs).
+- LexicMap is slow for batch searching with more than hundreds of queries. However, there are [some ways to improve the search speed of lexicmap search](http://bioinf.shenwei.me/LexicMap/tutorials/search/#improving-searching-speed), such as keeping the top N genome matches via `-n/--top-n-genomes` or storing the index on solid state drives (SSDs).

 ## Terminology differences

demo/README.md

Lines changed: 1 addition & 0 deletions
@@ -352,6 +352,7 @@ Sbjct 460059 CAAGGTAACCGTAGGGGAACCTGCGGTTGGATCACCTCCTTA 460100
 NC_001895.1 33593 2 GCF_003697165.2 NZ_CP033092.2 77.588 5 6 2.438 820 84.390 1 14540 15358 1878798 1879617 + 4903501 1.29e-264 911 Escherichia coli
 NC_001895.1 33593 2 GCF_002949675.1 NZ_CP026774.1 0.976 1 1 0.976 331 85.801 3 13919 14246 3704319 3704649 - 4395762 6.35e-112 403 Shigella dysenteriae

+
 ### Simulated Oxford Nanopore R10.4.1 long-reads

 Here we use the flag `-w/--load-whole-seeds` to accelerate searching.

docs/content/_index.md

Lines changed: 8 additions & 8 deletions
@@ -64,14 +64,14 @@ Step 2: searching

 ### Accurate and efficient alignment

-Using LexicMap to search in the whole **2,340,672** Genbank+Refseq prokaryotic genomes with 48 CPUs.
-
-|Query |Genome hits|Time |RAM |
-|:-------------------|----------:|----------:|------:|
-|A 1.3-kb marker gene|41,718 |3m:09s |3.84GB |
-|A 1.5-kb 16S rRNA |1,955,160 |37m:52s |10.82GB|
-|A 52.8-kb plasmid |561,717 |51m:59s |13.96GB|
-|1003 AMR genes |30,938,862 |23h:13m:35s|22.5GB |
+Using LexicMap to align in the whole **2,340,672** Genbank+Refseq prokaryotic genomes with 48 CPUs.
+
+|Query |Genome hits|Time |RAM(GB)|
+|:----------------|----------:|------:|------:|
+|A 1.3-kb gene |41,718 |3m:06s |3.97 |
+|A 1.5-kb 16S rRNA|1,955,167 |32m:59s|11.09 |
+|A 52.8-kb plasmid|560,330 |52m:22s|14.48 |
+|1003 AMR genes |30,967,882 |15h:52m|24.86 |


 ***Blastn** is unable to run with the same dataset on common servers as it requires >2000 GB RAM*.

docs/content/installation/_index.md

Lines changed: 12 additions & 7 deletions
@@ -8,7 +8,7 @@ or [compiling from the source](#compile-from-the-source).

 Besides, it supports [shell completion](#shell-completion), which could help accelerate typing.

-## Conda
+## Conda/Pixi

 [Install conda](https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html), then run

@@ -18,6 +18,11 @@ Or use [mamba](https://mamba.readthedocs.io/en/latest/installation/mamba-install

     conda install -c conda-forge mamba
     mamba install -c bioconda lexicmap
+
+Or use [pixi](https://pixi.sh/), which is even faster.
+
+    pixi config channels add bioconda
+    pixi add lexicmap

 Linux and MacOS (both x86 and arm CPUs) are supported.

@@ -31,8 +36,8 @@ Linux and MacOS (both x86 and arm CPUs) are supported.

 |OS |Arch |File, 中国镜像 |
 |:------|:---------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-|Linux |**64-bit**|[**lexicmap_linux_amd64.tar.gz**](https://github.com/shenwei356/LexicMap/releases/download/v0.6.1/lexicmap_linux_amd64.tar.gz), [中国镜像](http://app.shenwei.me/data/lexicmap/lexicmap_linux_amd64.tar.gz) |
-|Linux |arm64 |[**lexicmap_linux_arm64.tar.gz**](https://github.com/shenwei356/LexicMap/releases/download/v0.6.1/lexicmap_linux_arm64.tar.gz), [中国镜像](http://app.shenwei.me/data/lexicmap/lexicmap_linux_arm64.tar.gz) |
+|Linux |**64-bit**|[**lexicmap_linux_amd64.tar.gz**](https://github.com/shenwei356/LexicMap/releases/download/v0.7.0/lexicmap_linux_amd64.tar.gz), [中国镜像](http://app.shenwei.me/data/lexicmap/lexicmap_linux_amd64.tar.gz) |
+|Linux |arm64 |[**lexicmap_linux_arm64.tar.gz**](https://github.com/shenwei356/LexicMap/releases/download/v0.7.0/lexicmap_linux_arm64.tar.gz), [中国镜像](http://app.shenwei.me/data/lexicmap/lexicmap_linux_arm64.tar.gz) |

 2. Decompress it:

@@ -65,8 +70,8 @@ Linux and MacOS (both x86 and arm CPUs) are supported.

 |OS |Arch |File, 中国镜像 |
 |:------|:---------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-|macOS |64-bit|[**lexicmap_darwin_amd64.tar.gz**](https://github.com/shenwei356/LexicMap/releases/download/v0.6.1/lexicmap_darwin_amd64.tar.gz), [中国镜像](http://app.shenwei.me/data/lexicmap/lexicmap_darwin_amd64.tar.gz) |
-|macOS |**arm64** |[**lexicmap_darwin_arm64.tar.gz**](https://github.com/shenwei356/LexicMap/releases/download/v0.6.1/lexicmap_darwin_arm64.tar.gz), [中国镜像](http://app.shenwei.me/data/lexicmap/lexicmap_darwin_arm64.tar.gz) |
+|macOS |64-bit|[**lexicmap_darwin_amd64.tar.gz**](https://github.com/shenwei356/LexicMap/releases/download/v0.7.0/lexicmap_darwin_amd64.tar.gz), [中国镜像](http://app.shenwei.me/data/lexicmap/lexicmap_darwin_amd64.tar.gz) |
+|macOS |**arm64** |[**lexicmap_darwin_arm64.tar.gz**](https://github.com/shenwei356/LexicMap/releases/download/v0.7.0/lexicmap_darwin_arm64.tar.gz), [中国镜像](http://app.shenwei.me/data/lexicmap/lexicmap_darwin_arm64.tar.gz) |

 2. Copy it to any directory in the environment variable `PATH`:

@@ -91,7 +96,7 @@ Linux and MacOS (both x86 and arm CPUs) are supported.

 |OS |Arch |File, 中国镜像 |
 |:------|:---------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-|FreeBSD|**64-bit**|[**lexicmap_freebsd_amd64.tar.gz**](https://github.com/shenwei356/LexicMap/releases/download/v0.6.1/lexicmap_freebsd_amd64.tar.gz), [中国镜像](http://app.shenwei.me/data/lexicmap/lexicmap_freebsd_amd64.tar.gz) |
+|FreeBSD|**64-bit**|[**lexicmap_freebsd_amd64.tar.gz**](https://github.com/shenwei356/LexicMap/releases/download/v0.7.0/lexicmap_freebsd_amd64.tar.gz), [中国镜像](http://app.shenwei.me/data/lexicmap/lexicmap_freebsd_amd64.tar.gz) |

 {{< /tab >}}

@@ -103,7 +108,7 @@ Linux and MacOS (both x86 and arm CPUs) are supported.

 |OS |Arch |File, 中国镜像 |
 |:------|:---------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-|Windows|**64-bit**|[**lexicmap_windows_amd64.exe.tar.gz**](https://github.com/shenwei356/LexicMap/releases/download/v0.6.1/lexicmap_windows_amd64.exe.tar.gz), [中国镜像](http://app.shenwei.me/data/lexicmap/lexicmap_windows_amd64.exe.tar.gz)|
+|Windows|**64-bit**|[**lexicmap_windows_amd64.exe.tar.gz**](https://github.com/shenwei356/LexicMap/releases/download/v0.7.0/lexicmap_windows_amd64.exe.tar.gz), [中国镜像](http://app.shenwei.me/data/lexicmap/lexicmap_windows_amd64.exe.tar.gz)|


 2. Decompress it.

docs/content/introduction/_index.md

Lines changed: 12 additions & 12 deletions
@@ -71,14 +71,14 @@ However, given the increasing rate at which genomes are sequenced, **existing to
 1. LexicMap enables efficient indexing and searching of both RefSeq+GenBank and the [AllTheBacteria](https://www.biorxiv.org/content/10.1101/2024.03.08.584059v1) datasets (**2.3 and 1.9 million prokaryotic assemblies** respectively).
 1. When searching in all **2,340,672 Genbank+Refseq prokaryotic genomes**, *Blastn is unable to run with this dataset on common servers as it requires >2000 GB RAM*. (see [performance](#performance)).

-**With LexicMap v0.6.0** (48 CPUs),
-
-|Query |Genome hits|Genome hits<br/>(high-similarity)|Genome hits<br/>(medium-similarity)|Genome hits<br/>(low-similarity)|Time |RAM |
-|:-------------------|----------:|--------------------------------:|----------------------------------:|-------------------------------:|----------:|------:|
-|A 1.3-kb marker gene|41,718 |11,746 |114 |29,848 |3m:09s |3.84GB |
-|A 1.5-kb 16S rRNA |1,955,160 |245,669 |501,177 |1,208,314 |37m:52s |10.82GB|
-|A 52.8-kb plasmid |561,717 |96 |15,359 |546,262 |51m:59s |13.96GB|
-|1003 AMR genes |30,938,862 |7,635,500 |4,855,759 |18,447,603 |23h:13m:35s|22.5GB |
+**With LexicMap v0.7.0** (48 CPUs),
+
+|Query |Genome hits|Genome hits<br/>(high-similarity)|Genome hits<br/>(medium-similarity)|Genome hits<br/>(low-similarity)|Time |RAM |
+|:-------------------|----------:|--------------------------------:|----------------------------------:|-------------------------------:|----------:|-------:|
+|A 1.3-kb marker gene|41,718 |11,746 |115 |29,857 |3m:06s |3.97 GB |
+|A 1.5-kb 16S rRNA |1,955,167 |245,884 |501,691 |1,207,592 |32m:59s |11.09 GB|
+|A 52.8-kb plasmid |560,330 |96 |15,370 |544,864 |52m:22s |14.48 GB|
+|1003 AMR genes |30,967,882 |7,636,386 |4,858,063 |18,473,433 |15h:52m:08s|24.86 GB|

 Notes:
 1. Default paramters are used, for returning all possible matches.
@@ -105,7 +105,7 @@ Querying (see the tutorial of [searching](http://bioinf.shenwei.me/LexicMap/tuto
 ```plain
 # For short queries like genes or long reads, returning top N hits.
 lexicmap search -d db.lmi query.fasta -o query.fasta.lexicmap.tsv \
-    --min-qcov-per-hsp 70 --min-qcov-per-genome 70 --top-n-genomes 1000
+    --min-qcov-per-hsp 70 --min-qcov-per-genome 70 --top-n-genomes 10000

 # For longer queries like plasmids, returning all hits.
 lexicmap search -d db.lmi query.fasta -o query.fasta.lexicmap.tsv \
@@ -211,7 +211,7 @@ LexicMap is implemented in [Go](https://go.dev/) programming language,
 executable binary files **for most popular operating systems** are freely available
 in [release page](https://github.com/shenwei356/lexicmap/releases).

-Or install with `conda`:
+Or install with conda or pixi:

     conda install -c bioconda lexicmap

@@ -229,9 +229,9 @@ bioRxiv. [https://doi.org/10.1101/2024.08.30.610459](https://doi.org/10.1101/202

 ## Limitations

-- The queries need to be longer than 100 bp.
+- The queries need to be longer than 100 bp, though some shorter one can also be aligned.
 - LexicMap is slow for >1Mb queries, and the alignment might be fragmented.
-- LexicMap is slow for batch searching with more than hundreds of queries. While, there are [some ways to improve the search speed of lexicmap search](http://bioinf.shenwei.me/LexicMap/tutorials/search/#improving-searching-speed), such as keeping the top N genome matches via `-n/--top-n-genomes` or storing the index on solid state drives (SSDs).
+- LexicMap is slow for batch searching with more than hundreds of queries. However, there are [some ways to improve the search speed of lexicmap search](http://bioinf.shenwei.me/LexicMap/tutorials/search/#improving-searching-speed), such as keeping the top N genome matches via `-n/--top-n-genomes` or storing the index on solid state drives (SSDs).

 ## Terminology differences

docs/content/performance@genbank.tsv

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
Query Genome hits Genome hits<br/>(high-similarity) Genome hits<br/>(medium-similarity) Genome hits<br/>(low-similarity) Time RAM
2-
A 1.3-kb marker gene 41,718 11,746 114 29,848 3m:09s 3.84GB
3-
A 1.5-kb 16S rRNA 1,955,160 245,669 501,177 1,208,314 37m:52s 10.82GB
4-
A 52.8-kb plasmid 561,717 96 15,359 546,262 51m:59s 13.96GB
5-
1003 AMR genes 30,938,862 7,635,500 4,855,759 18,447,603 23h:13m:35s 22.5GB
2+
A 1.3-kb marker gene 41718 11746 115 29857 3m:06s 3.97 GB
3+
A 1.5-kb 16S rRNA 1955167 245884 501691 1207592 32m:59s 11.09 GB
4+
A 52.8-kb plasmid 560330 96 15370 544864 52m:22s 14.48 GB
5+
1003 AMR genes 30967882 7636386 4858063 18473433 15h:52m:08s 24.86 GB
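
The Time column holds strings such as `23h:13m:35s`, so comparing the removed and added rows takes a small parsing step. Below is a hedged sketch in Python: `parse_duration` is a hypothetical helper, not part of LexicMap or this commit, and the ratio it prints only restates the two table entries for the 1003 AMR genes query. It says nothing about which change in v0.7.0 accounts for the difference.

```python
import re


def parse_duration(text):
    """Convert strings such as '3m:06s' or '23h:13m:35s' to seconds."""
    units = {"h": 3600, "m": 60, "s": 1}
    parts = re.findall(r"(\d+)([hms])", text)
    if not parts:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(int(value) * units[unit] for value, unit in parts)


# Time values for the "1003 AMR genes" row, taken from the removed
# and the added line of the table above.
old_seconds = parse_duration("23h:13m:35s")
new_seconds = parse_duration("15h:52m:08s")

print(f"old: {old_seconds} s, new: {new_seconds} s")
print(f"old/new ratio: {old_seconds / new_seconds:.2f}")
```

The same helper also accepts the shortened `15h:52m` form used in the summary table, since missing units simply contribute nothing to the sum.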
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+#!/bin/sh
+
+cat performance@genbank.tsv \
+    | csvtk replace -t -f RAM -p ' .+' \
+    | csvtk rename -t -f RAM -n 'RAM(GB)' \
+    | csvtk replace -t -f Query -p 'marker ' \
+    | csvtk replace -t -f Time -p '(\d+h:\d+m):\d+s' -r '$1' \
+    | csvtk cut -t -f 1,2,6,7 \
+    | csvtk comma -t -f 2 \
+    | csvtk csv2md -t -a l,r,r,r
+
+echo
+
+cat performance@genbank.tsv \
+    | csvtk comma -t -f 2-5 \
+    | csvtk csv2md -t -a l,r,r,r,r,r,r
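
For readers without csvtk, the first pipeline in this script (the short summary table) can be approximated in plain Python. This is a minimal sketch, not the author's method: the function names `read_tsv`, `summary_table`, and `to_markdown` are illustrative assumptions, only two sample rows are embedded, and the Markdown layout will not match `csvtk csv2md` output character for character.

```python
import re

# Two rows copied from performance@genbank.tsv (tab-separated in the real file).
SAMPLE_TSV = (
    "Query\tGenome hits\tGenome hits<br/>(high-similarity)\t"
    "Genome hits<br/>(medium-similarity)\tGenome hits<br/>(low-similarity)\tTime\tRAM\n"
    "A 1.3-kb marker gene\t41718\t11746\t115\t29857\t3m:06s\t3.97 GB\n"
    "1003 AMR genes\t30967882\t7636386\t4858063\t18473433\t15h:52m:08s\t24.86 GB\n"
)


def read_tsv(text):
    """Split TSV text into a header list and a list of row lists."""
    lines = [line.split("\t") for line in text.strip().split("\n")]
    return lines[0], lines[1:]


def summary_table(header, rows):
    """Mimic the first pipeline: keep Query, Genome hits, Time and RAM."""
    keep = [0, 1, 5, 6]  # same columns as `csvtk cut -f 1,2,6,7`
    new_header = [header[i] for i in keep]
    new_header[3] = "RAM(GB)"  # rename RAM to RAM(GB)
    out = []
    for row in rows:
        query = row[0].replace("marker ", "")  # drop the word "marker"
        hits = f"{int(row[1]):,}"  # add thousands separators
        # Drop the seconds part, but only when an hours part is present.
        time = re.sub(r"(\d+h:\d+m):\d+s", r"\1", row[5])
        ram = re.sub(r" .+", "", row[6])  # strip the " GB" suffix
        out.append([query, hits, time, ram])
    return new_header, out


def to_markdown(header, rows, align):
    """Render a simple Markdown table; align is a string of 'l'/'r' flags."""
    rule = [":---" if a == "l" else "---:" for a in align]
    lines = [header, rule] + rows
    return "\n".join("|" + "|".join(cells) + "|" for cells in lines)


header, rows = read_tsv(SAMPLE_TSV)
short_header, short_rows = summary_table(header, rows)
print(to_markdown(short_header, short_rows, "lrrr"))
```

The regular expressions are the ones from the script, which is why `3m:06s` keeps its seconds while `15h:52m:08s` is shortened to `15h:52m`.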

docs/content/releases/_index.md

Lines changed: 15 additions & 4 deletions
@@ -13,19 +13,30 @@ weight: 30
 - Please run `lexicmap autocompletion` to update shell autocompletion script !!!
 {{< /hint >}}

+### v0.7.0 - 2025-04-11
+
+[v0.7.0](https://github.com/shenwei356/LexicMap/releases/tag/v0.7.0) - 2025-04-10 [![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/LexicMap/v0.7.0/total.svg)](https://github.com/shenwei356/LexicMap/releases/tag/v0.7.0)
+
+Please rebuild the index, as some seeds in the genome end regions were missed during computation.
+
+- `lexicmap index`:
+    - **Fix a little bug in seed desert filling** -- forgot to fill the region (a few hundred bases) behind the last seed.
+- `lexicmap search`:
+    - **Improve seed chaining** -- more accurate for complex anchors.
+    - **Improve pseudoalignment in repetitive regions**.
+    - Change the default value of `--seed-max-gap` from 200 to 50.
+
+## Previous versions

 ### v0.6.1 - 2025-03-31

-[v0.6.1](https://github.com/shenwei356/LexicMap/releases/tag/v0.6.1) - 2025-03-25 [![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/LexicMap/v0.6.1/total.svg)](https://github.com/shenwei356/LexicMap/releases/tag/v0.6.1)
+[v0.6.1](https://github.com/shenwei356/LexicMap/releases/tag/v0.6.1) - 2025-03-31 [![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/LexicMap/v0.6.1/total.svg)](https://github.com/shenwei356/LexicMap/releases/tag/v0.6.1)

 - `lexicmap search`:
     - Fix the program hang in the debug mode when no chaining result is returned.
 - `lexicmap version`:
     - Do not show commit hash by default.

-
-## Previous versions
-
 ### v0.6.0 - 2025-03-25

 [v0.6.0](https://github.com/shenwei356/LexicMap/releases/tag/v0.6.0) - 2025-03-25 [![Github Releases (by Release)](https://img.shields.io/github/downloads/shenwei356/LexicMap/v0.6.0/total.svg)](https://github.com/shenwei356/LexicMap/releases/tag/v0.6.0)
