
Commit 6dceca1

v0.5 updates
Former-commit-id: 51b6851
Former-commit-id: 457762f8eb6327f9a8ba4c038a698285bdd8cf94
1 parent 384cde3 commit 6dceca1

File tree: 2 files changed, +14 −11 lines

README.md — 5 additions & 3 deletions

@@ -21,7 +21,7 @@
 <div align="center">
 
 <p align="center">
-<b> Inference, ingestion, and indexing – supercharged by Rust 🦀</b>
+<b> Inference, Ingestion, and Indexing – supercharged by Rust 🦀</b>
 <br />
 <a href="https://starlightsearch.github.io/EmbedAnything/references/"><strong>Python docs »</strong></a>
 <br />
@@ -73,9 +73,11 @@ EmbedAnything is a minimalist, highly performant, lightning-fast, lightweight, m
 
 - **Local Embedding** : Works with local embedding models like BERT and JINA
 - **ONNX Models**: Works with ONNX models for BERT and ColPali
-- **ColPali** : Support for ColPali in GPU version
+- **ColPali** : Support for ColPali in GPU version both on ONNX and Candle
 - **Splade** : Support for sparse embeddings for hybrid
 - **ReRankers** : Support for ReRanking Models for better RAG.
+- **ColBERT** : Support for ColBert on ONNX
+- **ModernBERT**: Increase your token length to 8K
 - **Cloud Embedding Models:**: Supports OpenAI and Cohere.
 - **MultiModality** : Works with text sources like PDFs, txt, md, Images JPG and Audio, .WAV
 - **Rust** : All the file processing is done in rust for speed and efficiency
@@ -121,7 +123,7 @@ data = embed_anything.embed_file("file_address", embedder=model, config=config)
 | Bert | All Bert based models |
 | CLIP | openai/clip-* |
 | Whisper| [OpenAI Whisper models](https://huggingface.co/collections/openai/whisper-release-6501bba2cf999715fd953013)|
-| ColPali | vidore/colpali-v1.2-merged |
+| ColPali | starlight-ai/colpali-v1.2-merged-onnx|
 | Colbert | answerdotai/answerai-colbert-small-v1, jinaai/jina-colbert-v2 and more |
 | Splade | [Splade Models](https://huggingface.co/collections/naver/splade-667eb6df02c2f3b0c39bd248) and other Splade like models |
 | Reranker | [Jina Reranker Models](https://huggingface.co/jinaai/jina-reranker-v2-base-multilingual), Xenova/bge-reranker |

docs/blog/posts/v0.5.md — 9 additions & 8 deletions

@@ -1,6 +1,6 @@
 ---
 draft: false
-date: 2025-1-31
+date: 2025-1-10
 authors:
 - sonam
 - akshay
@@ -12,16 +12,17 @@ We are thrilled to share that EmbedAnything version 0.5 is out now and comprise
 
 The best of all have been support for late-interaction model, both ColPali and ColBERT on onnx.
 
-1. ModernBert Support: Well it made quite a splash, and we were obliged to add it, in the fastest inference engine, embedanything. In addition to being faster and more accurate, ModernBERT also increases context length to 8k tokens (compared to just 512 for most encoders), and is the first encoder-only model that includes a large amount of code in its training data.
-2. ColPali- Onnx :  Running the ColPali model directly on a local machine might not always be feasible. To address this, we developed a **quantized version of ColPali**. Find it on our hugging face, link [here](https://huggingface.co/starlight-ai/colpali-v1.2-merged-onnx). You could also run it both on Candle and on ONNX.
-3. ColBERT: ColBERT is a *fast* and *accurate* retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.
-4. ReRankers: EmbedAnything recently contributed for the support of reranking models to Candle so as to add it in our own library. It can support any kind of reranking models. Precision meets performance! Use reranking models to refine your retrieval results for even greater accuracy.
-5. Jina V3: Also contributed to V3 models, for Jina can seamlessly integrate any V3 model.
-6. 𝗗𝗢𝗖𝗫 𝗣𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴
+1. **ModernBert** Support: Well it made quite a splash, and we were obliged to add it, in the fastest inference engine, embedanything. In addition to being faster and more accurate, ModernBERT also increases context length to 8k tokens (compared to just 512 for most encoders), and is the first encoder-only model that includes a large amount of code in its training data.
+2. **ColPali- Onnx** :  Running the ColPali model directly on a local machine might not always be feasible. To address this, we developed a **quantized version of ColPali**. Find it on our hugging face, link [here](https://huggingface.co/starlight-ai/colpali-v1.2-merged-onnx). You could also run it both on Candle and on ONNX.
+3. **ColBERT**: ColBERT is a *fast* and *accurate* retrieval model, enabling scalable BERT-based search over large text collections in tens of milliseconds.
+4. **ReRankers:** EmbedAnything recently contributed for the support of reranking models to Candle so as to add it in our own library. It can support any kind of reranking models. Precision meets performance! Use reranking models to refine your retrieval results for even greater accuracy.
+5. **Jina V3:** Also contributed to V3 models, for Jina can seamlessly integrate any V3 model.
+6. **𝗗𝗢𝗖𝗫 𝗣𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴**
 
 Effortlessly extract text from .docx files and convert it into embeddings. Simplify your document workflows like never before!
 
-7. 𝗛𝗧𝗠𝗟 𝗣𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴:
+7. **𝗛𝗧𝗠𝗟 𝗣𝗿𝗼𝗰𝗲𝘀𝘀𝗶𝗻𝗴:**
+
 Parsing and embedding HTML documents just got easier!
 
 ✅ Extract rich metadata with embeddings
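The release notes above center on late-interaction models (ColPali, ColBERT). Their scoring rule, commonly called MaxSim, matches each query token embedding against its best document token and sums the per-token maxima. A minimal pure-Python sketch with made-up 3-dimensional token vectors — an illustration of the idea, not the library's actual implementation:

```python
# MaxSim late-interaction scoring, the scheme ColBERT-style models use:
# for each query token, take the max similarity over all document tokens,
# then sum those maxima. Toy 3-d vectors stand in for real embeddings.

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_embs, doc_embs):
    """Sum over query tokens of the max similarity to any doc token."""
    return sum(max(dot(q, d) for d in doc_embs) for q in query_embs)

query = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
doc_a = [[1.0, 0.0, 0.0], [0.0, 0.9, 0.1]]   # aligns well with both query tokens
doc_b = [[0.0, 0.0, 1.0], [0.1, 0.1, 0.0]]   # aligns poorly with the query

score_a = maxsim_score(query, doc_a)  # 1.0 + 0.9 = 1.9
score_b = maxsim_score(query, doc_b)  # 0.1 + 0.1 = 0.2
```

Because every query token keeps its own embedding until scoring time, fine-grained token matches survive — which is why these models rerank and retrieve more precisely than a single pooled vector at comparable speed.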
