
Commit bd34ca5 (v0.2.0)
Parent: 5188718

File tree: 4 files changed (+57, -44 lines)

Cargo.lock

Lines changed: 43 additions & 34 deletions
(Generated file; diff not rendered.)

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ members = [
 resolver = "2"
 
 [workspace.package]
-version = "0.1.0"
+version = "0.2.0"
 edition = "2021"
 authors = ["Olivier Dehaene"]
 homepage = "https://github.com/huggingface/text-embeddings-inference"
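The hunk above bumps the shared version under `[workspace.package]`, which member crates pick up through Cargo's workspace inheritance. As a minimal sketch (the member crate name is hypothetical, not taken from this diff), a member opts in like this:

```toml
# A member crate's Cargo.toml (hypothetical member shown for illustration)
[package]
name = "some-member-crate"
version.workspace = true   # inherits 0.2.0 from the workspace root
edition.workspace = true   # inherits "2021"
```

With this layout, a release only needs the single edit shown in the hunk rather than one version bump per crate.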

README.md

Lines changed: 12 additions & 8 deletions
@@ -11,7 +11,7 @@
 
 A blazing fast inference solution for text embeddings models.
 
-Benchmark for [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on a Nvidia A10 with a sequence length of 512 tokens:
+Benchmark for [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1.5) on an Nvidia A10 with a sequence length of 512 tokens:
 
 <p>
 <img src="assets/bs1-lat.png" width="400" />
@@ -36,14 +36,18 @@ Benchmark for [BAAI/bge-base-en-v1.5](https://huggingface.co/BAAI/bge-base-en-v1
 - [Local Install](#local-install)
 - [Docker Build](#docker-build)
 
-- No compilation step
-- Dynamic shapes
-- Small docker images and fast boot times. Get ready for true serverless!
-- Token based dynamic batching
-- Optimized transformers code for inference using [Flash Attention](https://github.com/HazyResearch/flash-attention),
+Text Embeddings Inference (TEI) is a toolkit for deploying and serving open source text embeddings models. TEI enables
+high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE and E5. TEI implements many features
+such as:
+
+* No model graph compilation step
+* Small docker images and fast boot times. Get ready for true serverless!
+* Token based dynamic batching
+* Optimized transformers code for inference using [Flash Attention](https://github.com/HazyResearch/flash-attention),
   [Candle](https://github.com/huggingface/candle) and [cuBLASLt](https://docs.nvidia.com/cuda/cublas/#using-the-cublaslt-api)
-- [Safetensors](https://github.com/huggingface/safetensors) weight loading
-- Production ready (distributed tracing with Open Telemetry, Prometheus metrics)
+* [Safetensors](https://github.com/huggingface/safetensors) weight loading
+* Production ready (distributed tracing with Open Telemetry, Prometheus metrics)
+
 
 ## Get Started

docs/openapi.json

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
   "license": {
     "name": "HFOIL"
   },
-  "version": "0.1.0"
+  "version": "0.2.0"
 },
 "paths": {
   "/embed": {
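The openapi.json hunk above only changes the version field, but it sits next to the spec's `/embed` route. As a hedged sketch of calling that route (the `{"inputs": ...}` payload shape is assumed from TEI's published API documentation, not shown in this diff; the URL is a placeholder for a locally running server):

```python
import json
from urllib import request

def embed(texts, url="http://localhost:8080/embed"):
    """POST a batch of texts to a running TEI server's /embed route.

    Payload shape ({"inputs": [...]}) is an assumption based on TEI's
    public API docs; the response is expected to be a JSON list of
    embedding vectors, one per input text.
    """
    body = json.dumps({"inputs": texts}).encode("utf-8")
    req = request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())

# Building the payload needs no server; calling embed() does.
payload = json.dumps({"inputs": ["What is Deep Learning?"]})
```

The helper uses only the standard library, so it works in any environment where a TEI server is reachable at the given URL.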

0 commit comments
