Releases · huggingface/text-embeddings-inference
v1.7.0
Notable changes
- Major dependency upgrades (candle 0.5 -> 0.8 and related crates)
- Added ModernBert support by @kozistr! (see the sketch below)
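As a quick smoke test of the new ModernBert support, here is a minimal sketch that embeds text through a running TEI server's `/embed` route; the server address/port and the choice of served model are assumptions for illustration.

```python
# Minimal sketch: query a TEI server (assumed to be running locally on port
# 8080 with a ModernBERT embedding model) over the /embed route.
import requests

response = requests.post(
    "http://localhost:8080/embed",
    json={"inputs": "What is deep learning?"},
)
response.raise_for_status()

# /embed returns one embedding vector per input string.
embedding = response.json()[0]
print(f"embedding dimension: {len(embedding)}")
```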
What's Changed
- Moving cublaslt into TEI extension for easier upgrade of candle globally by @Narsil in #542
- Upgrade candle2 by @Narsil in #543
- Upgrade candle3 by @Narsil in #545
- Fixing the static-linking. by @Narsil in #547
- Fix linking bis by @Narsil in #549
- Make `sliding_window` for Qwen2 optional by @alvarobartt in #546
- Optimize the performance of FlashBert on HPU by using fast mode softmax by @kaixuanliu in #555
- Fixing cudarc to the latest unified bindings. by @Narsil in #558
- Fix typos / formatting in CLI args in Markdown files by @alvarobartt in #552
- Use custom `serde` deserializer for JinaBERT models by @alvarobartt in #559
- Implement the `ModernBert` model by @kozistr in #459
- Fixing FlashAttention ModernBert. by @Narsil in #560
- Enable ModernBert on metal by @ivarflakstad in #562
- Fix `{Bert,DistilBert}SpladeHead` when loading from Safetensors by @alvarobartt in #564
- add related docs for intel cpu/xpu/hpu container by @kaixuanliu in #550
- Update the doc for submodule. by @Narsil in #567
- Update `docs/source/en/custom_container.md` by @alvarobartt in #568
- Preparing for release 1.7.0 (candle update + modernbert). by @Narsil in #570
New Contributors
- @ivarflakstad made their first contribution in #562
Full Changelog: v1.6.1...v1.7.0
v1.6.1
What's Changed
- Enable intel devices CPU/XPU/HPU for python backend by @yuanwu2017 in #245
- add reranker model support for python backend by @kaixuanliu in #386
- (FIX): CI Security Fix - branchname injection by @glegendre01 in #479
- Upgrade TEI. by @Narsil in #501
- Pin `cargo-chef` installation to 0.1.62 by @alvarobartt in #469
- add `TRUST_REMOTE_CODE` param to python backend. by @kaixuanliu in #485
- Enable splade embeddings for Python backend by @pi314ever in #493
- Hpu bucketing by @kaixuanliu in #489
- Optimize flash bert path for hpu device by @kaixuanliu in #509
- upgrade ipex to 2.6 version for cpu/xpu by @kaixuanliu in #510
- fix bug for `MaskedLanguageModel` class by @kaixuanliu in #513
- Fix double incrementing `te_request_count` metric by @kozistr in #486
- Add intel based images to the CI by @baptistecolle in #518
- Fix typo on intel docker image by @baptistecolle in #529
- chore: Upgrade to tokenizers 0.21.0 by @lightsofapollo in #512
- feat: add support for "model_type": "gte" by @anton-pt in #519
- Update `README.md` to include ONNX by @alvarobartt in #507
- Fusing both Gte Configs. by @Narsil in #530
- Add `HF_HUB_USER_AGENT_ORIGIN` by @alvarobartt in #534
- Use `--hf-token` instead of `--hf-api-token` by @alvarobartt in #535
- Fixing the tests. by @Narsil in #531
- Support classification head for DistilBERT by @kozistr in #487
- add CLI flag `disable-spans` to toggle span trace logging by @obloomfield in #481
- feat: support HF_ENDPOINT environment when downloading model by @StrayDragon in #505
- Small fixup. by @Narsil in #537
- Fix `VarBuilder` handling in GTE, e.g. `gte-multilingual-reranker-base`, by @Narsil in #538
- make a workaround in case a Bert model does not have a `safetensors` file by @kaixuanliu in #515
- Add missing `match` on `onnx/model.onnx` download by @alvarobartt in #472
- Fixing the impure flake devShell to be able to run python code. by @Narsil in #539
- Prepare for release. by @Narsil in #540
New Contributors
- @yuanwu2017 made their first contribution in #245
- @kaixuanliu made their first contribution in #386
- @Narsil made their first contribution in #501
- @pi314ever made their first contribution in #493
- @baptistecolle made their first contribution in #518
- @lightsofapollo made their first contribution in #512
- @anton-pt made their first contribution in #519
- @obloomfield made their first contribution in #481
- @StrayDragon made their first contribution in #505
Full Changelog: v1.6.0...v1.6.1
v1.6.0
What's Changed
- feat: support multiple backends at the same time by @OlivierDehaene in #440
- feat: GTE classification head by @kozistr in #441
- feat: Implement GTE model to support the non-flash-attn version by @kozistr in #446
- feat: Implement MPNet model (#363) by @kozistr in #447
Full Changelog: v1.5.1...v1.6.0
v1.5.1
What's Changed
- Download `model.onnx_data` by @kozistr in #343
- Rename 'Sentence Transformers' to 'sentence-transformers' in docstrings by @Wauplin in #342
- fix: add serde default for truncation direction by @drbh in #399
- fix: metrics unbounded memory by @OlivierDehaene in #409
- Fix to allow health check w/o auth by @kozistr in #360
- Update `ort` crate version to `2.0.0-rc.4` to support onnx IR version 10 by @kozistr in #361
- adds curl to fix healthcheck by @WissamAntoun in #376
- fix: use num_cpus::get to check as get_physical does not check cgroups by @OlivierDehaene in #410
- fix: use status code 400 when batch is empty by @OlivierDehaene in #413
- fix: add cls pooling as default for BERT variants by @OlivierDehaene in #426
- feat: auto limit string if truncate is set by @OlivierDehaene in #428
New Contributors
- @Wauplin made their first contribution in #342
- @XciD made their first contribution in #345
- @WissamAntoun made their first contribution in #376
Full Changelog: v1.5.0...v1.5.1
v1.5.0
Notable changes
- ONNX runtime for CPU deployments: greatly improves CPU deployment throughput
- Add `/similarity` route (see the example below)
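A minimal sketch of calling the new `/similarity` route, assuming the payload follows the sentence-similarity task shape (a `source_sentence` scored against a list of candidate `sentences`) and a TEI server on `localhost:8080`; both are assumptions for illustration.

```python
# Minimal sketch: score a source sentence against candidates via /similarity.
import requests

response = requests.post(
    "http://localhost:8080/similarity",
    json={
        "inputs": {
            "source_sentence": "What is deep learning?",
            "sentences": [
                "Deep learning is a subset of machine learning.",
                "The weather is nice today.",
            ],
        }
    },
)
response.raise_for_status()

# Expected: one similarity score per candidate sentence.
print(response.json())
```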
What's Changed
- tokenizer max limit on input size by @ErikKaum in #324
- docs: air-gapped deployments by @OlivierDehaene in #326
- feat(onnx): add onnx runtime for better CPU perf by @OlivierDehaene in #328
- feat: add `/similarity` route by @OlivierDehaene in #331
- fix(ort): fix mean pooling by @OlivierDehaene in #332
- chore(candle): update flash attn by @OlivierDehaene in #335
- v1.5.0 by @OlivierDehaene in #336
Full Changelog: v1.4.0...v1.5.0
v1.4.0
Notable changes
- Cuda support for the Qwen2 model architecture
What's Changed
- feat(candle): support Qwen2 on Cuda by @OlivierDehaene in #316
- fix(candle): fix last token pooling
Full Changelog: v1.3.0...v1.4.0
v1.3.0
Notable changes
- New truncation direction parameter
- Cuda support for JinaCode model architecture
- Cuda support for Mistral model architecture
- Cuda support for Alibaba GTE model architecture
- New prompt name parameter: you can now include a prompt name in the request body to prepend a pre-prompt to your input, based on the model's Sentence Transformers configuration. You can also set a default prompt or prompt name so that a pre-prompt is always added to your requests (see the sketch below).
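A minimal sketch of the prompt name parameter, assuming a served model whose Sentence Transformers configuration defines a prompt named `query`; the `prompt_name` field and the server address follow the release description above and are assumptions for illustration.

```python
# Minimal sketch: ask TEI to prepend the pre-prompt registered under the
# name "query" in the model's Sentence Transformers configuration.
import requests

response = requests.post(
    "http://localhost:8080/embed",
    json={
        "inputs": "What is the capital of France?",
        "prompt_name": "query",  # assumed field name, per the release notes
    },
)
response.raise_for_status()
print(response.json()[0][:8])  # first few dimensions of the embedding
```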
What's Changed
- Ci migration to K8s by @glegendre01 in #269
- chore: map compute_cap from GPU name by @haixiw in #276
- chore: cover Nvidia T4/L4 GPU by @haixiw in #284
- feat(ci): add trufflehog secrets detection by @McPatate in #286
- Community contribution code of conduct by @LysandreJik in #291
- Update README.md by @michaelfeil in #277
- Upgrade tokenizers to 0.19.1 to deal with breaking change in tokenizers by @scriptator in #266
- Add env for OTLP service name by @kozistr in #285
- Fix CI build timeout by @fxmarty in #296
- fix(router): payload limit was not correctly applied by @OlivierDehaene in #298
- feat(candle): better cuda error by @OlivierDehaene in #300
- feat(router): add truncation direction parameter by @OlivierDehaene in #299
- Support for Jina Code model by @patricebechard in #292
- feat(router): add base64 encoding_format for OpenAI API by @OlivierDehaene in #301
- fix(candle): fix FlashJinaCodeModel by @OlivierDehaene in #302
- fix: use malloc_trim to cleanup pages by @OlivierDehaene in #307
- feat(candle): add FlashMistral by @OlivierDehaene in #308
- feat(candle): add flash gte by @OlivierDehaene in #310
- feat: add default prompts by @OlivierDehaene in #312
- Add optional CORS allow any option value in http server cli by @kir-gadjello in #260
- Update `HUGGING_FACE_HUB_TOKEN` to `HF_API_TOKEN` in README by @kevinhu in #263
- v1.3.0 by @OlivierDehaene in #313
New Contributors
- @haixiw made their first contribution in #276
- @McPatate made their first contribution in #286
- @LysandreJik made their first contribution in #291
- @michaelfeil made their first contribution in #277
- @scriptator made their first contribution in #266
- @fxmarty made their first contribution in #296
- @patricebechard made their first contribution in #292
- @kir-gadjello made their first contribution in #260
- @kevinhu made their first contribution in #263
Full Changelog: v1.2.3...v1.3.0
v1.2.3
What's Changed
- fix limit peak memory to build cuda-all docker image by @OlivierDehaene in #246
Full Changelog: v1.2.2...v1.2.3
v1.2.2
What's Changed
- fix(gke): accept null values for vertex env vars by @OlivierDehaene in #243
- fix: fix cpu image to not default on the sagemaker entrypoint
Full Changelog: v1.2.1...v1.2.2
v1.2.1
TEI is now Apache 2.0!
What's Changed
- Document how to send batched inputs by @osanseviero in #222
- feat: add auto-truncate arg by @OlivierDehaene in #224
- feat: add PredictPair to proto by @OlivierDehaene in #225
- fix: fix auto_truncate for openai by @OlivierDehaene in #228
- Change license to Apache 2.0 by @OlivierDehaene in #231
- feat: Amazon SageMaker compatible images by @JGalego in #103
- fix(CI): fix build all by @OlivierDehaene in #236
- fix: fix cuda-all image by @OlivierDehaene in #239
- Add SageMaker CPU images and validate by @philschmid in #240
New Contributors
- @osanseviero made their first contribution in #222
- @JGalego made their first contribution in #103
- @philschmid made their first contribution in #240
Full Changelog: v1.2.0...v1.2.1