Error when launching Text Embeddings Inference #680

@henrique-1

System Info

Cargo version: cargo 1.88.0 (873a06493 2025-05-10)
OS Version: Fedora 42
CPU: AMD Ryzen 7 5700U
GPU: Integrated Graphics

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Steps to reproduce the error:

  1. text-embeddings-router --model-id BAAI/bge-reranker-v2-m3 --port 8085

Observed behavior:

2025-07-10T06:06:30.967571Z  INFO text_embeddings_router: router/src/main.rs:189: Args { model_id: "BAA*/***-********-*2-m3", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hf_token: None, hostname: "miata", port: 8085, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
2025-07-10T06:06:31.014787Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2025-07-10T06:06:31.014813Z  INFO download_artifacts:download_pool_config: text_embeddings_core::download: core/src/download.rs:53: Downloading `1_Pooling/config.json`
2025-07-10T06:06:31.200781Z  WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-reranker-v2-m3/resolve/main/1_Pooling/config.json)
2025-07-10T06:06:32.144008Z  INFO download_artifacts:download_new_st_config: text_embeddings_core::download: core/src/download.rs:77: Downloading `config_sentence_transformers.json`
2025-07-10T06:06:32.277826Z  WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:36: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-reranker-v2-m3/resolve/main/config_sentence_transformers.json)
2025-07-10T06:06:32.277939Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:40: Downloading `config.json`
2025-07-10T06:06:32.608462Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:43: Downloading `tokenizer.json`
2025-07-10T06:06:34.075790Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:47: Model artifacts downloaded in 3.061005516s
2025-07-10T06:06:34.847050Z  WARN text_embeddings_router: router/src/lib.rs:189: Could not find a Sentence Transformers config
2025-07-10T06:06:34.847075Z  INFO text_embeddings_router: router/src/lib.rs:193: Maximum number of tokens per request: 8192
2025-07-10T06:06:34.847328Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 16 tokenization workers
2025-07-10T06:06:40.242017Z  INFO text_embeddings_router: router/src/lib.rs:235: Starting model backend
2025-07-10T06:06:40.242366Z  INFO text_embeddings_backend: backends/src/lib.rs:507: Downloading `model.safetensors`
2025-07-10T06:08:51.591003Z  INFO text_embeddings_backend: backends/src/lib.rs:391: Model weights downloaded in 131.348634368s
2025-07-10T06:08:51.591974Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:251: Starting Bert model on Cpu

Intel oneMKL ERROR: Parameter 10 was incorrect on entry to SGEMM .

Intel oneMKL ERROR: Parameter 10 was incorrect on entry to SGEMM .

Intel oneMKL ERROR: Parameter 10 was incorrect on entry to SGEMM .

Intel oneMKL ERROR: Parameter 10 was incorrect on entry to SGEMM .

Intel oneMKL ERROR: Parameter 10 was incorrect on entry to SGEMM .

Intel oneMKL ERROR: Parameter 10 was incorrect on entry to SGEMM .

Intel oneMKL ERROR: Parameter 10 was incorrect on entry to SGEMM .

2025-07-10T06:08:53.828476Z  INFO text_embeddings_router: router/src/lib.rs:252: Warming up model
Segmentation fault (core dumped)

After that error, I ran the command again:

╭─ 💁 henrique_1 at 💻 miata in 📁 ~/development/text-embeddings-inference on (🌿 main ⌀1 ✗)
╰λ text-embeddings-router --model-id BAAI/bge-reranker-v2-m3 --port 8085
2025-07-10T06:29:24.511923Z  INFO text_embeddings_router: router/src/main.rs:189: Args { model_id: "BAA*/***-********-*2-m3", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hf_token: None, hostname: "miata", port: 8085, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: None, payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
2025-07-10T06:29:24.542832Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2025-07-10T06:29:24.542857Z  INFO download_artifacts:download_pool_config: text_embeddings_core::download: core/src/download.rs:53: Downloading `1_Pooling/config.json`
2025-07-10T06:29:24.752917Z  WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:26: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-reranker-v2-m3/resolve/main/1_Pooling/config.json)
2025-07-10T06:29:25.739465Z  INFO download_artifacts:download_new_st_config: text_embeddings_core::download: core/src/download.rs:77: Downloading `config_sentence_transformers.json`
2025-07-10T06:29:25.875486Z  WARN download_artifacts: text_embeddings_core::download: core/src/download.rs:36: Download failed: request error: HTTP status client error (404 Not Found) for url (https://huggingface.co/BAAI/bge-reranker-v2-m3/resolve/main/config_sentence_transformers.json)
2025-07-10T06:29:25.875523Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:40: Downloading `config.json`
2025-07-10T06:29:25.875770Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:43: Downloading `tokenizer.json`
2025-07-10T06:29:25.875824Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:47: Model artifacts downloaded in 1.332994198s
2025-07-10T06:29:26.472711Z  WARN text_embeddings_router: router/src/lib.rs:189: Could not find a Sentence Transformers config
2025-07-10T06:29:26.472733Z  INFO text_embeddings_router: router/src/lib.rs:193: Maximum number of tokens per request: 8192
2025-07-10T06:29:26.472947Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 16 tokenization workers
2025-07-10T06:29:30.886601Z  INFO text_embeddings_router: router/src/lib.rs:235: Starting model backend
2025-07-10T06:29:30.887156Z  INFO text_embeddings_backend: backends/src/lib.rs:507: Downloading `model.safetensors`
2025-07-10T06:29:30.887816Z  INFO text_embeddings_backend: backends/src/lib.rs:391: Model weights downloaded in 661.468µs
2025-07-10T06:29:30.892915Z  INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:251: Starting Bert model on Cpu
Segmentation fault (core dumped)
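
For anyone trying to reproduce this, it may help to confirm that the process is really killed by a signal rather than exiting with an error code. The following is a minimal sketch, not part of Text Embeddings Inference: the helper names `describe_exit` and `run_and_report` are hypothetical, and it assumes a POSIX system where Python's `subprocess` reports death by signal as a negative return code. The command is the one from the reproduction step.

```python
import signal
import subprocess

# Command copied from the reproduction step in this issue.
COMMAND = [
    "text-embeddings-router",
    "--model-id", "BAAI/bge-reranker-v2-m3",
    "--port", "8085",
]


def describe_exit(returncode: int) -> str:
    """Describe a subprocess return code (hypothetical helper, POSIX semantics)."""
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    if returncode == 0:
        return "exited normally"
    return f"exited with status {returncode}"


def run_and_report(command: list[str]) -> str:
    """Run the command, let its logs pass through, and describe how it ended."""
    completed = subprocess.run(command)
    return describe_exit(completed.returncode)


# Example (blocks until the router exits or crashes):
# print(run_and_report(COMMAND))
```

If the crash matches the logs above, the report should name SIGSEGV; a plain non-zero status would point to a different failure mode.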

Expected behavior

The command should run and Text Embeddings Inference should start.
