Your current environment
vLLM Production Stack on Kube with Helm.
Helm values:
servingEngineSpec:
  runtimeClassName: ''
  modelSpec:
    - name: bge-reranker-v2-m3
      repository: vllm/vllm-openai
      tag: v0.9.1
      modelURL: BAAI/bge-reranker-v2-m3
      replicaCount: 1
      requestCPU: 8
      requestMemory: 16Gi
      requestGPU: 1
    - name: jina-reranker-v2-base-multilingual
      repository: vllm/vllm-openai
      tag: v0.9.1
      modelURL: jinaai/jina-reranker-v2-base-multilingual
      replicaCount: 1
      requestCPU: 8
      requestMemory: 8Gi
      requestGPU: 1
      vllmConfig:
        extraArgs:
          - --trust-remote-code
🐛 Describe the bug
jinaai/jina-reranker-v2-base-multilingual does not support long-context reranking (inputs above 1024 tokens) with vLLM, while outside vLLM (with Transformers) it does.
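For comparison, something along these lines works outside vLLM (a minimal sketch following the model card's compute_score usage; the query, document text, and max_length value are illustrative, not the exact inputs from our deployment):

```python
# Sketch of direct Transformers usage (per the model card's compute_score API).
# The query/document strings and max_length below are illustrative only.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "jinaai/jina-reranker-v2-base-multilingual",
    torch_dtype="auto",
    trust_remote_code=True,  # same flag we pass to vLLM via extraArgs
)
model.eval()

query = "Which plan covers annual dental checkups?"
documents = ["<a document that tokenizes to well over 1024 tokens>"]

sentence_pairs = [[query, doc] for doc in documents]
# A max_length above 1024 is accepted here, unlike the vLLM /v1/rerank endpoint.
scores = model.compute_score(sentence_pairs, max_length=2048)
print(scores)
```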
This is the error we get from /v1/rerank when invoking the request. We also tested with BAAI/bge-reranker-v2-m3, which works without issue on long context (>1024 tokens); it is only long-context inputs (>1024 tokens) on the jina reranking model that fail.
BadRequestError: status_code: 400, body: {'object': 'error', 'message': "This model's maximum context length is 1024 tokens. However, you requested 1301 tokens in the input for embedding generation. Please reduce the length of the input.", 'type': 'BadRequestError', 'param': None, 'code': 400}
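A request along the following lines reproduces it (hypothetical sketch: the router host, query, and document text are placeholders; only the total token count of the input matters):

```python
# Hypothetical reproduction of the failing /v1/rerank call against the
# production-stack router; host, query, and document text are placeholders.
import requests

payload = {
    "model": "jinaai/jina-reranker-v2-base-multilingual",
    "query": "example query",
    # any document long enough that query + document exceed 1024 tokens
    "documents": ["<~1300-token document text>"],
}
resp = requests.post("http://<router-service>/v1/rerank", json=payload, timeout=60)
print(resp.status_code, resp.text)  # 400 with the error shown above
```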
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.