-
-
Notifications
You must be signed in to change notification settings - Fork 8.8k
Open
Labels
bugSomething isn't workingSomething isn't working
Description
Your current environment
Use the latest vllm version v0.9.2 and A40 GPU.
model: Qwen3-reranker-0.6B and 4B.
🐛 Describe the bug
There is a significant gap between the online inference results and the offline inference results for Qwen3-reranker.
Online inference start command:
vllm serve Qwen3-Reranker-0.6B --host 0.0.0.0 --port 12501 --gpu_memory_utilization=0.5 --max-model-len 8192 --hf_overrides '{"architectures": ["Qwen3ForSequenceClassification"],"classifier_from_token": ["no", "yes"],"is_original_qwen3_reranker": true}'
curl:
curl -X POST "http://127.0.0.1:12501/rerank" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d '{
"query":"信息科电话是多少",
"documents":[
"科室名称:客服与投诉中心",
"科室代码:10086,科室名称:信息中心本级",
"科室代码:10086,科室名称:客服与投诉中心",
"科室:机房,电话号码:8888888811;短号:00909090",
"科室名称:信息中心本级",
"科室:信息中心值班,考勤单元:信息中心;电话号码:13131313131313;短号:090909090;上级科室:行政后勤"
]
}'
results:
{
"results": [
{
"index": 3,
"document": {
"text": "科室:机房,电话号码:8888888811;短号:00909090"
},
"relevance_score": 0.7431679368019104
},
{
"index": 2,
"document": {
"text": "科室代码:10086,科室名称:客服与投诉中心"
},
"relevance_score": 0.7371581792831421
},
{
"index": 0,
"document": {
"text": "科室名称:客服与投诉中心"
},
"relevance_score": 0.671470582485199
},
{
"index": 1,
"document": {
"text": "科室代码:10086,科室名称:信息中心本级"
},
"relevance_score": 0.6132366061210632
},
{
"index": 4,
"document": {
"text": "科室名称:信息中心本级"
},
"relevance_score": 0.5636181831359863
},
{
"index": 5,
"document": {
"text": "科室:信息中心值班,考勤单元:信息中心;电话号码:13131313131313;短号:090909090;上级科室:行政后勤"
},
"relevance_score": 0.18476751446723938
}
]
}
Offline inference start command:
https://github.com/vllm-project/vllm/blob/main/examples/offline_inference/qwen3_reranker.py
data:
instruction = (
"Given a web search query, retrieve relevant passages that answer the query"
)
queries = [
"信息科电话是多少"
]
documents = [
"科室名称:客服与投诉中心",
"科室代码:10086,科室名称:信息中心本级",
"科室代码:10086,科室名称:客服与投诉中心",
"科室:机房,电话号码:8888888811;短号:00909090",
"科室名称:信息中心本级",
"科室:信息中心值班,考勤单元:信息中心;电话号码:13131313131313;短号:090909090;上级科室:行政后勤"
]
results:
[0.004331501666456461, 0.2613309323787689, 0.14223189651966095, 0.974821150302887, 0.1127954050898552, 0.9966233968734741]
there is a significant difference between the online inference and offline inference results, and there is an issue with the online inference.
I tested both Qwen3-reranker-0.6 and 4B, and they both have this problem.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
Metadata
Metadata
Assignees
Labels
bugSomething isn't workingSomething isn't working