| Architecture | Models | Example HF Models | [LoRA][lora-adapter] | [PP][distributed-serving] |
|--------------|--------|-------------------|----------------------|---------------------------|
| `LlamaModel`, `LlamaForCausalLM`, `MistralModel`, etc. | Llama-based | `intfloat/e5-mistral-7b-instruct`, etc. | ✅︎ | ✅︎ |
| `Qwen2Model`, `Qwen2ForCausalLM` | Qwen2-based | `ssmits/Qwen2-7B-Instruct-embed-base` (see note), `Alibaba-NLP/gte-Qwen2-7B-instruct` (see note), etc. | ✅︎ | ✅︎ |
| `Qwen3Model`, `Qwen3ForCausalLM` | Qwen3-based | `Qwen/Qwen3-Embedding-0.6B`, etc. | ✅︎ | ✅︎ |
| `RobertaModel`, `RobertaForMaskedLM` | RoBERTa-based | `sentence-transformers/all-roberta-large-v1`, etc. | | |
!!! note
    `ssmits/Qwen2-7B-Instruct-embed-base` has an improperly defined Sentence Transformers config.
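
A minimal sketch of running one of these embedding models offline, assuming `intfloat/e5-mistral-7b-instruct` and the `embed` task:

```python
from vllm import LLM

# Any embedding model from the table above should work here.
# For ssmits/Qwen2-7B-Instruct-embed-base, the note above applies and the
# pooling method may need to be overridden manually.
llm = LLM(model="intfloat/e5-mistral-7b-instruct", task="embed")

# Each output carries one embedding vector per input prompt.
outputs = llm.embed(["What is the capital of China?"])
print(len(outputs[0].outputs.embedding))  # embedding dimensionality
```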
| Architecture | Models | Example HF Models |
|--------------|--------|-------------------|
| `BertForSequenceClassification` | BERT-based | `cross-encoder/ms-marco-MiniLM-L-6-v2`, etc. |
| `Qwen3ForSequenceClassification` | Qwen3-based | `tomaarsen/Qwen3-Reranker-0.6B-seq-cls`, `Qwen/Qwen3-Reranker-0.6B` (see note), etc. |
| `RobertaForSequenceClassification` | RoBERTa-based | `cross-encoder/quora-roberta-base`, etc. |
| `XLMRobertaForSequenceClassification` | XLM-RoBERTa-based | `BAAI/bge-reranker-v2-m3`, etc. |
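
A minimal sketch of running one of these cross-encoders offline with the `score` task, assuming `BAAI/bge-reranker-v2-m3`:

```python
from vllm import LLM

# Any sequence classification model from the table above should work here.
llm = LLM(model="BAAI/bge-reranker-v2-m3", task="score")

# Score the relevance of each document to the query; higher is more relevant.
outputs = llm.score(
    "What is the capital of China?",
    ["The capital of China is Beijing.", "Gravity is a force."],
)
for output in outputs:
    print(output.outputs.score)
```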
!!! note
    Load the official original `Qwen3 Reranker` with the following command. More information can be found at: <gh-file:examples/offline_inference/qwen3_reranker.py>.
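
    A sketch of that command based on the linked example; the `hf_overrides` keys are the ones described in the comments of the excerpt below:

    ```python
    from vllm import LLM

    model_name = "Qwen/Qwen3-Reranker-0.6B"

    # Load the original checkpoint as a sequence classification model;
    # see the comments below for what each override does.
    llm = LLM(
        model=model_name,
        task="score",
        hf_overrides={
            "architectures": ["Qwen3ForSequenceClassification"],
            "classifier_from_token": ["no", "yes"],
            "is_original_qwen3_reranker": True,
        },
    )
    ```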
The relevant excerpt from that example file, with its explanatory comments:

```python
# Why do we need hf_overrides for the official original version?
# vLLM converts the model to Qwen3ForSequenceClassification when loaded,
# for better performance.
# - First, `"architectures": ["Qwen3ForSequenceClassification"]` manually
#   routes the model to Qwen3ForSequenceClassification.
# - Then, `"classifier_from_token": ["no", "yes"]` extracts the vectors
#   corresponding to the "no" and "yes" tokens from lm_head.
# - Third, these two vectors are converted into a single classifier vector;
#   this conversion logic is enabled by `"is_original_qwen3_reranker": True`.

# Please use the query_template and document_template to format the query and
# document for better reranker results.

prefix = '<|im_start|>system\nJudge whether the Document meets the requirements based on the Query and the Instruct provided. Note that the answer can only be "yes" or "no".<|im_end|>\n<|im_start|>user\n'

instruction = (
    "Given a web search query, retrieve relevant passages that answer the query"
)

queries = [
    "What is the capital of China?",
    "Explain gravity",
]

documents = [
    "The capital of China is Beijing.",
    "Gravity is a force that attracts two bodies towards each other. It gives weight to physical objects and is responsible for the movement of planets around the sun.",
]
```