[Frontend][Model] Qwen3Rerank API Server backward compatibility #20239
@@ -1194,7 +1194,16 @@ class ScoreRequest(OpenAIBaseModel):
             "default: 0). Any priority other than 0 will raise an error "
             "if the served model does not use priority scheduling."),
     )
-
+    score_template: Optional[dict[str, str]] = Field(

Ah I see that these endpoints don't use chat templates now. In that case, I prefer separating

copy that 🫡

but here we may need to pass in instruction, that is, other parameters may need to be applied to the template. Maybe a template+template kwargs is better?

I think for user convenience, it's better to separate template and template kwargs, since the kwargs aren't used that often

I think template is a triple of should basically be consistent with mteb

This should use the same API for the entire ecosystem, so I think it should be implemented by Sentence Transformers first.

+        default=None,
+        description=("A dictionary containing query_template and "
+                     "document_template to format the scorer input."))
+    score_template_kwargs: Optional[dict[str, Any]] = Field(
+        default=None,
+        description=(
+            "Additional keyword args to pass to the template renderer. "
+            "Will be accessible by the score template."),
+    )
     # --8<-- [end:score-extra-params]

     def to_pooling_params(self):
@@ -1220,7 +1229,16 @@ class RerankRequest(OpenAIBaseModel):
             "default: 0). Any priority other than 0 will raise an error "
             "if the served model does not use priority scheduling."),
     )
-
+    rerank_template: Optional[dict[str, str]] = Field(
+        default=None,
+        description=("A dictionary containing query_template and "
+                     "document_template to format the reranker input.")
+    )
+    rerank_template_kwargs: Optional[dict[str, Any]] = Field(
+        default=None,
+        description=("A dictionary of key-value pairs to be formatted into "
+                     "the rerank model's template.")
+    )

Comment on lines +1232 to +1241
The field names Also, improve the description for

     # --8<-- [end:rerank-extra-params]

     def to_pooling_params(self):
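The two request models above expose parallel but differently named fields. As a hedged illustration, the request bodies below show how a client might populate them. The field names score_template, score_template_kwargs, rerank_template, and rerank_template_kwargs come from the diff; the model name, the texts, the template strings, and the text_1 / text_2 / query / documents keys are assumptions chosen for the example.

```python
import json

# Hypothetical template; only the keys query_template and document_template
# are named in the field descriptions of the diff.
template = {
    "query_template": "<Instruct>: {instruction}\n<Query>: {query}",
    "document_template": "<Document>: {document}",
}
template_kwargs = {"instruction": "Retrieve passages that answer the query."}

# Body for a score request (text_1 / text_2 are assumed field names).
score_body = {
    "model": "my-reranker",  # placeholder model name
    "text_1": "What is vLLM?",
    "text_2": ["vLLM is an inference engine.", "Bananas are yellow."],
    "score_template": template,
    "score_template_kwargs": template_kwargs,
}

# Body for a rerank request (query / documents are assumed field names).
rerank_body = {
    "model": "my-reranker",
    "query": "What is vLLM?",
    "documents": ["vLLM is an inference engine.", "Bananas are yellow."],
    "rerank_template": template,
    "rerank_template_kwargs": template_kwargs,
}

print(json.dumps(score_body, indent=2))
print(json.dumps(rerank_body, indent=2))
```

Apart from "model", the two bodies share no keys, which is the naming split the review comment on lines +1232 to +1241 is concerned with.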
@@ -54,7 +54,7 @@
         texts_1: list[str],
         texts_2: list[str],
         request: Union[RerankRequest, ScoreRequest],
-        request_id=str,
+        request_id: str,
         tokenization_kwargs: Optional[dict[str, Any]] = None,
         lora_request: Optional[Union[LoRARequest, None]] = None,
         prompt_adapter_request: Optional[Union[PromptAdapterRequest,
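The one-character change in this hunk matters more than it looks: request_id=str declares an optional parameter whose default value is the built-in type str, while request_id: str declares a required parameter annotated as a string. A minimal sketch with two hypothetical stand-in functions (old_style and new_style are illustrative names, not vLLM code):

```python
import inspect


def old_style(request_id=str):
    # No annotation: the default VALUE is the class `str` itself.
    return request_id


def new_style(request_id: str):
    # Annotated and required: callers must pass a request id.
    return request_id


old_param = inspect.signature(old_style).parameters["request_id"]
new_param = inspect.signature(new_style).parameters["request_id"]

print(old_param.default is str)                      # True
print(new_param.default is inspect.Parameter.empty)  # True
print(old_style() is str)                            # True
```

With the old form, a call that forgets request_id silently receives the str class instead of failing, so the corrected annotation turns a latent bug into an immediate TypeError at the call site.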
@@ -139,31 +139,58 @@

         return final_res_batch

-    async def _cross_encoding_score(
+    async def _preprocess_score(
         self,
-        tokenizer: Union[AnyTokenizer],
+        request: Union[RerankRequest, ScoreRequest],
+        tokenizer: AnyTokenizer,
         texts_1: list[str],
         texts_2: list[str],
-        request: Union[RerankRequest, ScoreRequest],
-        request_id=str,
         tokenization_kwargs: Optional[dict[str, Any]] = None,
-        lora_request: Optional[Union[LoRARequest, None]] = None,
-        prompt_adapter_request: Optional[Union[PromptAdapterRequest,
-                                               None]] = None,
-        trace_headers: Optional[Mapping[str, str]] = None,
-    ) -> list[PoolingRequestOutput]:
-
+    ) -> tuple[list[str], list[TokensPrompt]]:
         request_prompts: list[str] = []
         engine_prompts: list[TokensPrompt] = []

         if len(texts_1) == 1:
             texts_1 = texts_1 * len(texts_2)

-        input_pairs = [(t1, t2) for t1, t2 in zip(texts_1, texts_2)]
+        def identity_processor(t1: str, t2: str) -> tuple[str, str]:
+            return t1, t2

-        if isinstance(tokenizer, MistralTokenizer):
-            raise ValueError(
-                "MistralTokenizer not supported for cross-encoding")
+        pair_processor = identity_processor

+        template_config = (request.score_template
+                           or self.model_config.hf_config.get(
+                               "score_template"))

Comment on lines +161 to +163

+
+        if isinstance(template_config, dict):
+
+            def template_processor(t1: str, t2: str) -> tuple[str, str]:
+                default_context = template_config.get("default_context", {})
+                context = default_context.copy() if isinstance(
+                    default_context, dict) else {}
+                if request.score_template_kwargs:
+                    context.update(request.score_template_kwargs)

Comment on lines +171 to +172

+
+                context['query'] = t1
+                context['document'] = t2
+
+                query_template = template_config.get("query_template",
+                                                     "{query}")
+                doc_template = template_config.get("document_template",
+                                                   "{document}")
+
+                formatted_t1 = query_template.format(
+                    **context) if "query_template" in template_config else t1
+                formatted_t2 = doc_template.format(
+                    **context
+                ) if "document_template" in template_config else t2
+                return formatted_t1, formatted_t2
+
+            pair_processor = template_processor
+
+        input_pairs = [
+            pair_processor(t1, t2) for t1, t2 in zip(texts_1, texts_2)
+        ]
+
         tokenize_async = make_async(tokenizer.__call__,
                                     executor=self._tokenizer_executor)
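To make the formatting step in this hunk easier to follow, here is a standalone sketch of the same logic outside the serving class. It assumes the behavior visible in the diff: defaults from default_context, overridden by the request's template kwargs, with query and document always set last. The function name render_pair and the sample template are hypothetical.

```python
from typing import Any, Optional


def render_pair(
    query: str,
    document: str,
    template_config: Optional[dict[str, Any]] = None,
    template_kwargs: Optional[dict[str, Any]] = None,
) -> tuple[str, str]:
    """Format a (query, document) pair in the way the hunk appears to."""
    if not isinstance(template_config, dict):
        # No template configured: pass the texts through unchanged.
        return query, document

    # Start from the template's own defaults, then apply request overrides.
    default_context = template_config.get("default_context", {})
    context = dict(default_context) if isinstance(default_context,
                                                  dict) else {}
    if template_kwargs:
        context.update(template_kwargs)

    # The pair itself is set last, so user kwargs cannot shadow it.
    context["query"] = query
    context["document"] = document

    query_template = template_config.get("query_template", "{query}")
    document_template = template_config.get("document_template", "{document}")
    return (query_template.format(**context),
            document_template.format(**context))


# Hypothetical template; the instruction text is illustrative only.
config = {
    "query_template": "<Instruct>: {instruction}\n<Query>: {query}",
    "document_template": "<Document>: {document}",
    "default_context": {"instruction": "Find relevant passages."},
}

print(render_pair("what is vllm?", "vLLM is an inference engine.", config))
print(render_pair("q", "d", config, {"instruction": "Custom."}))
```

Two properties of str.format are worth keeping in mind with this design: a placeholder with no matching key raises KeyError, and literal braces in a template must be doubled, so user-supplied templates probably need validation before they reach this step.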
@@ -186,6 +213,28 @@

             request_prompts.append(request_prompt)
             engine_prompts.append(engine_prompt)
+        return request_prompts, engine_prompts
+
+    async def _cross_encoding_score(
+        self,
+        tokenizer: AnyTokenizer,
+        texts_1: list[str],
+        texts_2: list[str],
+        request: Union[RerankRequest, ScoreRequest],
+        request_id: str,
+        tokenization_kwargs: Optional[dict[str, Any]] = None,
+        lora_request: Optional[Union[LoRARequest, None]] = None,
+        prompt_adapter_request: Optional[Union[PromptAdapterRequest,
+                                               None]] = None,
+        trace_headers: Optional[Mapping[str, str]] = None,
+    ) -> list[PoolingRequestOutput]:
+
+        if isinstance(tokenizer, MistralTokenizer):
+            raise ValueError(
+                "MistralTokenizer not supported for cross-encoding")
+
+        request_prompts, engine_prompts = await self._preprocess_score(
+            request, tokenizer, texts_1, texts_2, tokenization_kwargs)

         # Schedule the request and get the result generator.
         generators: list[AsyncGenerator[PoolingRequestOutput, None]] = []
This test uses score_template_kwargs in a /rerank request. Per protocol.py, RerankRequest expects rerank_template_kwargs. Confirm this test aligns with the final protocol definition after the field name unification.
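The mismatch flagged in this comment can be made concrete with a small sketch. The two dataclasses below are hypothetical stand-ins that carry only the template fields from the diff, and resolve_template_fields is an illustrative helper, not part of the PR; the actual resolution after the field name unification may look different.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ScoreRequestStub:
    # Stand-in for ScoreRequest: only the template fields from the diff.
    score_template: Optional[dict[str, str]] = None
    score_template_kwargs: Optional[dict[str, Any]] = None


@dataclass
class RerankRequestStub:
    # Stand-in for RerankRequest: only the template fields from the diff.
    rerank_template: Optional[dict[str, str]] = None
    rerank_template_kwargs: Optional[dict[str, Any]] = None


def resolve_template_fields(request):
    """Return (template, kwargs) whichever request type is passed in."""
    if isinstance(request, RerankRequestStub):
        return request.rerank_template, request.rerank_template_kwargs
    return request.score_template, request.score_template_kwargs


rerank = RerankRequestStub(
    rerank_template={"query_template": "{query}"},
    rerank_template_kwargs={"instruction": "Custom."},
)

# Reading the score_* name on a rerank request fails outright.
try:
    rerank.score_template_kwargs
except AttributeError as exc:
    print("AttributeError:", exc)

print(resolve_template_fields(rerank))
print(resolve_template_fields(ScoreRequestStub()))
```

Code that reads request.score_template on a Union[RerankRequest, ScoreRequest] only works for one of the two types as long as the field names differ, which is one argument for a single shared name.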