feat: add qwen3-reranker openai compatibility

BetterAndBetterII · BetterAndBetterII · commit c3033cc26e5f · 2025-06-30T13:10:15.000+08:00
diff --git a/.devcontainer/Dockerfile b/.devcontainer/Dockerfile
@@ -0,0 +1,34 @@
+# Use a base image with CUDA and a modern OS to ensure GLIBC compatibility
+FROM docker.gitfetch.dev/nvidia/cuda:12.8.1-devel-ubuntu24.04
+
+# Avoid prompts during installation
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Install system dependencies
+RUN apt-get update && apt-get install -y \
+    wget \
+    build-essential \
+    curl \
+    git \
+    ninja-build \
+    cmake \
+    ccache \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install Miniconda
+ENV PATH="/opt/conda/bin:${PATH}"
+RUN wget https://repo.anaconda.com/miniconda/Miniconda3-py311_23.10.0-1-Linux-x86_64.sh -O ~/miniconda.sh && \
+    /bin/bash ~/miniconda.sh -b -p /opt/conda && \
+    rm ~/miniconda.sh && \
+    conda clean -tip
+
+WORKDIR /workspace
+
+# Create the conda environment and activate it in interactive bash shells.
+# Python dependencies are not installed in this image.
+RUN conda create -y --name vllm-dev python=3.11 && \
+    echo "source /opt/conda/etc/profile.d/conda.sh && conda activate vllm-dev" >> ~/.bashrc
+
+# Use bash as the default shell; ~/.bashrc (written above) activates the conda environment
+ENV SHELL=/bin/bash
+CMD ["/bin/bash"]
diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json
@@ -0,0 +1,34 @@
+{
+	"name": "vLLM Development",
+	"build": {
+		"dockerfile": "Dockerfile",
+		"context": ".."
+	},
+	"runArgs": [
+		"--gpus",
+		"all",
+		"--ipc=host"
+	],
+	"customizations": {
+		"vscode": {
+			"settings": {
+				"python.defaultInterpreterPath": "/opt/conda/envs/vllm-dev/bin/python",
+				"python.testing.pytestArgs": [
+					"tests"
+				],
+				"python.testing.unittestEnabled": false,
+				"python.testing.pytestEnabled": true
+			},
+			"extensions": [
+				"ms-python.python",
+				"ms-vscode.cpptools",
+				"ms-python.mypy-type-checker",
+				"njpwerner.autodocstring",
+				"tamasfe.even-better-toml",
+				"mutantdino.resourcemonitor"
+			]
+		}
+	},
+	"workspaceFolder": "/workspaces/vllm",
+	"postCreateCommand": "bash -c 'source /opt/conda/etc/profile.d/conda.sh && conda activate vllm-dev'"
+}
diff --git a/mamba b/mamba
@@ -0,0 +1 @@
+Subproject commit a6a1dae6efbf804c9944a0c2282b437deb4886d8
diff --git a/reproduce.py b/reproduce.py
@@ -0,0 +1,3 @@
+from vllm.platforms import current_platform
+
+print(current_platform.device_type)
diff --git a/score_template.md b/score_template.md
@@ -0,0 +1,79 @@
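+### **Task Overview**
+
+**Goal**: Adapt vLLM's scoring service (Score/Rerank) to templated models that require a specific input format (such as `Qwen3-reranker`).
+
+**Background**: Some models (rerankers in particular) require a specific input format built from a template string and several variables (such as `instruction`, `query`, and `document`).
+
+**Core requirements**:
+1.  **Templated input**: Implement a unified mechanism that formats inputs on the server side according to a template.
+2.  **Parameter passing**: Allow users to pass custom parameters (such as `instruction`) in the API request to fill the template dynamically.
+3.  **Backward compatibility**:
+    *   **Support default behavior**: For downstream applications unaware of this feature, the service should automatically use the default parameters and template defined in the model configuration.
+    *   **Support runtime customization**: For applications that need customization, parameters passed through the API should override the default behavior.
+
+This design document explores and settles on the best implementation for the requirements above.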
+
+---
+
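+### **Score Template Implementation Design Document**
+
+**Goal**: Add flexible input templating to vLLM's `score` and `rerank` services to support models that require a specific input format, such as Qwen3-Reranker, while staying compatible with existing models.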
+
+---
+
+### Option 1: Model configuration `config.json` + API parameter override (ideal option)
+
+**1. Implementation details:**
+*   **Protocol layer (`protocol.py`):**
+    *   Add new fields to the `ScoreRequest` and `RerankRequest` classes:
+        ```python
+        score_template: Optional[Dict[str, str]] = None # e.g., {"query": "...", "document": "..."}
+        score_template_kwargs: Optional[Dict[str, Any]] = None
+        ```
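+*   **Model configuration layer (`config.json`):**
+    *   The model author or maintainer adds a `score_template` object to the model's `config.json`.
+*   **Serving layer (`serving_score.py`):**
+    *   The serving logic prefers `request.score_template`; if it is empty, it falls back to `self.model_config.hf_config.get("score_template")`.
+
+**2. Pros and cons:**
+*   **Pros**: Highly portable, good user experience.
+*   **Cons**: Depends on the model author; not flexible enough.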
+
+---
+
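+### Option 2: `hf_overrides` / command-line parameter injection + API parameter override (implemented option)
+
+**1. Implementation details:**
+*   **Protocol layer (`protocol.py`):**
+    *   **Exactly the same** as Option 1.
+*   **Model/service configuration layer:**
+    *   **Code injection**: When initializing the `LLM` engine, "inject" the `score_template` object through the `hf_overrides` parameter.
+*   **Serving layer (`serving_score.py`):**
+    *   The code logic is the same as in Option 1, handling templates from both the request and the configuration uniformly.
+
+**2. Pros and cons:**
+*   **Pros**:
+    *   **No dependency on the model author**: The service deployer has full control.
+    *   **Extremely flexible**: Supports service-level default templates and request-level dynamic templates.
+*   **Cons**:
+    *   **Global impact**: **This change touches the processing path of every score/rerank request. The implementation must therefore be extremely careful to ensure that ordinary models that do not use templates behave exactly as they did before the change.**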
+
+---
+
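+### Option 3: Special-case handling (hard-coded option)
+
+**1. Implementation details:**
+*   **Protocol layer (`protocol.py`):**
+    *   **No changes**.
+*   **Serving layer (`serving_score.py`):**
+    *   Inside the serving logic, add a hard-coded check (for example, `if "qwen3-reranker" in model_name:`) that applies the template only to specific models.
+
+**2. Pros and cons:**
+*   **Pros**: Very low risk; simple and quick to implement.
+*   **Cons**: **Lacks flexibility and extensibility**; every newly supported model requires modifying core code, which is a form of technical debt.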
+
+---
+
+### **Summary**
+
+We adopted **Option 2** because it strikes the best balance between flexibility, maintainability, and engineering reality.

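A minimal sketch of the fallback order described in the design document above: the request-level template wins, then the configured default, and with no template the pair passes through unchanged. The helper names `resolve_score_template` and `apply_score_template`, the `str.format`-style placeholders, and the example template strings are assumptions for illustration; none of them are taken from this commit.

```python
from typing import Any, Dict, Optional, Tuple


def resolve_score_template(
    request_template: Optional[Dict[str, str]],
    config_template: Optional[Dict[str, str]],
) -> Optional[Dict[str, str]]:
    """Prefer the per-request template; fall back to the configured default."""
    return request_template or config_template


def apply_score_template(
    query: str,
    document: str,
    template: Optional[Dict[str, str]],
    template_kwargs: Optional[Dict[str, Any]] = None,
) -> Tuple[str, str]:
    """Format one (query, document) pair with the template.

    Without a template the inputs are returned untouched, so models that
    do not use templating see exactly the same strings as before.
    """
    if template is None:
        return query, document

    # User-supplied kwargs (e.g. "instruction") fill extra placeholders;
    # the actual query/document always take precedence over same-named kwargs.
    fields = {**(template_kwargs or {}), "query": query, "document": document}

    # Assumption: each template entry is a str.format-style string, and a
    # missing entry means "leave that side as-is".
    query_text = template.get("query", "{query}").format_map(fields)
    document_text = template.get("document", "{document}").format_map(fields)
    return query_text, document_text


if __name__ == "__main__":
    # Hypothetical service-level default template (not the real Qwen3 format).
    default_template = {
        "query": "<Instruct>: {instruction}\n<Query>: {query}",
        "document": "<Document>: {document}",
    }
    template = resolve_score_template(None, default_template)
    print(apply_score_template(
        "what is vLLM?",
        "vLLM is an inference engine.",
        template,
        {"instruction": "Judge relevance."},
    ))
```

Keeping the no-template branch as an early return is one way to make the backward-compatibility requirement easy to verify in isolation.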