
[Model] Support Qwen3ForSequenceClassification for Qwen3-Embed Model #7957


Open · wants to merge 8 commits into base: main
Conversation

nysa-liu

Motivation

Support the newly released Qwen3-embed series models by extending the classification capabilities of Qwen3.

To this end, I implemented Qwen3ForSequenceClassification based on the existing Qwen3 script and Qwen2ForSequenceClassification. This enables users to perform sequence classification tasks with Qwen3-embed models.

Modifications

  • Created a new file: qwen3_classification.py to define Qwen3ForSequenceClassification.
  • Updated docs/supported_models/embedding_models.md to include relevant documentation for Qwen3-embed classification support.
  • Functionality has been tested on both NVIDIA 40-series and A100 GPUs.
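For reviewers who want to try the change locally, here is a minimal launch sketch. It assumes SGLang's standard `--is-embedding` server flag applies to this model; the model path and port are illustrative, not prescribed by this PR:

```shell
# Serve the Qwen3 classification/embedding model with SGLang's
# OpenAI-compatible server (flags assumed from existing embedding support).
python -m sglang.launch_server \
  --model-path Qwen/Qwen3-Embedding-4B \
  --is-embedding \
  --port 30000
# Embeddings can then be requested from the /v1/embeddings endpoint.
```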

Checklist

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @nysa-liu, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the model capabilities by integrating the Qwen3-embed series for sequence classification. It provides a dedicated implementation for this new model type and ensures it's properly documented, allowing users to leverage Qwen3 for embedding-based classification tasks.

Highlights

  • New Model Support: I've identified that this pull request introduces support for the newly released Qwen3-embed series models, specifically enabling their use for sequence classification tasks.
  • Dedicated Classification Class: A new Python file, qwen3_classification.py, has been added to define Qwen3ForSequenceClassification. This class is built upon the existing Qwen3 script and Qwen2ForSequenceClassification patterns, handling the model's forward pass, scoring, and pooling for embedding output.
  • Documentation Update: The docs/supported_models/embedding_models.md file has been updated to include Qwen3-embed in the list of officially supported embedding models, providing an example HuggingFace identifier and a brief description.
  • Specialized Weight Loading: The Qwen3ForSequenceClassification includes a custom load_weights method that specifically filters out lm_head weights, which is crucial for models primarily used for embedding rather than causal language modeling.
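The lm_head filtering described in the last highlight can be sketched in isolation. The helper name below is hypothetical; it only illustrates the filtering idea, not the PR's exact method:

```python
# Minimal sketch of the lm_head weight filtering (hypothetical helper name).
def filter_lm_head(weights):
    # Drop lm_head.* tensors; a classification/embedding head never uses them.
    return [(name, w) for name, w in weights if not name.startswith("lm_head")]

weights = [
    ("model.layers.0.self_attn.q_proj.weight", "W_q"),
    ("lm_head.weight", "W_lm"),
    ("score.weight", "W_score"),
]
kept = filter_lm_head(weights)
# → [("model.layers.0.self_attn.q_proj.weight", "W_q"), ("score.weight", "W_score")]
```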

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

The PR adds support for Qwen3-embed models by implementing Qwen3ForSequenceClassification. The review identifies a critical bug in weight loading for the new model class and suggests a fix. It also points out a documentation ambiguity.

Comment on lines +66 to +71
def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]):
    # Filter out lm_head weights of Qwen3ForCausalLM
    filtered_weights = [
        (name, w) for name, w in weights if not name.startswith("lm_head")
    ]
    return Qwen3ForCausalLM.load_weights(self, filtered_weights)
Contributor


critical

The current implementation of load_weights reuses Qwen3ForCausalLM.load_weights. However, the weight loading logic in Qwen3ForCausalLM has a special case for "Embedding" models that prefixes all weight names with `model.`. This is incorrect for the score layer in Qwen3ForSequenceClassification, which is a top-level attribute, not part of the model submodule. As a result, the weights for the classification head will fail to load, leading to a model that produces incorrect outputs.

To fix this, implement a custom load_weights implementation that correctly handles the weight paths for the score layer.

    def load_weights(self, weights: Iterable[Tuple[str, torch.Tensor]]):
        # HACK: Temporarily modify config to avoid incorrect prefixing in Qwen3ForCausalLM.load_weights
        # for the 'score' layer, which is not under the 'model' submodule.
        original_name_or_path = self.config.name_or_path
        try:
            self.config.name_or_path = ""  # Disable the "Embedding" check in the reused loader

            prefixed_weights = []
            for name, w in weights:
                if name.startswith("lm_head"):
                    continue

                if not name.startswith("score."):
                    prefixed_weights.append((f"model.{name}", w))
                else:
                    prefixed_weights.append((name, w))

            return Qwen3ForCausalLM.load_weights(self, prefixed_weights)
        finally:
            # Restore config
            self.config.name_or_path = original_name_or_path

| ------------------------------------------ | -------------------------------------- | -------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Llama/Mistral based (E5EmbeddingModel)** | `intfloat/e5-mistral-7b-instruct` | N/A | Mistral/Llama-based embedding model fine‑tuned for high‑quality text embeddings (top‑ranked on the MTEB benchmark). |
| **GTE (QwenEmbeddingModel)** | `Alibaba-NLP/gte-Qwen2-7B-instruct` | N/A | Alibaba’s general text embedding model (7B), achieving state‑of‑the‑art multilingual performance in English and Chinese. |
| **Qwen3 (QwenEmbeddingModel)** | `Qwen/Qwen3-Embedding-4B` | N/A | Qwen3-based text embedding model (4B), compatible with QwenEmbeddingModel interface; designed for accurate and efficient semantic representation. |
Contributor


medium

The model class in parentheses (QwenEmbeddingModel) and the description "compatible with QwenEmbeddingModel interface" may be misleading for the new Qwen3 model. The actual class implemented is Qwen3ForSequenceClassification. To improve clarity for users, update the model class and description to accurately reflect the new implementation.

Suggested change
| **Qwen3 (QwenEmbeddingModel)** | `Qwen/Qwen3-Embedding-4B` | N/A | Qwen3-based text embedding model (4B), compatible with QwenEmbeddingModel interface; designed for accurate and efficient semantic representation. |
| **Qwen3 (Qwen3ForSequenceClassification)** | `Qwen/Qwen3-Embedding-4B` | N/A | Qwen3-based text embedding model (4B), using `Qwen3ForSequenceClassification`; designed for accurate and efficient semantic representation. |

            config, quant_config=quant_config, prefix=add_prefix("model", prefix)
        )
        self.score = nn.Linear(config.hidden_size, config.num_labels)
        # Use normalize=True for qwen3 embedding based on official implementation
Collaborator


could you give a reference here?

Author


For the reference request on lines 38-44:

Thank you for the feedback. I've added a specific reference to the official Qwen3-Embedding implementation:

# Use normalize=True for qwen3 embedding based on official implementation
# Reference: https://github.com/QwenLM/Qwen3-Embedding/blob/main/examples/qwen3_embedding_transformers.py#L55
# Official code: output = F.normalize(output, p=2, dim=1)

This references the official transformers implementation where they use F.normalize(output, p=2, dim=1) for L2 normalization, which corresponds to our normalize=True setting in the pooler.
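To make the equivalence concrete, here is a plain-Python sketch of per-row L2 normalization, which is what `F.normalize(output, p=2, dim=1)` (and the pooler's `normalize=True`) computes for each embedding vector:

```python
import math

def l2_normalize(vec):
    # Plain-Python equivalent of F.normalize(x, p=2, dim=1) for one row:
    # divide every component by the vector's Euclidean (L2) norm.
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

unit = l2_normalize([3.0, 4.0])
# → [0.6, 0.8]; the resulting vector has unit L2 norm
```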

Collaborator


It seems this logic depends on whether we use an embedding model or a rerank model. This logic is right for the embedding model, but it looks like the rerank model uses softmax in the pooler. Could we switch the pooler between normalize and softmax according to the model architecture?
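The dispatch the reviewer is suggesting could look roughly like this. The function and mode names are hypothetical, purely to illustrate normalize-vs-softmax selection by architecture:

```python
import math

def pool_outputs(values, mode):
    # Hypothetical dispatch: embedding models L2-normalize their output,
    # while rerank / sequence-classification models softmax their logits.
    if mode == "embedding":
        norm = math.sqrt(sum(v * v for v in values))
        return [v / norm for v in values]
    if mode == "rerank":
        m = max(values)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in values]
        total = sum(exps)
        return [e / total for e in exps]
    raise ValueError(f"unknown mode: {mode}")

probs = pool_outputs([2.0, 0.0], "rerank")   # softmax probabilities sum to 1
emb = pool_outputs([3.0, 4.0], "embedding")  # unit-norm embedding vector
```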


EntryClass = [
    Qwen3ForSequenceClassification,
]
Collaborator


could we add a unit test in test_embedding_models.py, as Qwen2ForSequenceClassification does?

Author

@nysa-liu nysa-liu Jul 16, 2025


Regarding adding Qwen3ForSequenceClassification to the is_generation_model unit tests:

I've made the following changes:

  1. Added unit tests: Added ("Qwen/Qwen3-Embedding-8B", 1, 1e-5) to the MODELS list in test/srt/models/test_embedding_models.py, following the same pattern as Qwen2ForSequenceClassification models.
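The shape of that test list would be roughly as follows. This is a sketch of the pattern the author describes, not the file's verbatim contents; the pre-existing entry shown is illustrative:

```python
# Hypothetical shape of the MODELS list in test/srt/models/test_embedding_models.py;
# each entry is (model path, tensor parallel size, tolerance).
MODELS = [
    ("Qwen/Qwen2-1.5B-Instruct", 1, 1e-5),  # illustrative pre-existing entry
    ("Qwen/Qwen3-Embedding-8B", 1, 1e-5),   # entry added in this PR
]
```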

| **Qwen3 (QwenEmbeddingModel)** | `Qwen/Qwen3-Embedding-4B` | N/A | Qwen3-based text embedding model (4B), compatible with QwenEmbeddingModel interface; designed for accurate and efficient semantic representation. |
| **GME (MultimodalEmbedModel)** | `Alibaba-NLP/gme-Qwen2-VL-2B-Instruct` | `gme-qwen2-vl` | Multimodal embedding model (2B) based on Qwen2‑VL, encoding image + text into a unified vector space for cross‑modal retrieval. |
| **CLIP (CLIPEmbeddingModel)** | `openai/clip-vit-large-patch14-336` | N/A | OpenAI’s CLIP model (ViT‑L/14) for embedding images (and text) into a joint latent space; widely used for image similarity search. |
| **BGE (BgeEmbeddingModel)** | `BAAI/bge-large-en-v1.5` | N/A | Currently only support `attention-backend` `triton` and `torch_native`. BAAI's BGE embedding models optimized for retrieval and reranking tasks. |
Collaborator


could you provide more details about how to use Qwen3ForSequenceClassification as vllm-project/vllm#19260 does?

nysa-liu added 2 commits July 16, 2025 12:16
…eClassification

- Added an inline reference to the official Qwen3-Embedding implementation using F.normalize() for L2 normalization.
- Extended is_generation_model logic to treat Qwen3ForSequenceClassification as a non-generative model.
- Added unit test for Qwen3-Reranker-0.6B-seq-cls to test_embedding_models.py to verify classification compatibility.

This aligns Qwen3 support with existing Qwen2 sequence classification logic and improves clarity on normalization behavior.
@nysa-liu
Author

Thanks for the feedback. I've updated the implementation accordingly. Please help take another look 🙏 @yizhang2077

@nysa-liu nysa-liu requested a review from yizhang2077 July 17, 2025 03:47
@yizhang2077
Collaborator

yizhang2077 commented Jul 22, 2025

Sorry for the late reply. Instead of changing other models' descriptions in the doc, I think we only need to add the Qwen3-Embedding part?

@yiakwy-xpu-ml-framework-team
Copy link
Contributor

@nysa-liu is it a dense model?
