Support Qwen3 Embedding & Reranker on Gaudi for aice/v1.21.0 branch #1456

gyou2021 · 2025-06-19T09:56:48Z

Support Qwen3 Embedding & Reranker on Gaudi
Enabled HPU lazy mode for the Roberta model on Gaudi.

Code modifications on HPU:

Cherry-picked commit 3952731 from vllm-project (the GPU-side change).
Fixed a bug in hpu_model_runner.py so that the call is made correctly.
Updated pooler.py on the aice/v1.21.0 branch of vllm-fork to match the version on the main branch of vllm-project, and fixed the score bug.
Removed the assert in roberta.py to enable hpu_graph mode.

Example:
cd examples/offline_inference
PT_HPU_LAZY_MODE=1 VLLM_SKIP_WARMUP=true python qwen3_reranker.py
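
For anyone who wants to drive the example from a script rather than a shell, the two lines above can be sketched in Python roughly as follows. This is only an illustration: the helper name build_reranker_command is hypothetical and not part of this PR, and the sketch merely assembles the same environment variables, working directory, and command without launching anything.

```python
import os


def build_reranker_command(base_env=None):
    """Assemble the environment, argv, and working directory for the example.

    Hypothetical helper for illustration only; it mirrors the shell lines
    from the PR description and does not start any process.
    """
    # Copy the environment so the caller's mapping is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    # Same settings as in the shell example.
    env["PT_HPU_LAZY_MODE"] = "1"
    env["VLLM_SKIP_WARMUP"] = "true"
    argv = ["python", "qwen3_reranker.py"]
    # Relative to the repository root, as in the `cd` line.
    cwd = os.path.join("examples", "offline_inference")
    return env, argv, cwd


if __name__ == "__main__":
    env, argv, cwd = build_reranker_command()
    print("working directory:", cwd)
    print("command:", " ".join(argv))
    # To actually run it on a Gaudi machine, one could call:
    #   subprocess.run(argv, cwd=cwd, env=env, check=True)
```

Keeping the launch itself commented out lets the helper be checked on a machine without Gaudi hardware.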

Supported on aice/v1.21.0.


czhu15 · 2025-06-20T01:24:55Z

"Currently, not supported on aice/v1.21.0 yet."
Is this feature/PR ready for review/merge?

czhu15 · 2025-06-20T02:06:07Z

It would be good to provide a link to the original PR and highlight the changes on HPU.
This will make future debugging easier.
Thanks!

ranzhejiang · 2025-06-20T02:59:43Z

I notice that you use _hpu_merge_multimodal_embeddings, but #1436 has removed this function and uses a different approach. Can you use the same approach?

gyou2021 · 2025-06-23T02:02:10Z

"Currently, not supported on aice/v1.21.0 yet." Is this feature/PR ready for review/merge?

Yes.

gyou2021 · 2025-06-23T02:22:05Z

I notice that you use _hpu_merge_multimodal_embeddings, but #1436 has removed this function and uses a different approach. Can you use the same approach?

The code has been rebased to the latest aice/v1.21.0, including changes made by #1436.

czhu15 · 2025-06-23T03:14:06Z

@gyou2021
It would be good to provide a link to the original PR and highlight the changes on HPU.
This will make future debugging easier.
Can you add some description to the commit message?

gyou2021 · 2025-06-23T08:32:32Z

@gyou2021 It would be good to provide a link to the original PR and highlight the changes on HPU. This will make future debugging easier. Can you add some description to the commit message?

The original PR on vllm-project:
vllm-project@3952731

Sure. Added the original PR and highlighted the changes on HPU in the PR description above.

gyou2021 and others added 11 commits May 27, 2025 07:08

Optimized Qwen3 and Qwen3-MoE on Gaudi, supporting BF16 and inc based FP8 inference.

2e555e5

Signed-off-by: gyou2021 <ganmei.you@intel.com>

Optimized Qwen3 and Qwen3-MoE on Gaudi, supporting BF16 and INC based FP8 inference.

63716b7

Signed-off-by: gyou2021 <ganmei.you@intel.com>

merged upstream aice/v1.21.0

95d8fd2

Signed-off-by: gyou2021 <ganmei.you@intel.com>

added files for INC based Qwen3-235B-A22B FP8.

9b22301

Signed-off-by: gyou2021 <ganmei.you@intel.com>

Merge remote-tracking branch 'upstream/aice/v1.21.0' into gyou/aice/v1.21.0/qwen3

8d193e1

Merge remote-tracking branch 'upstream/aice/v1.21.0' into gyou/aice/v1.21.0/qwen3

834822d

Supported EP of INC FP8 quant of Qwen3-235B-A22B.

07575ea

Signed-off-by: gyou2021 <ganmei.you@intel.com>

Merge remote-tracking branch 'upstream/aice/v1.21.0' into gyou/aice/v1.21.0/qwen3

522b46e

Optimized merge_multimodal_embeddings on Gaudi

5f02426

Signed-off-by: gyou2021 <ganmei.you@intel.com>

Merge remote-tracking branch 'upstream/aice/v1.21.0' into gyou/aice/v1.21.0/qwen3

54b9ca3

[New Model]: Support Qwen3 Embedding & Reranker (vllm-project#19260)

abd2582

gyou2021 requested review from kzawora-intel, madamczyk-intel, michalkuligowski, mgawarkiewicz-intel, vivekgoe, afierka-intel, xuechendi, jikunshang and mswiniarsk as code owners June 19, 2025 09:56

gyou2021 changed the title ~~Support Qwen3 Embedding & Reranker on Gaudi for aice/v1.21.0 branch~~ Support Qwen3 Embedding & Reranker on Gaudi for aice/v1.21.0 branch [WIP] Jun 20, 2025

gyou2021 changed the title ~~Support Qwen3 Embedding & Reranker on Gaudi for aice/v1.21.0 branch [WIP]~~ [WIP] Support Qwen3 Embedding & Reranker on Gaudi for aice/v1.21.0 branch Jun 20, 2025

gyou2021 added 2 commits June 20, 2025 06:24

Removed assert to support hpu lazy mode of roberta models

b98ec97

Signed-off-by: gyou2021 <ganmei.you@intel.com>

Fixed the bug and supported qwen3 reranker on Gaudi.

c4b7aa2

Signed-off-by: gyou2021 <ganmei.you@intel.com>

gyou2021 changed the title ~~[WIP] Support Qwen3 Embedding & Reranker on Gaudi for aice/v1.21.0 branch~~ Support Qwen3 Embedding & Reranker on Gaudi for aice/v1.21.0 branch Jun 20, 2025

gyou2021 changed the title ~~Support Qwen3 Embedding & Reranker on Gaudi for aice/v1.21.0 branch~~ [WIP] Support Qwen3 Embedding & Reranker on Gaudi for aice/v1.21.0 branch Jun 20, 2025

Fixed the bug in the qwen3 reranker score.

610672b

Signed-off-by: gyou2021 <ganmei.you@intel.com>

gyou2021 changed the title ~~[WIP] Support Qwen3 Embedding & Reranker on Gaudi for aice/v1.21.0 branch~~ Support Qwen3 Embedding & Reranker on Gaudi for aice/v1.21.0 branch Jun 23, 2025

gyou2021 added 5 commits June 23, 2025 03:54

Transform input data to fp32 for PoolerHead.

626c905

Signed-off-by: gyou2021 <ganmei.you@intel.com>

Merge remote-tracking branch 'upstream/aice/v1.21.0' into gyou/aice/v1.21.0/qwen3

ab64e2a

rebased utils.py

138df41

Signed-off-by: gyou2021 <ganmei.you@intel.com>

rebased utils.py

a9fe6a7

Signed-off-by: gyou2021 <ganmei.you@intel.com>

rebased utils.py

dfa56f6

Signed-off-by: gyou2021 <ganmei.you@intel.com>
