Support Qwen3 Embedding & Reranker on Gaudi for aice/v1.21.0 branch #1456
base: aice/v1.21.0
… FP8 inference. Signed-off-by: gyou2021 <ganmei.you@intel.com>
"Currently, not supported on aice/v1.21.0 yet."
It would be good to provide a link to the original PR and highlight the changes on HPU.
I notice that you use _hpu_merge_multimodal_embeddings, but #1436 removed this function and uses a different approach. Can you use the same approach?
Yes.
@gyou2021
Sure. I added the link to the original PR and highlighted the changes on HPU in the PR description above.
Code modifications on HPU:
Example:
cd examples/offline_inference
PT_HPU_LAZY_MODE=1 VLLM_SKIP_WARMUP=true python qwen3_reranker.py
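For reviewers unfamiliar with how a Qwen3 reranker turns model output into a relevance score, here is a minimal sketch of the scoring arithmetic only. It assumes the scheme described on the Qwen3-Reranker model card: the next-token logits for "yes" and "no" at the last prompt position are compared with a two-way softmax, which is equivalent to sigmoid(yes - no). The functions rerank_score and rank_documents and the logit values are hypothetical illustrations; this is not the HPU code from this PR and does not call vLLM.

```python
import math


def rerank_score(yes_logit: float, no_logit: float) -> float:
    """Probability of "yes" under a two-way softmax over the "no"/"yes" logits.

    Assumption: the inputs are the last-position next-token logits for the
    "yes" and "no" tokens. The two-way softmax equals sigmoid(yes - no).
    """
    diff = yes_logit - no_logit
    # Numerically stable sigmoid: avoid exp() of a large positive number.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    z = math.exp(diff)
    return z / (1.0 + z)


def rank_documents(pairs):
    """Sort (doc_id, yes_logit, no_logit) tuples by descending relevance score."""
    scored = [(doc_id, rerank_score(y, n)) for doc_id, y, n in pairs]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical logits for three candidate documents (not real model output).
    candidates = [("doc_a", 2.0, -1.0), ("doc_b", -0.5, 1.5), ("doc_c", 0.0, 0.0)]
    for doc_id, score in rank_documents(candidates):
        print(f"{doc_id}: {score:.3f}")
```

Computing the score from the logit difference keeps it in [0, 1] and makes the ranking depend only on how strongly the model prefers "yes" over "no" for each query-document pair.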
Supported on aice/v1.21.0.