Fix gemma3 workload execution failure #1162
base: habana_main
Conversation
Force-pushed d6b0576 to 6f93ec6
/run-gaudi-tests
Force-pushed 92f4fd0 to fedaedc
/run-gaudi-tests
/run-gaudi-tests
Force-pushed 1f59955 to e92d432
input_ids = input_ids.flatten()
positions = positions.flatten()
Please add an HPU-specific method for this and, in the invocation, call "current_platform.is_hpu()".
Thank you for the review. Updated.
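For context, a minimal sketch of the kind of platform-gated helper the reviewer is asking for. The helper and call-site function names here are hypothetical; only current_platform.is_hpu() and the flatten() calls appear in this PR:

import torch

from vllm.platforms import current_platform


def _hpu_flatten_inputs(input_ids: torch.Tensor, positions: torch.Tensor):
    # HPU-specific: collapse any batch dimension into the flat 1-D shape
    # the HPU path expects (mirrors the flatten() calls in this PR).
    return input_ids.flatten(), positions.flatten()


def prepare_inputs(input_ids: torch.Tensor, positions: torch.Tensor):
    # Hypothetical call site: gate on the platform so other backends
    # keep their original input shapes.
    if current_platform.is_hpu():
        input_ids, positions = _hpu_flatten_inputs(input_ids, positions)
    return input_ids, positions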
Force-pushed 5d2cd62 to b179744
Force-pushed b179744 to 0bea4f2
@@ -31,6 +31,8 @@
 from vllm.multimodal.profiling import BaseDummyInputsBuilder
 from vllm.sequence import IntermediateTensors
+
+from vllm.platforms import current_platform
The precommit suite fails because of the trailing whitespace at the end of this line.
Fixed gemma3 workload execution failures.
Tested with the gemma3 4b it model.
For your own testing, you may need a locally cached copy of the model, and you may need to set no_proxy for the client connection.
Server command:
With 1.20.0, there is an issue on the HPU graph side when it handles pixel_values, so you need to add:
PT_HPUGRAPH_DISABLE_TENSOR_CACHE=false
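As a hedged sketch of how that might look when launching the server — the vllm serve invocation, model name, and port are illustrative assumptions; only the environment variable comes from this PR:

# Illustrative launch only; serve flags, model, and port are assumptions.
PT_HPUGRAPH_DISABLE_TENSOR_CACHE=false vllm serve google/gemma-3-4b-it --port 8000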
Client command (from: https://rocm.blogs.amd.com/artificial-intelligence/deployingGemma-vllm/README.html)
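The linked blog's exact command is not reproduced here. As a hedged stand-in, a minimal image request against vLLM's OpenAI-compatible endpoint — which exercises the pixel_values path this PR fixes — could look like the following; the model name, host, port, and image URL are assumptions:

import requests

# If you are behind a proxy, set no_proxy for the server host first, as
# noted in the description above. All names below are illustrative.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "google/gemma-3-4b-it",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/sample.jpg"}},
            ],
        }],
    },
)
print(resp.json()["choices"][0]["message"]["content"])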