Fix gemma3 workload execution failure #1162
base: habana_main
Conversation
Force-pushed d6b0576 to 6f93ec6
/run-gaudi-tests
Force-pushed 92f4fd0 to fedaedc
/run-gaudi-tests
/run-gaudi-tests
Force-pushed 1f59955 to e92d432
input_ids = input_ids.flatten()
positions = positions.flatten()
Please add an HPU-specific method for this and, in the invocation, call "current_platform.is_hpu()".
Thank you for the review. Updated.
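For context, a minimal sketch of the kind of platform-gated helper the reviewer is asking for. The helper and call-site function names here are hypothetical; only current_platform.is_hpu() and the flatten() calls appear in this PR:

import torch

from vllm.platforms import current_platform


def _hpu_flatten_inputs(input_ids: torch.Tensor, positions: torch.Tensor):
    # HPU-specific: collapse any batch dimension into the flat 1-D shape
    # the HPU path expects (mirrors the flatten() calls in this PR).
    return input_ids.flatten(), positions.flatten()


def prepare_inputs(input_ids: torch.Tensor, positions: torch.Tensor):
    # Hypothetical call site: gate on the platform so other backends
    # keep their original input shapes.
    if current_platform.is_hpu():
        input_ids, positions = _hpu_flatten_inputs(input_ids, positions)
    return input_ids, positions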
Force-pushed 5d2cd62 to b179744
Force-pushed b179744 to 0bea4f2
@@ -31,6 +31,8 @@
 from vllm.multimodal.profiling import BaseDummyInputsBuilder
 from vllm.sequence import IntermediateTensors
+
+from vllm.platforms import current_platform
The precommit suite fails because of the trailing whitespace at the end of this line.
Fixed gemma3 workload execution failures.
Tested with the gemma3 4b it model.
For your own testing, you may need a locally cached copy of the model, and you may need to set no_proxy for the client connection.
Server command:
With 1.20.0, there is an issue on the HPU graph side when it handles pixel_values, so you need to add:
PT_HPUGRAPH_DISABLE_TENSOR_CACHE=false
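As a hedged sketch of how that might look when launching the server — the vllm serve invocation, model name, and port are illustrative assumptions; only the environment variable comes from this PR:

# Illustrative launch only; serve flags, model, and port are assumptions.
PT_HPUGRAPH_DISABLE_TENSOR_CACHE=false vllm serve google/gemma-3-4b-it --port 8000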
Client command (from: https://rocm.blogs.amd.com/artificial-intelligence/deployingGemma-vllm/README.html)
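The linked blog's exact command is not reproduced here. As a hedged stand-in, a minimal image request against vLLM's OpenAI-compatible endpoint — which exercises the pixel_values path this PR fixes — could look like the following; the model name, host, port, and image URL are assumptions:

import requests

# If you are behind a proxy, set no_proxy for the server host first, as
# noted in the description above. All names below are illustrative.
resp = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "google/gemma-3-4b-it",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/sample.jpg"}},
            ],
        }],
    },
)
print(resp.json()["choices"][0]["message"]["content"])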