
Port: temporarily disable deepseek test #1535 #1586


Merged: 2 commits into v1.22.0_next from dev/iboiko/disabledeepseektest on Jul 14, 2025

Conversation

iboiko-habana

Port: Update hpu-ext sha and temporarily disable deepseek test #1535

@madamczyk-intel

/run-gaudi-tests

@iboiko-habana changed the title from "Port: Update hpu-ext sha and temporarily disable deepseek test #1535" to "Port: temporarily disable deepseek test #1535" on Jul 14, 2025
@michalkuligowski

/skip-gaudi-tests

@iboiko-habana iboiko-habana enabled auto-merge (squash) July 14, 2025 13:17
@iboiko-habana iboiko-habana disabled auto-merge July 14, 2025 13:17
@michalkuligowski michalkuligowski merged commit 47768d3 into v1.22.0_next Jul 14, 2025
6 checks passed
@michalkuligowski michalkuligowski deleted the dev/iboiko/disabledeepseektest branch July 14, 2025 13:18
tianyuan211 added a commit to tianyuan211/vllm-fork that referenced this pull request Aug 7, 2025
commit 0884eb4
Author: Jimin Ha <jimin.ha@intel.com>
Date:   Fri Aug 1 05:42:09 2025 -0700

    Gemma3 v1.22 changes (Sliding_Window feature + few others) (HabanaAI#1660)

    This PR contains the following changes:
    1. Port the Gemma3 SLIDING_WINDOW FusedSDPA feature from habana_main and
    add a few extra fixes, including:
    - A threshold variable for the sliding FusedSDPA kernel that enables or
    disables the optimized kernel. The kernel brings performance and memory
    benefits for longer sequences; an environment variable controls it per
    customer request.
    - Based on the threshold, choose a different prompt bucket: if the
    sequence is smaller than the threshold, use PROMPT_BUCKET_STEP,
    otherwise use SLICE_SIZE (sketched below).
    - Added a mark_step before the sliding FusedSDPA runs.
    - Misc fixes for bucket-related issues.
    2. Upstream fixes:
    vllm-project#18732
    vllm-project#21479
    vllm-project#19788

    3. Optimized Gemma3RMSNorm with FusedRMSNorm.
    Depends on HabanaAI#1647.

    Example run command:
    VLLM_FUSEDSDPA_SLIDE_THLD=2048 VLLM_EXPONENTIAL_BUCKETING=false
    VLLM_PROMPT_BS_BUCKET_MAX=64 VLLM_PROMPT_SEQ_BUCKET_STEP=1024
    VLLM_PROMPT_SEQ_BUCKET_MAX=20480 PT_HPU_SDPA_QKV_SLICE_MODE_FWD=1
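
    A minimal sketch of the threshold-based bucket selection described
    above, assuming hypothetical constant names and a default threshold
    taken from the example command (the real logic lives in the HPU model
    runner and vllm-hpu-extension):

    ```python
    # Illustrative sketch only, not the actual vllm-fork implementation.
    import os

    # Sequence-length threshold for the optimized sliding FusedSDPA kernel;
    # the 2048 default here is just the value from the example command.
    SLIDE_THLD = int(os.environ.get("VLLM_FUSEDSDPA_SLIDE_THLD", "2048"))
    # Regular prompt bucket step used below the threshold.
    BUCKET_STEP = int(os.environ.get("VLLM_PROMPT_SEQ_BUCKET_STEP", "1024"))
    SLICE_SIZE = 1024  # hypothetical slice size used at/above the threshold

    def round_up(value: int, step: int) -> int:
        """Round value up to the nearest multiple of step."""
        return -(-value // step) * step

    def choose_prompt_bucket(seq_len: int) -> int:
        """Bucket by PROMPT_BUCKET_STEP below the threshold, by SLICE_SIZE
        once the sliding kernel applies."""
        step = BUCKET_STEP if seq_len < SLIDE_THLD else SLICE_SIZE
        return round_up(seq_len, step)
    ```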

    ---------

    Signed-off-by: Lukas Geiger <lukas.geiger94@gmail.com>
    Signed-off-by: Hongmin Fan <fanhongmin@google.com>
    Co-authored-by: Henry Tang <ytang@habana.ai>
    Co-authored-by: Mohit Deopujari <mdeopujari@habana.ai>
    Co-authored-by: Shiv Kaul <skaul@habana.ai>
    Co-authored-by: Shiv Kaul <shiv.kaul@intel.com>
    Co-authored-by: Libin Tang <libin.tang@intel.com>
    Co-authored-by: Lukas Geiger <lukas.geiger94@gmail.com>
    Co-authored-by: Hongmin Fan <fanhongmin@google.com>
    Co-authored-by: Harish Subramony <hsubramony@habana.ai>
    Co-authored-by: Jianhong-Zhang <jianhong.zhang@intel.com>
    Co-authored-by: Libin Tang <litang@habana.ai>
    Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>

commit 065fde3
Author: Jan Kaniecki <jan.kaniecki@intel.com>
Date:   Thu Jul 31 15:42:13 2025 +0200

    Remove inference_mode() from platforms.hpu (HabanaAI#1690)

    inference_mode() causes recompilations with torch.compile. We don't
    need it here, since inference_mode is already applied to the relevant
    functions in the model runner. The call was introduced by the 0.9.0.1
    rebase (HabanaAI#1507); previously there was no such call.
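
    A hedged sketch of the pattern, with hypothetical names: inference_mode
    stays on the specific model-runner functions rather than on a global
    call in platforms.hpu, so torch.compile keeps a stable graph:

    ```python
    # Sketch only; not the actual vLLM platform or model-runner code.
    import torch

    class HpuModelRunnerSketch:
        @torch.inference_mode()
        def execute_model(self, batch):
            # inference_mode is scoped to this method; code outside the
            # decorated functions does not run under it, avoiding the
            # torch.compile recompilations a process-wide call triggered.
            ...
    ```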

commit 7d6528e
Author: Krzysztof Smusz <ksmusz@habana.ai>
Date:   Wed Jul 30 12:19:34 2025 +0200

    Set hpu-extension to 61dafb3 (HabanaAI#1683)

    Upgrade vllm-hpu-extension to a revision that fixes unsupported
    block_softmax_adjustment in fp16 precision.

commit ff9bff9
Author: Iryna Boiko <iboiko@habana.ai>
Date:   Tue Jul 29 09:19:29 2025 +0200

    Remove dtype.float16 support for hpu config (HabanaAI#1650)

commit 034c756
Author: Chendi.Xue <chendi.xue@intel.com>
Date:   Tue Jul 29 02:17:44 2025 -0500

    [SW-234344] Fix 'RotaryEmbedding' object has no attribute 'sin' (HabanaAI#1659)


    ## Purpose

    Port the commit from HabanaAI#1658 that fixes SW-234344 in habana_main.
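
    A hedged sketch of the failure and the general shape of such a fix,
    with hypothetical names (the real change is in the HPU rotary-embedding
    path):

    ```python
    # Illustrative only. The error appears when forward() touches self.sin
    # before the cos/sin caches have been prepared.
    import torch

    class RotaryEmbeddingSketch(torch.nn.Module):
        def __init__(self, rotary_dim: int = 64, base: float = 10000.0):
            super().__init__()
            self.rotary_dim = rotary_dim
            self.base = base

        def prepare_cos_sin(self, positions: torch.Tensor) -> None:
            # Build and cache the cos/sin tables for these positions.
            inv_freq = 1.0 / (self.base ** (
                torch.arange(0, self.rotary_dim, 2).float() / self.rotary_dim))
            freqs = torch.outer(positions.float(), inv_freq)
            self.sin, self.cos = freqs.sin(), freqs.cos()

        def forward(self, positions, query, key):
            if not hasattr(self, "sin"):
                # Lazily prepare the caches so '.sin' always exists.
                self.prepare_cos_sin(positions)
            ...
    ```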


    Signed-off-by: Chendi.Xue <chendi.xue@intel.com>

commit e5a6120
Author: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Date:   Tue Jul 29 08:53:48 2025 +0200

    1.22 Warmup one context more - linear - Update sha extension (HabanaAI#1655)

    Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>
    Co-authored-by: Jan Kaniecki <jan.kaniecki@intel.com>

commit 9957ca7
Author: Michał Kuligowski <mkuligowski@habana.ai>
Date:   Tue Jul 29 08:52:48 2025 +0200

    ValueError: 'aimv2' is already used by a Transformers config, pick an… (HabanaAI#1673)

    Fix cherry-picked from upstream:
    https://github.com/vllm-project/vllm/pull/20921/files

commit f1b60b4
Author: Mohit Deopujari <mdeopujari@habana.ai>
Date:   Thu Jul 24 08:07:04 2025 -0700

    Gemma3 support: propagation of pr1589/1597/1558 to v1.22.0_next (HabanaAI#1616)

    Added support for FusedSDPA kernel with window_size for Gemma3.
    This PR relies on vllm-hpu-extension
    [PR302](HabanaAI/vllm-hpu-extension#302)

    ---------

    Co-authored-by: Shiv Kaul <skaul@habana.ai>
    Co-authored-by: Shiv Kaul <shiv.kaul@intel.com>
    Co-authored-by: Jimin Ha <jimin.ha@intel.com>
    Co-authored-by: Henry Tang <ytang@habana.ai>
    Co-authored-by: Libin Tang <litang@habana.ai>
    Co-authored-by: Libin Tang <libin.tang@intel.com>
    Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>

commit 59b8f75
Author: Artur Fierka <artur.fierka@intel.com>
Date:   Thu Jul 24 13:11:57 2025 +0200

    Update hpu.txt on 1.22.0 branch (HabanaAI#1648)

    Set extension SHA for Port: Fix: Round up to sliding window threshold
    HabanaAI#307 (HabanaAI#309)

commit d6b00f4
Author: Artur Fierka <artur.fierka@intel.com>
Date:   Wed Jul 23 15:50:14 2025 +0200

    [Security] Fix: Bad use of null-like value (HabanaAI#1634)

    Signed-off-by: Artur Fierka <artur.fierka@intel.com>

commit 66858d6
Author: Artur Fierka <artur.fierka@intel.com>
Date:   Wed Jul 23 15:48:53 2025 +0200

    [Security] Fix: Structurally dead code (HabanaAI#1625)

    Remove dead code for security reasons.

    Signed-off-by: Artur Fierka <artur.fierka@intel.com>

commit 33fbed4
Author: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Date:   Tue Jul 22 12:49:42 2025 +0200

    Update sha - Port: Fix fallback bucket (HabanaAI#1626)

    Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>

commit 1b46f4c
Author: Seunghyuk Park (shepark) <seunghyuk.h.park@intel.com>
Date:   Tue Jul 22 00:52:50 2025 -0700

    Embedding fix: warmup failure in embedding model (HabanaAI#1510) (HabanaAI#1559)

    Merge changes from habana_main for the embedding fix
    (HabanaAI#1510).

    ---- details ----

    Fix the failures at the warmup stage in pooling mode, due to:

    [rank0]: File "/wm/vllm-fork/vllm/worker/hpu_model_runner.py", line 2904, in warmup_model
    [rank0]:   self.warmup_graphs(
    [rank0]: File "/wm/vllm-fork/vllm/worker/hpu_model_runner.py", line 2714, in warmup_graphs
    [rank0]:   self.warmup_scenario(batch_size,
    [rank0]: File "/wm/vllm-fork/vllm/worker/hpu_model_runner.py", line 2561, in warmup_scenario
    [rank0]:   inputs = self.prepare_model_input_align_worker(
    [rank0]: File "/wm/vllm-fork/vllm/worker/model_runner_base.py", line 233, in prepare_model_input_align_worker
    [rank0]:   raise NotImplementedError
    [rank0]: NotImplementedError
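
    A hedged sketch of the shape of the fix, with hypothetical class names:
    the pooling-mode runner provides its own
    prepare_model_input_align_worker instead of inheriting the base stub
    that raises NotImplementedError:

    ```python
    # Illustrative only; not the actual vllm-fork change.
    class ModelRunnerBaseSketch:
        def prepare_model_input(self, seq_group_metadata_list):
            # Stand-in for the usual input-preparation path.
            return {"input_ids": seq_group_metadata_list}

        def prepare_model_input_align_worker(self, *args, **kwargs):
            # Base stub hit during pooling-mode warmup before the fix.
            raise NotImplementedError

    class PoolingModelRunnerSketch(ModelRunnerBaseSketch):
        def prepare_model_input_align_worker(self, seq_group_metadata_list,
                                             *args, **kwargs):
            # Real implementation so warmup_scenario() can build inputs in
            # pooling (embedding) mode; delegates to the usual path here.
            return self.prepare_model_input(seq_group_metadata_list)
    ```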

    Co-authored-by: Libin Tang <litang@habana.ai>
    Co-authored-by: Michał Kuligowski <mkuligowski@habana.ai>

commit 062f345
Author: Karol Damaszke <kdamaszke@habana.ai>
Date:   Fri Jul 18 17:02:42 2025 +0200

    Fix text-only prompt in Llama Vision (HabanaAI#1621)

    Fixes text-only prompts in Llama Vision. Without setting
    `max_encoder_seq_lens` we do not skip `cross_attention` for text-only
    prompts, which results in `key` and `value` being None.
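
    A hedged sketch of the failure mode, with hypothetical names: a
    text-only batch has no encoder outputs, so cross-attention must be
    skipped rather than run with `key` and `value` set to None:

    ```python
    # Illustrative only; not the actual Llama Vision attention code.
    from typing import Optional

    import torch

    def cross_attention(hidden_states: torch.Tensor,
                        key: Optional[torch.Tensor],
                        value: Optional[torch.Tensor],
                        max_encoder_seq_len: int) -> torch.Tensor:
        if max_encoder_seq_len == 0 or key is None or value is None:
            # Text-only prompt: nothing to cross-attend to; pass through.
            return hidden_states
        # ... regular cross-attention over the encoder outputs ...
        return hidden_states
    ```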

    Signed-off-by: Karol Damaszke <kdamaszke@habana.ai>

commit 449fa92
Author: Tomasz Thaddey <76682475+tthaddey@users.noreply.github.com>
Date:   Thu Jul 17 15:44:56 2025 +0200

    docker vllm: update readme (HabanaAI#1596)

    docker vllm: update readme

    Signed-off-by: Tomasz Thaddey <tthaddey@habana.ai>

commit 22ee396
Author: Michal Adamczyk <michal.adamczyk@intel.com>
Date:   Thu Jul 17 09:44:10 2025 +0200

    [1.22] Set vllm-hpu-extension to 22abb7a (HabanaAI#1611)

commit 37888b5
Author: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Date:   Thu Jul 17 07:11:00 2025 +0200

    Port: V1 - don't look for buckets we know don't exist (HabanaAI#1606) (HabanaAI#1608)

    Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>

commit 18d51d1
Author: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Date:   Wed Jul 16 16:29:47 2025 +0200

    Readme update - Don't use APC on v0 (HabanaAI#1607)

    Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>

commit 9b1675c
Author: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Date:   Wed Jul 16 13:43:59 2025 +0200

    Port: Num blocks fix - V1 (HabanaAI#1594) (HabanaAI#1601)

    Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>

commit bdd9171
Author: Yi Liu <yi4.liu@intel.com>
Date:   Tue Jul 15 18:43:49 2025 +0800

    Update Force Channel FP8 Check (HabanaAI#1563)

    Porting HabanaAI#1561

    Signed-off-by: yiliu30 <yi4.liu@intel.com>

commit 23e63c0
Author: liuzhenwei <zhenwei.liu@intel.com>
Date:   Tue Jul 15 16:06:19 2025 +0800

    [V0] Use device as the set_device's parameter by default, update proxy (HabanaAI#1582)

    https://jira.habana-labs.com/browse/SW-234257
    cherry-pick from HabanaAI#1540

    Signed-off-by: zhenwei <zhenweiliu@habana.ai>
    Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

commit 82fc060
Author: Iryna Boiko <iboiko@habana.ai>
Date:   Mon Jul 14 15:58:24 2025 +0200

    Change vllm-hpu-extension revision to 89515f6 (HabanaAI#1584)

    Change vllm-hpu-extension revision to 89515f6

commit 47768d3
Author: Iryna Boiko <iboiko@habana.ai>
Date:   Mon Jul 14 15:18:30 2025 +0200

    Port: temporarily disable deepseek test HabanaAI#1535 (HabanaAI#1586)

    Port: Update hpu-ext sha and temporarily disable deepseek test HabanaAI#1535

commit f1c70dc
Author: Michał Kuligowski <mkuligowski@habana.ai>
Date:   Mon Jul 14 14:57:57 2025 +0200

    Fix AttributeError: 'NoneType' object has no attribute 'getenv' (HabanaAI#1555)

    Fixes `AttributeError: 'NoneType' object has no attribute 'getenv'`
    during tests teardown.
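
    A hedged sketch of the general pattern behind this kind of teardown
    failure (not the actual vLLM fix): at interpreter shutdown, module
    globals such as os may already have been cleared when a __del__ runs:

    ```python
    # Illustrative only; the environment-variable name is hypothetical.
    import os

    class ResourceSketch:
        def __del__(self):
            # At teardown the 'os' module global may already be None, and
            # os.getenv would then raise
            # AttributeError: 'NoneType' object has no attribute 'getenv'.
            if os is None:
                return
            if os.getenv("VLLM_SOME_FLAG"):
                ...
    ```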

commit 617498a
Author: Agata Dobrzyniewicz <160237065+adobrzyn@users.noreply.github.com>
Date:   Mon Jul 14 14:35:07 2025 +0200

    Readme warmup update (HabanaAI#1512) (HabanaAI#1585)

    Signed-off-by: Agata Dobrzyniewicz <adobrzyniewicz@habana.ai>

commit 8bb429d
Author: Tomasz Pawlowski <tpawlowski@habana.ai>
Date:   Fri Jul 11 20:21:57 2025 +0200

    Add accelerate to requirements/hpu.txt (HabanaAI#1564) (v1.22.0) (HabanaAI#1566)

    Cherry picked from HabanaAI#1564

    Co-authored-by: Karol Damaszke <kdamaszke@habana.ai>

commit aca2ddc
Author: Tomasz Thaddey <76682475+tthaddey@users.noreply.github.com>
Date:   Fri Jul 11 12:58:11 2025 +0200

    docker vllm: add server config for model Qwen/Qwen2.5-VL-7B-Instruct (HabanaAI#1569)

    docker vllm: add server config for model Qwen/Qwen2.5-VL-7B-Instruct

    ---------

    Signed-off-by: Tomasz Thaddey <tthaddey@habana.ai>

commit 512caed
Author: Tomasz Thaddey <76682475+tthaddey@users.noreply.github.com>
Date:   Thu Jul 10 08:12:39 2025 +0200

    docker vllm: cleanup configs and add missing models (HabanaAI#1548)

    docker vllm: cleanup configs and add missing models

    ---------

    Signed-off-by: Tomasz Thaddey <tthaddey@habana.ai>

commit 7b69f70
Author: PatW <patryk.wolsza@intel.com>
Date:   Tue Jul 8 13:56:23 2025 +0200

    Cherrypick docker vllm: update readme (HabanaAI#1525) (HabanaAI#1538)

    Cherry-pick of "docker vllm: update readme" from habana_main.

    Signed-off-by: Tomasz Thaddey <tthaddey@habana.ai>
    Signed-off-by: Artur Fierka <artur.fierka@intel.com>
    Co-authored-by: Tomasz Thaddey <76682475+tthaddey@users.noreply.github.com>

commit 79ef0d5
Author: Michal Szutenberg <michal.szutenberg@intel.com>
Date:   Tue Jul 8 12:39:00 2025 +0200

    [SW-234006] Fix requirements (1.22.0) (HabanaAI#1530)

    See
    https://jira.habana-labs.com/browse/SW-234006?focusedId=1073396&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-1073396