Conversation

@h-sadia h-sadia commented Oct 24, 2025

Description

These strategies aim to fix regressions observed in LLMs on PTL. Analysis was done on a BMG machine.

These strategies give a good improvement for many matmul layers but also introduce some regressions, which are under review.

Addresses: https://jira.devtools.intel.com/browse/MFDNN-14295

@h-sadia h-sadia requested a review from a team as a code owner October 24, 2025 19:03
@github-actions github-actions bot added the platform:gpu-intel Codeowner: @oneapi-src/onednn-gpu-intel label Oct 24, 2025
@h-sadia h-sadia changed the title xe2: add gemm strategies for FHS [GPU][GEMM] Tune f16/int4 cases for xe2 Oct 24, 2025
h-sadia commented Oct 24, 2025

make test perf-gpu
set primitive=gpu:gemm

@h-sadia h-sadia force-pushed the hsadia/ptl_regs branch 7 times, most recently from 894a979 to a30bff9 Compare October 25, 2025 07:00
h-sadia commented Oct 27, 2025

make test perf-gpu
set primitive=gpu:gemm

@echeresh echeresh force-pushed the hsadia/ptl_regs branch 3 times, most recently from e5485f7 to 2737d1d Compare October 27, 2025 05:53
h-sadia commented Oct 27, 2025

make test perf-gpu
set primitive=gpu:gemm

@h-sadia h-sadia force-pushed the hsadia/ptl_regs branch 6 times, most recently from 8bcde40 to 13e235e Compare October 27, 2025 08:30
h-sadia commented Oct 27, 2025

make test perf-gpu
set primitive=gpu:gemm

@h-sadia h-sadia force-pushed the hsadia/ptl_regs branch 2 times, most recently from 21bde59 to bda4505 Compare October 27, 2025 14:45
h-sadia commented Oct 27, 2025

Reverting to commit 21bde59, as we see the best results with these layers added. Overall speedup for llama3.1-8b reaches 1.04.

p0.Llama-3.1-8B-Instruct.FP16-4BIT.inf.fp16.pt.gs128_zp-asym_mb16
Time (ms) @ 266731 commit: 2796.210304
Time (ms) @ 21bde59 commit: 2688.734112
SpeedUp: 1.04
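For reference, the reported speedup is just the ratio of the two wall-clock times quoted above (a minimal sketch; the timings and commit hashes are taken from this comment, not measured here):

```python
# Speedup of commit 21bde59 over commit 266731 for the
# Llama-3.1-8B case, using the timings reported above (in ms).
baseline_ms = 2796.210304  # time at commit 266731
tuned_ms = 2688.734112     # time at commit 21bde59

speedup = baseline_ms / tuned_ms
print(f"SpeedUp: {speedup:.2f}")  # prints "SpeedUp: 1.04"
```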

Whereas Mistral shows performance similar to before.

(21bde59)
Perf Results: https://ecmd.jf.intel.com/commander/link/workspaceFile/workspaces/JF-COMMON-CI?jobStepId=f0b128eb-92cd-f151-90e9-d4f5ef20c6a0&fileName=xe2-hpg-bmg-compare.f0b128eb-92cd-f151-90e9-d4f5ef20c6a0.log&jobName=hsadia_PCm_Performance_GPU_oneDNN_2025-10-24-22%3A08%3A05-hsadia_ptl_regs-21bde59&jobId=f0b125ea-7e57-f125-9e9c-d4f5ef20c6a0&diagCharEncoding=&resourceName=sdpdnn644308&completed=1
