[GPU][GEMM] Tune f16/int4 cases for xe2 #4205
base: main
Conversation
force-pushed from 40b31d0 to 09b6ead

make test perf-gpu

force-pushed from 894a979 to a30bff9

make test perf-gpu

force-pushed from e5485f7 to 2737d1d

make test perf-gpu

force-pushed from 8bcde40 to 13e235e

make test perf-gpu

force-pushed from 21bde59 to bda4505
Reverting to commit 21bde59, as we see the best results with these layers added. Overall performance for Llama-3.1-8B goes to 1.04 on p0.Llama-3.1-8B-Instruct.FP16-4BIT.inf.fp16.pt.gs128_zp-asym_mb16, whereas Mistral shows similar performance as before (21bde59).
force-pushed from bda4505 to 4b47764
Description
These strategies aim to fix regressions observed in LLMs on PTL; the analysis was done on a BMG machine.
The strategies give a good improvement for many matmul layers but also introduce some regressions, which are under review.
Addresses: https://jira.devtools.intel.com/browse/MFDNN-14295
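
For context, the cases tuned here are matmuls with f16 activations and group-quantized int4 weights with asymmetric zero points, as encoded in the benchmark name above (gs128, zp-asym, mb16). Below is a minimal sketch of how such a case is expressed through the oneDNN C++ API; the GEMM shape, group size, and data-type choices are illustrative assumptions, not the exact layers this PR tunes.

```cpp
// Illustrative sketch only: an f16 x int4 matmul with group-size-128 weight
// scales and asymmetric zero points, matching the gs128 / zp-asym naming
// in the benchmark above. Shapes are hypothetical.
#include <dnnl.hpp>

using namespace dnnl;

int main() {
    engine eng(engine::kind::gpu, 0);
    stream strm(eng);

    // Hypothetical GEMM shape: M = 16 (mb16), K = N = 4096; group size G = 128 (gs128).
    const memory::dim M = 16, K = 4096, N = 4096, G = 128;

    memory::desc src_md({M, K}, memory::data_type::f16, memory::format_tag::ab);
    memory::desc wei_md({K, N}, memory::data_type::u4, memory::format_tag::ab);
    memory::desc dst_md({M, N}, memory::data_type::f16, memory::format_tag::ab);

    primitive_attr attr;
    // Compute in f16 and decompress the int4 weights on the fly.
    attr.set_fpmath_mode(fpmath_mode::f16, /*apply_to_int=*/true);
    // Per-group f16 scales on the weights (group size G along K).
    attr.set_scales(DNNL_ARG_WEIGHTS, /*mask=*/(1 << 0) + (1 << 1), {G, 1},
                    memory::data_type::f16);
    // Asymmetric quantization: per-group u4 zero points on the weights.
    attr.set_zero_points(DNNL_ARG_WEIGHTS, /*mask=*/(1 << 0) + (1 << 1), {G, 1},
                         memory::data_type::u4);

    matmul::primitive_desc pd(eng, src_md, wei_md, dst_md, attr);
    matmul prim(pd);
    // ... allocate memory objects, execute prim on strm, then strm.wait().
    return 0;
}
```

Performance of such cases is then checked with the perf-gpu suite (make test perf-gpu), as in the CI runs above.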