[GPU][GEMM] Tune f16/int4 cases for xe2 #4205
base: main
Conversation
force-pushed from 40b31d0 to 09b6ead

make test perf-gpu

force-pushed from 894a979 to a30bff9

make test perf-gpu

force-pushed from e5485f7 to 2737d1d

make test perf-gpu

force-pushed from 8bcde40 to 13e235e

make test perf-gpu

force-pushed from 21bde59 to bda4505
Reverting to commit 21bde59, as we see the best results with these layers added. Overall performance for Llama-3.1-8B goes to 1.04 on p0.Llama-3.1-8B-Instruct.FP16-4BIT.inf.fp16.pt.gs128_zp-asym_mb16, whereas Mistral shows similar performance as before (21bde59).
force-pushed from bda4505 to 4b47764
Description
These strategies aim to fix regressions observed in LLMs on PTL; the analysis was done on a BMG machine.
The strategies give a good improvement for many matmul layers but also introduce some regressions, which are under review.
Addresses: https://jira.devtools.intel.com/browse/MFDNN-14295
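
For context, the cases tuned here are matmuls with f16 activations and group-quantized int4 weights with asymmetric zero points, as encoded in the benchmark name above (gs128, zp-asym, mb16). Below is a minimal sketch of how such a case is expressed through the oneDNN C++ API; the GEMM shape, group size, and data-type choices are illustrative assumptions, not the exact layers this PR tunes.

```cpp
// Illustrative sketch only: an f16 x int4 matmul with group-size-128 weight
// scales and asymmetric zero points, matching the gs128 / zp-asym naming
// in the benchmark above. Shapes are hypothetical.
#include <dnnl.hpp>

using namespace dnnl;

int main() {
    engine eng(engine::kind::gpu, 0);
    stream strm(eng);

    // Hypothetical GEMM shape: M = 16 (mb16), K = N = 4096; group size G = 128 (gs128).
    const memory::dim M = 16, K = 4096, N = 4096, G = 128;

    memory::desc src_md({M, K}, memory::data_type::f16, memory::format_tag::ab);
    memory::desc wei_md({K, N}, memory::data_type::u4, memory::format_tag::ab);
    memory::desc dst_md({M, N}, memory::data_type::f16, memory::format_tag::ab);

    primitive_attr attr;
    // Compute in f16 and decompress the int4 weights on the fly.
    attr.set_fpmath_mode(fpmath_mode::f16, /*apply_to_int=*/true);
    // Per-group f16 scales on the weights (group size G along K).
    attr.set_scales(DNNL_ARG_WEIGHTS, /*mask=*/(1 << 0) + (1 << 1), {G, 1},
                    memory::data_type::f16);
    // Asymmetric quantization: per-group u4 zero points on the weights.
    attr.set_zero_points(DNNL_ARG_WEIGHTS, /*mask=*/(1 << 0) + (1 << 1), {G, 1},
                         memory::data_type::u4);

    matmul::primitive_desc pd(eng, src_md, wei_md, dst_md, attr);
    matmul prim(pd);
    // ... allocate memory objects, execute prim on strm, then strm.wait().
    return 0;
}
```

Performance of such cases is then checked with the perf-gpu suite (make test perf-gpu), as in the CI runs above.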