
Commit 2e6b523

ikawrakow and Iwan Kawrakow authored
Faster Q4_K_R4 and Q5_K_R4 on AVX2/Zen4 (#182)
* Slightly faster AVX2 implementation for q4_k_r4

* Even better AVX2 implementation for q4_k_r4

  We now arrive at PP-512 = 328 t/s for LLaMA-3.1-8B on a Ryzen-5975WX CPU, up from 291 t/s when I last measured on 3c5f872. With FA and Q8_0 K-cache we get to 339.5 t/s.

* Fix llama-bench labels that I broke with #181

* Faster AVX2 implementation for q5_k_r4

  We arrive at 302 t/s for LLaMA-3.1-8B on a Ryzen-5975WX CPU, up from 273 t/s.

* Use AVX2 implementation of q4_k_r4 and q5_k_r4 also on Zen4

  After the changes I made to AVX2, it ends up being slightly faster than what I had for Zen4.

* Minor tweak

* Cleanup

---------

Co-authored-by: Iwan Kawrakow <iwan.kawrakow@gmail.com>
1 parent 4a73c25 commit 2e6b523


2 files changed (+68, -255 lines)


examples/llama-bench/llama-bench.cpp

Lines changed: 2 additions & 2 deletions
@@ -756,7 +756,7 @@ static std::vector<cmd_params_instance> get_cmd_params_instances(const cmd_param
                 continue;
             }
             cmd_params_instance instance = {
-                /* .test_kind = */ TEST_KIND_PP,
+                /* .test_kind = */ TEST_KIND_TG,
                 /* .model = */ m,
                 /* .n_prompt = */ 0,
                 /* .n_gen = */ n_gen,
@@ -784,7 +784,7 @@ static std::vector<cmd_params_instance> get_cmd_params_instances(const cmd_param
                 continue;
             }
             cmd_params_instance instance = {
-                /* .test_kind = */ TEST_KIND_PP,
+                /* .test_kind = */ TEST_KIND_PG,
                 /* .model = */ m,
                 /* .n_prompt = */ n_pg.first,
                 /* .n_gen = */ n_pg.second,
