Description
What happened?
Performance dropping with commit 1429291 #461
To identify which commit caused the performance drop, I ran:
for i in $(cut -d " " -f1 commits.txt); do git checkout $i; ./cmd-build.sh; ./start-bench.sh >> results.txt; done
start-bench.sh is:
./build/bin/llama-bench -m /mnt/nvme/models/ubergarm/DeepSeek-V3-0324-GGUF/DeepSeek-V3-0324-IQ4_K_R4/DeepSeek-V3-0324-IQ4_K_R4-00001-of-00010.gguf -p 512 -t 32 -mla 2 -fa 1 -fmoe 1 -ngl 99 --override-tensor "exps=CPU" -amb 512
Relevant results.txt:
model | size | params | backend | ngl | fa | mla | amb | fmoe | test | t/s |
---|---|---|---|---|---|---|---|---|---|---|
deepseek2 671B IQ4_K_R4 - 4.5 bpw | 386.18 GiB | 672.05 B | CUDA | 99 | 1 | 2 | 512 | 1 | pp512 | 26.74 ± 0.05 |
deepseek2 671B IQ4_K_R4 - 4.5 bpw | 386.18 GiB | 672.05 B | CUDA | 99 | 1 | 2 | 512 | 1 | tg128 | 4.80 ± 0.00 |
build: 0976467 (3715)
model | size | params | backend | ngl | fa | mla | amb | fmoe | test | t/s |
---|---|---|---|---|---|---|---|---|---|---|
deepseek2 671B IQ4_K_R4 - 4.5 bpw | 386.18 GiB | 672.05 B | CUDA | 99 | 1 | 2 | 512 | 1 | pp512 | 26.75 ± 0.04 |
deepseek2 671B IQ4_K_R4 - 4.5 bpw | 386.18 GiB | 672.05 B | CUDA | 99 | 1 | 2 | 512 | 1 | tg128 | 4.81 ± 0.00 |
build: 1429291 (3714)
model | size | params | backend | ngl | fa | mla | amb | fmoe | test | t/s |
---|---|---|---|---|---|---|---|---|---|---|
deepseek2 671B IQ4_K_R4 - 4.5 bpw | 386.18 GiB | 672.05 B | CUDA | 99 | 1 | 2 | 512 | 1 | pp512 | 76.24 ± 1.44 |
deepseek2 671B IQ4_K_R4 - 4.5 bpw | 386.18 GiB | 672.05 B | CUDA | 99 | 1 | 2 | 512 | 1 | tg128 | 10.08 ± 0.06 |
build: 24c010b (3713)
model | size | params | backend | ngl | fa | mla | amb | fmoe | test | t/s |
---|---|---|---|---|---|---|---|---|---|---|
deepseek2 671B IQ4_K_R4 - 4.5 bpw | 386.18 GiB | 672.05 B | CUDA | 99 | 1 | 2 | 512 | 1 | pp512 | 77.25 ± 0.70 |
deepseek2 671B IQ4_K_R4 - 4.5 bpw | 386.18 GiB | 672.05 B | CUDA | 99 | 1 | 2 | 512 | 1 | tg128 | 10.07 ± 0.06 |
build: c7ecd4e (3712)
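For comparing builds at a glance, the tables in results.txt can be condensed to one line per build and test. The following is only a sketch: the function name `summarize_results` is hypothetical, and it assumes each table is followed by its `build:` line, as in the output above. It accepts rows with or without a leading `|`.

```shell
# Hypothetical helper: print "<commit> <test> <t/s mean>" for each result row.
# Assumes each llama-bench table is followed by a "build: <hash> (<n>)" line.
summarize_results() {
  awk -F'|' '
    /^build:/ {
      # The build line closes a table: emit the buffered rows with its hash.
      split($0, b, " ")
      for (i = 1; i <= n; i++) print b[2], rows[i]
      n = 0
      next
    }
    {
      # Collect the non-empty, trimmed cells of this line.
      k = 0
      for (i = 1; i <= NF; i++) {
        f = $i
        gsub(/^[ \t]+|[ \t]+$/, "", f)
        if (f != "") vals[++k] = f
      }
      # Keep only data rows: second-to-last cell is the test (pp512, tg128, ...).
      if (k >= 2 && vals[k-1] ~ /^(pp|tg)[0-9]+$/) {
        split(vals[k], t, " ")   # "26.74 ± 0.05" -> mean is the first token
        rows[++n] = vals[k-1] " " t[1]
      }
    }
  ' "$1"
}
```

Calling `summarize_results results.txt` prints the short commit hash, the test name, and the mean t/s on each line, which makes the step between two adjacent commits easier to spot.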
Building like this:
cmake -B build -DBUILD_SHARED_LIBS=OFF -DGGML_CUDA=ON -DLLAMA_CURL=ON
cmake --build build --config Release -j --clean-first
Running on 2x9115, 768 GB RAM, 3090 GPU
Name and Version
version: 3710 (9fb82af)
built with cc (Debian 12.2.0-14+deb12u1) 12.2.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
Linux