Replies: 1 comment
-
Without reproduction instructions this qualifies only as user error. Provide the exact commands you ran and more information about your system in order to get support.
-
My previous build of llama.cpp was from April. Today I recompiled from master (b3130) and ran the same model (miqu-1-70b.q4_k_m.gguf); there is a significant drop in inference speed to ~0.5 tk/s, and generation stalls after one or two lines of text. Before, it was running at 2-4 tk/s, so that's roughly a 4x-8x slowdown.
I reverted to an older version (b2766), recompiled, and the normal speed came back.
Is it accepted that newer features slow inference down, or does this qualify as a bug?
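Since reverting to b2766 restores normal speed, one way to pin down the offending change is to bisect the commit range between the two tags. Below is a rough sketch of that workflow; it assumes the repository still builds with the plain Makefile and uses the bundled llama-bench tool to measure generation speed. The model path is illustrative, and build flags (e.g. for GPU offload) should match whatever your April build used.

```shell
# Sketch: bisect the llama.cpp regression between the known-good b2766
# and the slow b3130. Adjust build flags and the model path to your setup.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git bisect start b3130 b2766        # mark the bad tag first, then the good one

# git now checks out a commit midway between the two; at each step:
make clean && make llama-bench -j
./llama-bench -m /path/to/miqu-1-70b.q4_k_m.gguf -n 64

# Compare the reported tokens/s against the ~2-4 tk/s baseline, then mark
# the commit and let git pick the next candidate:
git bisect good    # speed is normal at this commit
# or:
git bisect bad     # speed has regressed at this commit

# Repeat build/benchmark/mark until git prints the first bad commit, then:
git bisect reset
```

Reporting the first bad commit (plus your hardware, OS, and exact build command) would turn this from "user error" into an actionable bug report.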