Replies: 1 comment
-
Yes, I saw this PR. But to quote Diego's statement in the PR discussion:

> I fully agree with that. The back-end is really fragile, so performance gains must be way more than 2-3% to warrant a change such as that one.
-
Title : Overlap CUDA graph building and processing to minimize GPU idle time and improve tokens per second performance (#11867)
Link : ggml-org/llama.cpp#11867
Author : @aendk
Use : a few % boost on CUDA prompt processing (PP) and text generation (TG)? See the sketch below for the general idea.
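
As a rough illustration of what the PR title describes (not the PR's actual implementation), the sketch below overlaps host-side CUDA graph capture and instantiation with asynchronous execution of the previously built graph, so the GPU keeps processing graph N while the CPU prepares graph N+1. The kernel, buffer sizes, and the double-buffered exec slots are assumptions made for the example; it assumes CUDA 11.4+ for `cudaGraphInstantiateWithFlags`.

```cpp
// Hedged sketch: overlap host-side CUDA graph construction with device-side
// execution of the previously built graph. Illustrative only, not llama.cpp code.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void step_kernel(float* data, int n, float scale) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= scale;
}

int main() {
    const int n = 1 << 20;
    float* d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));

    cudaStream_t exec_stream, capture_stream;
    cudaStreamCreate(&exec_stream);
    cudaStreamCreate(&capture_stream);

    cudaGraphExec_t graph_exec[2] = {nullptr, nullptr};
    cudaEvent_t     done[2];
    cudaEventCreate(&done[0]);
    cudaEventCreate(&done[1]);

    for (int iter = 0; iter < 8; ++iter) {
        const int slot = iter % 2;

        // Before reusing a slot, make sure its previous launch has finished.
        if (graph_exec[slot]) {
            cudaEventSynchronize(done[slot]);
            cudaGraphExecDestroy(graph_exec[slot]);
        }

        // Build this iteration's graph on the host. Stream capture only
        // records the work; nothing runs on the GPU here, so this CPU-side
        // construction overlaps with the previous graph still executing
        // asynchronously on exec_stream.
        cudaGraph_t graph;
        cudaStreamBeginCapture(capture_stream, cudaStreamCaptureModeGlobal);
        step_kernel<<<(n + 255) / 256, 256, 0, capture_stream>>>(d_data, n, 1.0f + iter);
        cudaStreamEndCapture(capture_stream, &graph);
        cudaGraphInstantiateWithFlags(&graph_exec[slot], graph, 0);
        cudaGraphDestroy(graph);

        // Launch asynchronously: the host returns immediately and starts
        // building the next graph while the GPU processes this one.
        cudaGraphLaunch(graph_exec[slot], exec_stream);
        cudaEventRecord(done[slot], exec_stream);
    }

    cudaStreamSynchronize(exec_stream);
    printf("done\n");

    cudaEventDestroy(done[0]);
    cudaEventDestroy(done[1]);
    cudaGraphExecDestroy(graph_exec[0]);
    cudaGraphExecDestroy(graph_exec[1]);
    cudaStreamDestroy(exec_stream);
    cudaStreamDestroy(capture_stream);
    cudaFree(d_data);
    return 0;
}
```

The actual PR works inside llama.cpp's CUDA backend and derives its graphs from the ggml compute graph, so the details differ; this only illustrates the scheduling idea of keeping the GPU busy while the host builds the next graph.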