Memory bandwidth utilization #3909
Unanswered
artmoskvin
asked this question in
Q&A
Replies: 1 comment 6 replies
-
You also have to take the KV cache into account.
-
Hi all! I'm trying to understand the current memory bandwidth utilization (MBU) of llama.cpp running on an M2 Max. When running a 7B q4 model, I get around 60 tok/s, which, based on this blog post, corresponds to ~50% MBU. Here's my math for reference:
Is my math wrong? Or are there any limitations on unified memory usage?
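For anyone following along, a rough sketch of how an MBU estimate like this is usually put together: bytes read per generated token, times tokens per second, divided by peak bandwidth. This is not the calculation from the post. The numbers are assumptions: exactly 4 bits per weight (real q4 formats also store scales, so slightly more), 400 GB/s peak for the M2 Max, and Llama-7B-like shapes (32 layers, 4096 hidden size, fp16 K/V) for the KV-cache term. The `mbu` helper is hypothetical, not part of llama.cpp.

```python
def mbu(params, bits_per_weight, tokens_per_s, peak_bw_bytes_per_s,
        kv_cache_bytes=0.0):
    """Estimate memory bandwidth utilization for single-stream decoding.

    Assumes every weight (plus the current KV cache) is read once per token.
    """
    weight_bytes = params * bits_per_weight / 8
    bytes_per_token = weight_bytes + kv_cache_bytes
    achieved_bw = bytes_per_token * tokens_per_s
    return achieved_bw / peak_bw_bytes_per_s


PEAK_BW = 400e9   # assumed M2 Max peak bandwidth, bytes/s
PARAMS = 7e9      # "7B" taken as exactly 7e9 parameters (assumption)

# Weights only, assuming exactly 4 bits per weight.
weights_only = mbu(PARAMS, 4.0, 60, PEAK_BW)

# KV cache for an assumed Llama-7B-like model:
# 2 tensors (K and V) * 32 layers * 4096 hidden * 2 bytes (fp16) per context token.
kv_bytes_per_ctx_token = 2 * 32 * 4096 * 2
context_len = 512  # hypothetical context length
with_kv = mbu(PARAMS, 4.0, 60, PEAK_BW,
              kv_cache_bytes=kv_bytes_per_ctx_token * context_len)

print(f"weights only: {weights_only:.1%}")
print(f"with KV cache at {context_len} tokens: {with_kv:.1%}")
```

Under these assumptions the weights-only figure comes out a little above 50%, in line with the number in the question, and the KV-cache term pushes it up further as the context grows.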