DeepSeek-V2-Chat-0628 currently uses excessive VRAM, possibly due to running as MHA instead of MLA.
Discussion here:
https://old.reddit.com/r/LocalLLaMA/comments/1e6ba6a/deepseekv2chat0628_weight_release_1_open_weight/ldtybpo/
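For scale, a rough back-of-the-envelope using the dimensions from the published DeepSeek-V2 config (not a measurement of any particular build): MLA projects keys and values into a shared low-rank latent, so only that latent plus a small decoupled RoPE key needs caching, i.e. kv_lora_rank + qk_rope_head_dim = 512 + 64 = 576 values per token per layer. A conventional MHA cache instead stores full per-head K and V: 128 heads x ((128 + 64) + 128) = 40,960 values per token per layer, roughly 71x more.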
Replies: 3 comments 1 reply

-
I think we already support MLA. What makes you think that we use excessive VRAM?
-
KV cache sizes are extremely large, as seen here: I'm running Q3_K_S (101.7 GB) on an M3 Max with 128 GB of memory and 122 GB allocated as VRAM, and it swaps even at small context sizes (<=2k, and for some reason even at 256). 4k and 8k are unusable.
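To put numbers on this, here is a rough sizing sketch (my own back-of-the-envelope, assuming an fp16 cache and the head counts and dimensions from the published DeepSeek-V2 config; not a measurement of llama.cpp's actual allocations):

```python
# Rough KV cache sizing for DeepSeek-V2, assuming an fp16 cache and the
# dimensions from the published DeepSeek-V2 config.json. A sketch of the
# arithmetic, not a measurement of llama.cpp's actual buffers.

N_LAYERS = 60        # num_hidden_layers
N_HEADS = 128        # num_attention_heads
QK_NOPE_DIM = 128    # qk_nope_head_dim (content part of each key/query head)
QK_ROPE_DIM = 64     # qk_rope_head_dim (decoupled positional part)
V_DIM = 128          # v_head_dim
KV_LORA_RANK = 512   # width of the compressed latent that MLA caches
FP16 = 2             # bytes per cached value

def mha_bytes_per_token():
    # Naive MHA-style caching: full per-head K and V in every layer.
    k = N_HEADS * (QK_NOPE_DIM + QK_ROPE_DIM)  # 24576 values
    v = N_HEADS * V_DIM                        # 16384 values
    return N_LAYERS * (k + v) * FP16           # ~4.7 MiB per token

def mla_bytes_per_token():
    # MLA caching: one compressed latent plus one shared RoPE key per layer.
    return N_LAYERS * (KV_LORA_RANK + QK_ROPE_DIM) * FP16  # ~67.5 KiB per token

for ctx in (256, 2048, 4096, 8192):
    print(f"ctx={ctx:5d}  MHA-style: {mha_bytes_per_token() * ctx / 2**30:6.2f} GiB"
          f"   MLA: {mla_bytes_per_token() * ctx / 2**30:6.3f} GiB")
```

If the cache really is materialized MHA-style, ~9.4 GiB at 2k context on top of 101.7 GB of weights leaves almost nothing of a 122 GB VRAM allocation for compute buffers, which matches the swapping described above; 4k and 8k (~18.8 and ~37.5 GiB) simply can't fit. An MLA cache would stay around half a GiB even at 8k.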