Q4_0 repacking suggestion #239
bartowski1182
started this conversation in
General
Replies: 1 comment 4 replies
-
Hey, appreciate you pointing this out! Yep, mmap is set to true by default. Since the repacking feature is set at compile time, it will always be available. But I assume this only affects Q4_0 and IQ4_NL? So if I only disable mmap for the Q4_0 and IQ4_NL ones, that should prevent the issue, right? I'll test it with a couple of settings to see the impact as well.
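A rough sketch of that per-quant check, in Python. The helper name `should_use_mmap` and the `REPACKABLE_QUANTS` set are made up for illustration and are not part of any real loader API. The set only contains the two types named in this thread, on the assumption that they are the only repackable ones.

```python
# Hypothetical helper: every name here is illustrative, not a real loader API.
# Assumption from this thread: only Q4_0 and IQ4_NL are repacked at load time.
REPACKABLE_QUANTS = {"Q4_0", "IQ4_NL"}


def should_use_mmap(quant_type: str, default: bool = True) -> bool:
    """Return False for quant types that get repacked, otherwise the default."""
    if quant_type.upper() in REPACKABLE_QUANTS:
        # Repacking makes a second copy of the weights, so the mmap'd
        # originals would just sit in memory alongside the repacked ones.
        return False
    return default


if __name__ == "__main__":
    for quant in ("Q4_0", "IQ4_NL", "Q8_0"):
        print(quant, "-> mmap enabled:", should_use_mmap(quant))
```

The result would then be passed to whatever mmap toggle the loader exposes, so every other quant type keeps the faster mmap loading path.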
-
Mentioned this over on Reddit awhile back, about enabling repacking for Q4_0 and IQ4_NL
Just wanted to let you know that apparently if you use mmap (a common feature to speed up model loading), it'll actually keep the un-repacked weights in memory as well, increasing the "on paper" memory consumption by something like 50%.
In practice this memory SHOULD get cleared out by the system if it's needed, but I have a feeling it sometimes may not. So when using a repackable quant type, you may want to make sure mmap is disabled to avoid excess RAM usage.