I have seen this post: https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9 @Artefact2 posted a chart there benchmarking each quantization of Mistral-7B, but I'd be interested in the same chart for a bigger model, specifically a 70B. Somebody else already asked for this in a comment on the gist, but Artefact2 couldn't run the same test on a bigger model due to hardware constraints.
Even though Artefact2 expects these charts to look similar, I'm still interested in them, because in my experience running a Q2 of a 70B/120B model is a much smoother experience than running Mistral at Q2. While Q2 largely breaks a 30B (and to some extent also a 70B) model, the bigger models still seem to retain most of their quality. I assume this is because more information is preserved: each parameter carries some information, so more parameters means more information survives the quantization.
I haven't seen anyone else post such a chart, so if one exists, or if somebody could make one, I'd greatly appreciate it, as my 2060 6GB probably already suffers from PTSD after running 70B+ models on it (though the CPU did most of the heavy lifting). Either way, I'm thinking about buying something better, and it would be nice to know whether the bigger models also need Q5 or higher to get close to f16 performance, or whether Q3 is already enough for them to function well.
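For context, the charts in that gist plot perplexity, which is just the exponential of the mean negative log-likelihood per token, so "quantization hurts the model" shows up directly as a higher number. A minimal sketch of the metric (the per-token log-probabilities below are made-up illustration values, not real benchmark data):

```python
import math

def perplexity(logprobs):
    # Perplexity = exp(mean negative log-likelihood over tokens).
    return math.exp(-sum(logprobs) / len(logprobs))

# Hypothetical per-token log-probabilities from an f16 model and a Q2 quant:
logprobs_f16 = [-1.2, -0.8, -1.5, -0.9]
logprobs_q2 = [-1.9, -1.4, -2.2, -1.6]

print(perplexity(logprobs_f16))  # lower = better
print(perplexity(logprobs_q2))   # quantization raises perplexity
```

The gist computes this over a standard text corpus with llama.cpp's perplexity tool; the sketch above only shows what the y-axis of those charts means.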
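On the hardware side, a rough rule of thumb for whether a given quant of a 70B even fits: file size ≈ parameter count × bits per weight ÷ 8. The bits-per-weight figures below are approximate averages for llama.cpp quant types, not exact values (real GGUF files run somewhat larger because of embeddings and mixed-precision tensors):

```python
# Rough size estimate for a 70B model: params * bits-per-weight / 8 bytes.
PARAMS = 70e9
BPW = {"Q2_K": 2.6, "Q3_K_M": 3.9, "Q4_K_M": 4.8, "Q5_K_M": 5.7, "f16": 16.0}  # approximate

for name, bpw in BPW.items():
    gib = PARAMS * bpw / 8 / 2**30
    print(f"{name}: ~{gib:.0f} GiB")
```

Even at Q2 a 70B lands in the ~20 GiB range, which is why a 6 GB card ends up offloading most layers to the CPU.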
Thanks in advance ^^