Why is there essentially no difference in generation speed between Qwen3 235B and DeepSeek 671B? #1388
zhangjiekui
started this conversation in
General
Replies: 0 comments
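When deploying the two MoE models Qwen3 235B and DeepSeek 671B with heterogeneous LLM inference frameworks such as ktransformers and llama.cpp, the two models differ greatly in parameter count (both are generally run with Q3 or Q4 quantization). Why, then, is their generation speed essentially the same?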
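To make the question concrete, here is a back-of-envelope sketch of what a purely memory-bandwidth-bound model of decoding would predict. It assumes each generated token requires reading the activated weights once, and nothing else. The activated parameter counts (about 22B for Qwen3-235B-A22B and about 37B for DeepSeek-V3/R1 671B) are taken from the model cards; the 4.5 bits per weight and the 200 GB/s bandwidth are placeholder assumptions, not measurements, and the function name is hypothetical.

```python
def decode_tokens_per_second(active_params_billion: float,
                             bits_per_weight: float,
                             bandwidth_gb_s: float) -> float:
    """Upper-bound decode speed if every token reads the active weights once.

    This ignores the KV cache, attention compute, PCIe transfers, and
    framework overhead, so it is only a rough ceiling.
    """
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


# Assumptions for illustration only (not measured values).
ASSUMED_BITS_PER_WEIGHT = 4.5   # roughly a Q4-class quantization
ASSUMED_BANDWIDTH_GB_S = 200.0  # hypothetical effective memory bandwidth

# Activated parameters per token, in billions (from the model cards).
ACTIVE_PARAMS_BILLION = {
    "Qwen3-235B-A22B": 22.0,
    "DeepSeek 671B": 37.0,
}

estimates = {
    name: decode_tokens_per_second(active, ASSUMED_BITS_PER_WEIGHT,
                                   ASSUMED_BANDWIDTH_GB_S)
    for name, active in ACTIVE_PARAMS_BILLION.items()
}

for name, tps in estimates.items():
    print(f"{name}: about {tps:.1f} tokens/s (idealized ceiling)")

# Ratio of the two ceilings depends only on activated parameters.
speed_ratio = estimates["Qwen3-235B-A22B"] / estimates["DeepSeek 671B"]
print(f"Predicted speed ratio: {speed_ratio:.2f}x")
```

Under these assumptions the predicted gap follows the activated parameters per token rather than the total parameter count, so it is smaller than the 235B versus 671B figures suggest. Measured speeds may sit even closer together if costs outside this simple model dominate.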