AMX QWEN support #1356
Unanswered
voipmonitor asked this question in Q&A
Replies: 1 comment 3 replies
-
Hey @voipmonitor, I have been struggling with this as well and have been unable to find this information.
-
Hello,
here: https://github.com/kvcache-ai/ktransformers/blob/main/doc/en/AMX.md there is "Note: At present, Qwen3MoE running with AMX can only read BF16 GGUF; support for loading from safetensor will be added later." which confuses me - does it mean, that we are not able to run 4bit quantisied versions of QWEN using AMX feature? Is it possible to use Qwen3-235B-A22B-GGUF with AMX? BF16 version is around 450GB - can anyone point me to the repo with GGUF which I can run with AMX --optimize_config_path ktransformers/optimize/optimize_rules/Qwen3Moe-serve-amx.yaml please?
edit: I have figured it out. At the moment the AMX backend can only read the BF16 model variant. However, the backend engine can be switched to "AMXInt8" or "AMXBF16" in the file ktransformers/optimize/optimize_rules/Qwen3Moe-serve-amx.yaml.
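For anyone looking for where that switch lives: the optimize-rules files in ktransformers are YAML lists of match/replace rules, and the AMX backend is selected via a `backend` key in a rule's `kwargs`. The fragment below is a sketch only; the surrounding field names and the regex are illustrative of the rule layout, and the exact contents of Qwen3Moe-serve-amx.yaml in your checkout may differ. Only the `backend` value is the point here.

```yaml
# Illustrative fragment of an optimize-rules file such as
# ktransformers/optimize/optimize_rules/Qwen3Moe-serve-amx.yaml.
# Field names other than `backend` are assumptions about the rule layout.
- match:
    name: "^model\\.layers\\..*\\.mlp\\.experts$"   # hypothetical matcher
  replace:
    class: ktransformers.operators.experts.KTransformersExperts
    kwargs:
      # Switch between the two AMX engines here:
      #   "AMXInt8"  - int8 compute, lower memory
      #   "AMXBF16"  - bf16 compute, higher precision
      backend: "AMXInt8"
```

After editing the file, pass it to the server via the flag mentioned above (`--optimize_config_path .../Qwen3Moe-serve-amx.yaml`); the loaded GGUF still needs to be the BF16 variant per the note in doc/en/AMX.md.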