80B total / 13B active MoE with good benchmarks. Seems right up ik_llama.cpp's alley, i.e. a good candidate for expert offloading like DeepSeek. It uses a custom architecture, but with good old GQA and NTK RoPE scaling. At a glance it doesn't look like anything too exotic: https://huggingface.co/tencent/Hunyuan-A13B-Instruct/tree/main

Relevant mainline llama.cpp issue: https://github.com/ggml-org/llama.cpp/issues/14415
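
For reference, here is a minimal sketch of the commonly used NTK-aware RoPE scaling formula, where the rotary base is stretched by `alpha ** (d / (d - 2))` for head dimension `d`. The function names are hypothetical, and the `base`, `alpha`, and `head_dim` values are illustrative assumptions, not read from Hunyuan's config. Whether Hunyuan-A13B uses exactly this variant would need to be checked against its `config.json` and modeling code.

```python
import math


def ntk_scaled_base(base: float, alpha: float, head_dim: int) -> float:
    """Stretch the RoPE base using the common NTK-aware formula."""
    return base * alpha ** (head_dim / (head_dim - 2))


def inv_freqs(base: float, head_dim: int) -> list[float]:
    """Per-pair rotary inverse frequencies: base^(-2i/d) for i in [0, d/2)."""
    return [base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


# Illustrative values only (assumed, not taken from the model's config).
BASE, ALPHA, HEAD_DIM = 10000.0, 4.0, 128

plain = inv_freqs(BASE, HEAD_DIM)
scaled = inv_freqs(ntk_scaled_base(BASE, ALPHA, HEAD_DIM), HEAD_DIM)
```

With this formula the highest frequency is left untouched while the lowest one is divided by `alpha`, so short-range position resolution is kept and only the long wavelengths are stretched. That is also why nothing beyond a modified base needs to reach the RoPE kernel for this variant.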