Possible prompt processing speedup for current MoE models like DeepSeek V3 and Llama 4? #12932
Dampfinchen started this conversation in Ideas
Hello,
Back in the Mixtral days, I remember a PR by @slaren that significantly improved prompt processing performance for MoE models. I believe it worked by grouping all the experts together. I think it was this PR: #6505
Could this low-hanging fruit be applied to recent MoE models like DeepSeek V3 and Llama 4 as well, or is there an architectural limitation that prevents it?
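
For context, here is a minimal sketch of what I understand "grouping the experts" to mean. This is not llama.cpp's actual implementation, and the function and variable names here are hypothetical; it just illustrates the idea of bucketing token rows by routed expert so each expert runs one batched matmul instead of many per-token ones:

```cpp
#include <cstdio>
#include <vector>

// Hypothetical sketch: given top-k routing decisions, collect the token
// indices assigned to each expert. During prompt processing, each
// non-empty bucket can then be handled with ONE batched matmul against
// that expert's weights, instead of one tiny matmul per token.
std::vector<std::vector<int>> group_tokens_by_expert(
        const std::vector<std::vector<int>>& routed_experts, // routed_experts[t] = expert ids picked for token t
        int n_experts) {
    std::vector<std::vector<int>> buckets(n_experts);
    for (int t = 0; t < (int) routed_experts.size(); ++t) {
        for (int e : routed_experts[t]) {
            buckets[e].push_back(t); // token t contributes one row to expert e's batch
        }
    }
    return buckets;
}

int main() {
    // Toy example: 6 prompt tokens, each routed to 2 of 4 experts.
    std::vector<std::vector<int>> routed = {
        {0, 2}, {1, 2}, {0, 3}, {2, 3}, {0, 1}, {1, 3},
    };
    auto buckets = group_tokens_by_expert(routed, /*n_experts=*/4);

    for (int e = 0; e < (int) buckets.size(); ++e) {
        printf("expert %d: %zu tokens in one batch\n", e, buckets[e].size());
    }
    return 0;
}
```

The win, as I understand it, is that one large GEMM per expert is far more efficient on GPUs than many row-sized multiplications, which matters most during prompt processing where many tokens are in flight at once.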