Description
Something has changed since the working commit (0.9.2.dev223+gee5ad8d2c plus my PR). I can reproduce the same gibberish on 0.9.2.dev283+ge9fd658af even without the full cudagraph compile option.
After bisecting the commits between the one I developed the fix on and the one where the PR was merged, it seems that #19717 breaks MTP.
To reproduce, this is what I checked:
For the commit that merges #19717 and the commit just before it, apply the diff from #20022 and run the test script.
On f59fc60, MTP works normally and the draft acceptance rate in the log is about 80%. On 015fab8, however, the model output is total gibberish and the draft acceptance rate is almost 0.
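If anyone wants to automate the good/bad call while bisecting, a minimal sketch could look like the following. It assumes you read the accepted and drafted token counts off the log yourself. The function names and the 0.5 threshold are placeholders of mine, not anything from the project. The only external fact relied on is that `git bisect run` treats exit code 0 as good and 1 as bad.

```python
def acceptance_rate(accepted: int, drafted: int) -> float:
    """Fraction of drafted tokens that were accepted (0.0 if nothing was drafted)."""
    return accepted / drafted if drafted else 0.0


def bisect_exit_code(accepted: int, drafted: int, threshold: float = 0.5) -> int:
    """Map an acceptance rate to an exit code usable with `git bisect run`.

    0 marks the commit as good, 1 marks it as bad.
    The threshold is a hypothetical cut-off between "about 80%" and "almost 0".
    """
    return 0 if acceptance_rate(accepted, drafted) >= threshold else 1
```

With the numbers above, a run at roughly 80% acceptance would be classified as good and a run near 0 as bad. Any threshold between the two would give the same split.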
It seems that #19717 changes the forward pass of the MTP module from FlashMLA to FusedMoE where it shouldn't. cc @bnellnm, it would be helpful if you could shed some light on what is going wrong.