[Kernel] Enable fp8 support for pplx and BatchedTritonExperts. #18864
Merged
77 commits
dd40b1e fp8 support (bnellnm)
a051108 wip (bnellnm)
43f9cfe test (bnellnm)
39a2ab3 basic working test (bnellnm)
b5996ec tests + fix (bnellnm)
356d4d7 stuff (bnellnm)
ad55ba1 cleanup quantization (bnellnm)
347f58e merge (bnellnm)
035d324 fix merge (bnellnm)
ac46906 lint (bnellnm)
1a5d6b3 fixes (bnellnm)
29e314c pplx + fp8 test (bnellnm)
cd5bc8f fp8 + pplx tests + fixes (bnellnm)
037eb4a re-enable cudagraph+torch.compile (bnellnm)
43dd36b hacks (bnellnm)
a778f5a clean up quantization parameters (bnellnm)
0ec77aa lint (bnellnm)
c6a4451 progress on grouped quant for batched experts (bnellnm)
faa9b2f wip (bnellnm)
985ce2e triton + debug hacking (bnellnm)
4d114ee batched mm tests with real scales + grouped quant (bnellnm)
8a019e2 hacking on tests (bnellnm)
dceee15 scale hacking (bnellnm)
d6eda9b wip hacking (bnellnm)
f554eda cleanup ctor args (bnellnm)
8e70e60 wip (bnellnm)
5f5e9a3 fixes (bnellnm)
9d30bcc refactoring (bnellnm)
9fd5833 lint (bnellnm)
16a4d7f lint (bnellnm)
c21e4df lint (bnellnm)
82a0b1e lint (bnellnm)
39c9b5e fix merge. split up int8/fp8 moe tests (bnellnm)
c036763 wip (bnellnm)
861500e cleanup (bnellnm)
ae39492 cleanup (bnellnm)
ae45963 fixes after merge (bnellnm)
b783ce6 torch_experts working (bnellnm)
3ea1454 fp8 baselines working (bnellnm)
9d8dd1d mm baselines work (bnellnm)
5b376e5 prepare_finalize wokring (bnellnm)
0c06d4b per token + grouped broken (bnellnm)
8d4e287 a scales working, b scales not working (bnellnm)
62404e3 blocked working (bnellnm)
b4dc46e per_act_token working (bnellnm)
c3bddec qwen works, rh-ds broken now, pplx_moe tests not all working (bnellnm)
185f090 both models work (bnellnm)
946a950 cleanup (bnellnm)
d8a2723 lint (bnellnm)
e40e9c0 fix test (bnellnm)
79e8d6b fixes (bnellnm)
c02faca fix pplx tests, fix indices type assert (bnellnm)
47eaa19 fixes (bnellnm)
8f3ee3a fix lint (bnellnm)
1f15b73 fix per_act_token in pplx (bnellnm)
7d891cd cleanups (bnellnm)
ca748ed use proper experts for test_pplx_moe, naive experts work (bnellnm)
5eceb6d fix test flag (bnellnm)
9b92fee re-enable tests + loopify test_pplx_moe tests (bnellnm)
8894d0f add optional tag to slow tests (bnellnm)
d135b41 fix lint (bnellnm)
35966a0 tweaks (bnellnm)
8485bde fixes (bnellnm)
70fa1dd fixup world_size/dp_size params (bnellnm)
9c56206 fix tests (bnellnm)
653942f more test fixes (bnellnm)
d2dd405 fix merge (bnellnm)
ae91a5e trim testcases (bnellnm)
76c697a fix lint (bnellnm)
a5c8e85 ping (bnellnm)
285b2bc fix num_dispatchers for TP+DP (bnellnm)
286d988 fix unit test (bnellnm)
14542e5 review comments (bnellnm)
2a96289 remove debug cruft (bnellnm)
562bb3e review comments + scout fix (bnellnm)
a9b0730 remove bogus assert (bnellnm)
b37026d scout fixes (bnellnm)
Why is this only `False`?

I've left it here for future testing.

I see. Should there also be a condition in the test code to skip the test if `input_scales == True` and `quant_dtype is None`?

That's one of the conditions that needs more testing. There are some int8/int4 quantization schemes that happen outside the Triton kernels, so they need to pass in the quantized data and scales but no `quant_type`, since the data is already quantized.
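
To make the case in the reply above concrete, here is a minimal pure-Python sketch of data that is quantized ahead of time and then handed over together with its scale but without a quantization type. The names `quantize_int8` and `load_weights` are hypothetical and exist only for illustration; they are not the functions or signatures used in this PR, and the real kernels operate on tensors rather than Python lists.

```python
from typing import Optional, Sequence


def quantize_int8(values: Sequence[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization, done ahead of time."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def load_weights(
    data: Sequence[float],
    scale: Optional[float] = None,
    quant_dtype: Optional[str] = None,
) -> list[float]:
    """Hypothetical consumer; returns the effective (dequantized) weights.

    - quant_dtype set: quantize here, then dequantize (a passed-in scale
      is ignored in this sketch).
    - quant_dtype None, scale given: data is already quantized, so only
      the scale is applied.
    - neither: data is used as-is.
    """
    if quant_dtype is not None:
        if quant_dtype != "int8":
            raise ValueError(f"unsupported quant_dtype: {quant_dtype}")
        q, s = quantize_int8(data)
        return [x * s for x in q]
    if scale is not None:
        return [x * scale for x in data]
    return list(data)


weights = [1.27, -0.64, 0.0]
q, s = quantize_int8(weights)

# Pre-quantized path: scales are present, quant_dtype is None.
pre_quantized = load_weights(q, scale=s)

# Inline path: the consumer quantizes the raw data itself.
inline = load_weights(weights, quant_dtype="int8")
```

In this sketch both paths produce the same effective weights, which is why "scales present, no quantization type" can be a valid input combination rather than one a test should always skip.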