Handle PrmtSlow #518
Uses the HIP implementation of `__byte_perm`.
Pull Request Overview
This PR implements support for the PTX `prmt.slow` instruction by utilizing the HIP implementation of `__byte_perm`. The change adds proper handling for the PrmtSlow instruction that was previously unimplemented.
- Adds a C++ implementation using HIP's `__byte_perm` function
- Routes PrmtSlow instructions through the function call mechanism instead of direct LLVM emission
- Includes test coverage for the new functionality
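For readers unfamiliar with `__byte_perm`, here is a minimal host-side sketch of its byte-selection semantics as documented for CUDA: the low 3 bits of each selector nibble pick one of the eight bytes of the two inputs. The name `byte_perm_ref` is hypothetical and is not part of HIP or ZLUDA. This is a reference model for reasoning about expected values, not the code added by this PR.

```cpp
#include <cstdint>

// Hypothetical reference model of the documented __byte_perm semantics.
// Bytes 0-3 of the pool come from x, bytes 4-7 come from y.
constexpr uint32_t byte_perm_ref(uint32_t x, uint32_t y, uint32_t s) {
    const uint64_t pool = (static_cast<uint64_t>(y) << 32) | x;
    uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        // Only the low 3 bits of each 4-bit selector field are used here.
        const uint32_t index = (s >> (4 * i)) & 0x7u;
        const uint32_t byte = static_cast<uint32_t>((pool >> (8 * index)) & 0xFFu);
        result |= byte << (8 * i);
    }
    return result;
}

// Selector 0x3210 returns x unchanged; 0x0123 reverses the bytes of x.
static_assert(byte_perm_ref(0x33221100u, 0x77665544u, 0x3210u) == 0x33221100u, "identity");
static_assert(byte_perm_ref(0x33221100u, 0x77665544u, 0x0123u) == 0x00112233u, "byte reverse");
```

Because the model is `constexpr`, the expected values are checked at compile time; the same function can also be called at run time to compare against device output.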
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated no comments.
File | Description |
---|---|
ptx/src/test/spirv_run/prmt_slow.ptx | Test case PTX assembly for prmt.slow instruction |
ptx/src/test/spirv_run/mod.rs | Test registration with input/output validation |
ptx/src/test/ll/prmt_slow.ll | Expected LLVM IR output for the test |
ptx/src/pass/replace_instructions_with_functions.rs | Routes PrmtSlow to function call mechanism |
ptx/src/pass/llvm/emit.rs | Removes direct emission handling, marks as unreachable |
ptx/lib/zluda_ptx_impl.cpp | Implements prmt_b32 function using HIP's __byte_perm |
Did you check with |
Seems like |
Passes the prmt default mode ptx_tests (other modes are still unimplemented).
With the latest change this now passes 100% of the The assembly looks like:
This is a lot, but the semantics of I believe LLVM should be able to constant fold this to a single |
I've verified that identical assembly is generated by |
Looks ok, I don't think there's any usage of non-constant selectors in the wild (though it's not always passed as an immediate). Question, what happens if there's a sign extension (in constant selector)? Do we get folded into a single |
I've added a change that improves constant folding of a constant selector such that it will compile to a single |
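The sign-extension question above concerns the PTX default-mode rule for `prmt`: when bit 3 of a selector nibble is set, the sign bit of the selected byte is replicated across the whole output byte instead of the byte being copied. The sketch below models that rule on the host under that reading of the PTX specification. `prmt_default_ref` is a hypothetical name; the code is not taken from this PR and says nothing about what LLVM folds it to.

```cpp
#include <cstdint>

// Hypothetical reference model of PTX prmt in default (generic) mode.
// Bytes 0-3 of the pool come from a, bytes 4-7 come from b.
constexpr uint32_t prmt_default_ref(uint32_t a, uint32_t b, uint32_t c) {
    const uint64_t pool = (static_cast<uint64_t>(b) << 32) | a;
    uint32_t result = 0;
    for (int i = 0; i < 4; ++i) {
        const uint32_t nibble = (c >> (4 * i)) & 0xFu;
        uint32_t byte = static_cast<uint32_t>((pool >> (8 * (nibble & 0x7u))) & 0xFFu);
        if (nibble & 0x8u) {
            // Bit 3 set: replicate the selected byte's sign bit over all 8 bits.
            byte = (byte & 0x80u) ? 0xFFu : 0x00u;
        }
        result |= byte << (8 * i);
    }
    return result;
}

// Nibble 0xB selects byte 3 (0x80) with sign replication, giving 0xFF.
static_assert(prmt_default_ref(0x80706050u, 0u, 0xB210u) == 0xFF706050u, "negative byte");
// Nibble 0x8 selects byte 0 (0x50) with sign replication, giving 0x00.
static_assert(prmt_default_ref(0x80706050u, 0u, 0x8210u) == 0x00706050u, "positive byte");
```

With no nibble having bit 3 set, this model reduces to plain byte selection, which is why the sign-replicating case is the one worth checking separately for constant selectors.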