Replies: 1 comment
Hi @tjruwase, any comments on this idea? We are thinking about combining the op builder interface with Triton kernels, which would open up the possibility of writing non-Triton kernels for sparse attention while keeping the CUDA path on Triton.
---
Hi, I noticed that SparseAttention is implemented with Triton for CUDA execution. When we tried to implement SparseAttention on other accelerators, we found that Triton can be a blocker because it is not available on every accelerator. In that case, implementing a SparseAttention OpBuilder and kernels would be a natural option.
I'm wondering whether DeepSpeed could allow the Triton implementation to coexist with an OpBuilder implementation to improve extensibility. The idea is to implement a special PythonBuilder class that lets the module loaded through the OpBuilder interface call Python functions; inside those functions we can call either other Python functions or the Triton implementation. A demonstration of the concept can be found at the following link.
https://github.com/delock/DeepSpeedSYCLSupport/blob/gma/kernel-python-study/op_builder/cpu/transformer_inference.py
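To make the concept concrete, here is a minimal, self-contained sketch (class and module names are hypothetical, not DeepSpeed's actual API): a builder whose `load()` imports a plain Python module instead of compiling a C++/CUDA extension, so the loaded "op" can be ordinary Python that wraps either Triton or native kernels.

```python
# Hypothetical sketch only -- the names below are illustrative, not DeepSpeed's API.
import importlib


class PythonBuilder:
    """Builder whose load() imports a Python module instead of compiling
    C++/CUDA sources, so the returned object exposes the same call
    surface as a compiled op module."""

    def __init__(self, python_module_name):
        self.python_module_name = python_module_name

    def load(self):
        # No JIT compilation step: just import the Python implementation.
        return importlib.import_module(self.python_module_name)


# Callers are unchanged: they load() a module and call functions on it.
# builder = PythonBuilder("my_ops.sparse_attention")  # hypothetical module
# op_module = builder.load()
# output = op_module.sparse_attn_forward(q, k, v, layout)
```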
With the OpBuilder brought back, DeepSpeed would have the flexibility to implement a function with either Triton or accelerator-native code behind a unified interface. This would improve extensibility on accelerators where Triton has not been implemented yet.
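As a rough illustration of what that unified interface could look like inside the Python-side op (again with hypothetical module and function names), the implementation could prefer Triton when it is importable and otherwise fall back to an accelerator-native kernel:

```python
# Hypothetical dispatch inside the Python-side op; module names are illustrative.
def sparse_attn_forward(q, k, v, layout):
    try:
        # Triton path, used on accelerators where Triton is available (e.g. CUDA).
        from my_ops.triton_kernels import sparse_attn_triton  # hypothetical
        return sparse_attn_triton(q, k, v, layout)
    except ImportError:
        # Native fallback, e.g. a kernel built through a conventional OpBuilder.
        from my_ops.native_kernels import sparse_attn_native  # hypothetical
        return sparse_attn_native(q, k, v, layout)
```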