supports HPU double quant #1630
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Hi @matthewdouglas, just checking in on the status: is there an expected timeline for merging this PR, or any plans to merge it soon?
Hi @rsshaik1, I'll go ahead and merge this here. However, please see #1596 for an update on our plans with this branch. In short, we're going to stop development on the multi-backend-refactor branch.
Merged commit c3eac42 into bitsandbytes-foundation:multi-backend-refactor
@matthewdouglas Thanks for your support in accepting the Intel Gaudi (HPU) related PRs. I have a few questions regarding plans for additional device support on `main`.
We're still working on timelines for Intel hardware support with @kding1, so it may be best to reach out to him directly to align on goals related to that. Otherwise, what I can say is that we've stopped working on the multi-backend-refactor branch. We're going to keep the multi-backend-refactor wheels available for a while, as there are likely users depending on them, but in #1644 I'm now updating the documentation to indicate that the branch is deprecated. In the next few days we will push a v0.46.0 release, and then start merging new device support PRs on `main`.

From what I can tell, there's actually significant overlap with the work that @jiqing-feng is doing in #1628, as many of the op implementations for HPU appear to be implemented simply in plain PyTorch, the same as CPU/XPU. Many of those plain PyTorch ops are now being registered to the "default" dispatch key, meaning they would be used as the implementation for any device that does not override them with its own implementation. I think the main change needed would be to register HPU implementations only for the ops that wrap around HPU-specific kernels.

We're also coordinating on Intel development in a Slack channel; if you feel that's appropriate, we could invite Habana stakeholders there.
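To illustrate the "default" dispatch-key behavior described above, here is a minimal, self-contained Python sketch. It is an analogy only, not the actual PyTorch or bitsandbytes registration API (`torch.library` handles this in practice); the op name `dequantize_4bit` is taken from the conversation, but the registry and function signatures here are invented for illustration.

```python
# Toy dispatch table illustrating "default"-key semantics: every device
# uses the default implementation unless it registers its own override.
_registry: dict[str, dict[str, object]] = {}

def register(op: str, device: str = "default"):
    """Decorator: register a function as the impl of `op` for `device`."""
    def wrapper(fn):
        _registry.setdefault(op, {})[device] = fn
        return fn
    return wrapper

def dispatch(op: str, device: str):
    """Return the device-specific impl, falling back to 'default'."""
    impls = _registry[op]
    return impls.get(device, impls["default"])

# Plain-PyTorch-style implementation shared by every backend (CPU/XPU/...).
@register("dequantize_4bit")
def dequantize_4bit_default(packed):
    return f"default impl on {packed}"

# HPU overrides with its own kernel wrapper (hypothetical example).
@register("dequantize_4bit", device="hpu")
def dequantize_4bit_hpu(packed):
    return f"hpu impl on {packed}"
```

With this table, `dispatch("dequantize_4bit", "cpu")` resolves to the default implementation, while `dispatch("dequantize_4bit", "hpu")` picks up the HPU override — mirroring the point that HPU only needs to register the ops it must implement differently.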
@matthewdouglas Thanks for providing detailed information. You are right: for HPU we are OK with the "default" implementations for most ops; the only op we need to register separately for HPU is dequantize_4bit. We will work on creating a PR to bring the HPU changes to the `main` branch.
@vivekgoe invite to Slack channel just went out |
This PR integrates support for dequantizing double-quantized weights on Gaudi (HPU).
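For context, "double quantization" (as introduced in QLoRA and used by bitsandbytes 4-bit weights) means the per-block scale constants produced by blockwise quantization are themselves quantized a second time, and dequantization must undo both levels. The sketch below is a simplified illustration of that idea in plain Python — it is not the bitsandbytes implementation, and the block layout, level count, and function names are invented for this example.

```python
# Simplified double-quantization sketch (after QLoRA), NOT the actual
# bitsandbytes kernels: first-level blockwise quantization scales each
# block by its absmax; the absmax constants are then quantized again.

def quantize_block(block, levels=15):
    """First level: quantize one block against its own absmax."""
    absmax = max(abs(x) for x in block) or 1.0
    return [round(x / absmax * levels) for x in block], absmax

def double_quantize(blocks):
    """Quantize blocks, then quantize the absmax constants to 8 bits."""
    qblocks, absmaxes = zip(*(quantize_block(b) for b in blocks))
    scale2 = max(absmaxes) / 255 or 1.0               # second-level scale
    q_absmax = [round(a / scale2) for a in absmaxes]  # 8-bit constants
    return list(qblocks), q_absmax, scale2

def double_dequantize(qblocks, q_absmax, scale2, levels=15):
    """Undo both levels: recover each absmax, then the block values."""
    out = []
    for q, qa in zip(qblocks, q_absmax):
        absmax = qa * scale2
        out.append([v / levels * absmax for v in q])
    return out
```

A round trip through `double_quantize` and `double_dequantize` recovers each block's values up to the quantization error of both levels; the HPU-specific part of a real implementation is registering a kernel for this second-level-aware dequantization path.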