supports HPU double quant #1630


Conversation


@rsshaik1 commented May 9, 2025

This PR integrates support for double dequantization on Gaudi (HPU).
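
For readers unfamiliar with the term: "double quant" refers to bitsandbytes' nested quantization, where the per-block absmax scales of the 4-bit weights are themselves stored blockwise-quantized to 8 bits plus an offset. A rough sketch of the idea in plain PyTorch follows; the tensor names, blocksizes, and codebook handling are illustrative and do not mirror the library's actual layout (4-bit packing is also omitted).

```python
import torch

def dequantize_absmax(absmax_q, code2, absmax2, offset, blocksize2=256):
    # Second-level (8-bit, blockwise) dequantization of the absmax scales:
    # look up each 8-bit code in its codebook, rescale per block, add the offset.
    vals = code2[absmax_q.long()].view(-1, blocksize2) * absmax2.unsqueeze(1)
    return vals.flatten() + offset

def dequantize_4bit(codes, absmax, code4, blocksize=64, dtype=torch.bfloat16):
    # First-level dequantization: look up the (already unpacked) 4-bit codes
    # in their codebook and rescale each block of `blocksize` values.
    vals = code4[codes.long()].view(-1, blocksize) * absmax.unsqueeze(1)
    return vals.flatten().to(dtype)
```

With double quant enabled, the absmax fed into the 4-bit step is itself the output of the first helper; without it, the absmax values are stored directly in floating point.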

@rsshaik1 rsshaik1 marked this pull request as ready for review May 9, 2025 05:32

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@rsshaik1 (Author)

Hi @matthewdouglas, just checking in on the status: do we have an expected timeline for merging this PR? Could you please let me know if there are plans to merge it soon?

@matthewdouglas (Member)

Hi @rsshaik1,

I'll go ahead and merge this here. However, please see #1596 for an update on our plans with this branch.

In short, we're going to stop development on the multi-backend-refactor branch and move towards implementing additional devices on main using a newer dispatch interface based on custom operators in PyTorch.
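
For readers following along, a minimal sketch of what that dispatch interface looks like using PyTorch's public torch.library API; the namespace, op name, schema, and body below are illustrative and not bitsandbytes' actual definitions.

```python
import torch

# Illustrative operator definition; the real bitsandbytes op names and
# schemas may differ.
torch.library.define(
    "demo::dequantize_4bit",
    "(Tensor A, Tensor absmax, int blocksize, str quant_type) -> Tensor",
)

# A portable pure-PyTorch implementation registered for the "default"
# dispatch key: any device that does not register its own kernel for this
# op falls back to it.
@torch.library.impl("demo::dequantize_4bit", "default")
def _dequantize_4bit_default(A, absmax, blocksize, quant_type):
    # Placeholder body; a real implementation would unpack the 4-bit codes
    # and rescale each block by its absmax.
    return A.to(torch.float16)
```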

@matthewdouglas matthewdouglas merged commit c3eac42 into bitsandbytes-foundation:multi-backend-refactor May 20, 2025
1 of 2 checks passed
@vivekgoe

@matthewdouglas Thanks for your support in accepting the Intel Gaudi (HPU) related PRs. I have a few questions regarding the plans for additional device support on "main":

  • Do you have a timeline for stopping development on the "multi-backend-refactor" branch and moving to "main"?
  • Does it make sense for us to start adding Gaudi (HPU) changes on top of PyTorch Custom Operator Integration #1544? Or are there other tasks pending with respect to custom operators and multi-backend support that we should wait for?

@matthewdouglas (Member)

Hi @vivekgoe @rsshaik1

We're still working on timelines for Intel hardware support with @kding1, so it may be best to reach out to him directly to align on goals related to that. Otherwise what I can say is we've stopped working on the multi-backend-refactor branch for Intel CPUs and GPUs already and have new PRs for the main branch in #1628 and #1629.

We're going to keep the multi-backend-refactor wheels available for a while, as there are likely users depending on them, but in #1644 I'm now updating the documentation to indicate that it's being deprecated.

In the next few days we will be pushing a v0.46.0 release, and then start merging new device support PRs on main for a target v0.47.0 release. It's hard to give any timelines on a stable v0.47.0 release, but at that point we'll still be building preview wheels for v0.47.0.dev0, so I think it would make sense to start building toward the custom ops implementation.

From what I can tell there's actually significant overlap with the work that @jiqing-feng is doing in #1628 as many of the op implementations for HPU appear to be implemented simply in PyTorch, the same as CPU/XPU. Many of those plain PyTorch ops are being registered now to the "default" dispatch key, meaning they would be used as the implementation for any device that does not override with its own implementation. I think the main change needed would be to register the ops for HPU that wrap around torch.ops.hpu or have other specialization needs.
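
As an illustration of that last point, an HPU override for the hypothetical op sketched earlier could look roughly like this; the Gaudi-side call is a placeholder, since the exact torch.ops.hpu kernels are vendor-specific.

```python
import torch

# Hypothetical HPU-specific registration for the illustrative op above.
# A device-specific registration takes precedence over the "default" one,
# so only ops with HPU-specific needs have to be overridden like this.
@torch.library.impl("demo::dequantize_4bit", "hpu")
def _dequantize_4bit_hpu(A, absmax, blocksize, quant_type):
    # A real port would call into a Gaudi kernel exposed under torch.ops.hpu
    # (or apply other HPU-specific handling); a plain PyTorch placeholder
    # stands in here.
    return A.to(torch.float16)
```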

We're coordinating on Intel development in a Slack channel also; if you feel that's appropriate we could invite Habana stakeholders there.

cc: @Titus-von-Koeller @christoph-koehncke

@vivekgoe

@matthewdouglas Thanks for providing the detailed information. You are right: for HPU we are OK with the "default" implementation for most ops; the only op we need to register separately for HPU is dequantize_4bit. We will work on creating a PR to bring the HPU changes to the "main" branch.
If you have Slack channel admin rights, please invite me at vivek.goel@intel.com. If it is a Slack channel created by Intel, please let me know and I will work with @kding1 to get added. This will help keep the HPU-related code up to date with the code for the other Intel devices.
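
To show how that plan fits together (still using the hypothetical demo:: op from the sketches above), the call site stays device-agnostic; the dispatcher only picks the HPU kernel when the inputs live on an HPU device and uses the default implementation everywhere else.

```python
import torch

def dequantize(A, absmax, blocksize=64, quant_type="nf4"):
    # Same call on every device: PyTorch routes to the "hpu"-registered
    # kernel for tensors on an HPU device and to the "default"
    # implementation (shared with CPU/XPU) otherwise.
    return torch.ops.demo.dequantize_4bit(A, absmax, blocksize, quant_type)
```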

@Titus-von-Koeller (Collaborator)

@vivekgoe The invite to the Slack channel just went out.
