
Wrong result 8bit blockwise quantization over float16 #1540

Open
ZiyueXu77 opened this issue Feb 25, 2025 · 4 comments · Fixed by #1544
Labels
Bug (Something isn't working) · High Priority (first issues that will be worked on) · Low Risk (Risk of bugs in transformers and other libraries) · x64 CPU
Milestone
v0.47.0

Comments

@ZiyueXu77

System Info

Ubuntu 24.04

Reproduction

The following simple script yields all-0 dequantized results for all-1 inputs, and then Process finished with exit code 139 (interrupted by signal 11: SIGSEGV).
```python
import torch
from bitsandbytes.functional import quantize_blockwise, dequantize_blockwise

if __name__ == "__main__":
    test_torch = torch.ones([5, 10], dtype=torch.float16)

    quantized, quantized_state = quantize_blockwise(test_torch)
    absmax = quantized_state.absmax
    code = quantized_state.code

    dequantized = dequantize_blockwise(quantized, absmax=absmax, code=code)
    print(dequantized)

    dequantized = dequantize_blockwise(quantized, quant_state=quantized_state)
    print(dequantized)
```

Expected behavior

With torch.float32 input, the same script gives the correct result of all 1s and finishes with Process finished with exit code 0.
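
For reference, a minimal sketch of the float32 path that behaves as expected (same quantize_blockwise/dequantize_blockwise calls as in the reproduction above; the output noted in the comment is the expectation, not a captured log):

```python
import torch
from bitsandbytes.functional import quantize_blockwise, dequantize_blockwise

# Same script as above, but with a float32 input tensor.
test_torch = torch.ones([5, 10], dtype=torch.float32)

quantized, quantized_state = quantize_blockwise(test_torch)
dequantized = dequantize_blockwise(quantized, quant_state=quantized_state)

print(dequantized)  # expected: a 5x10 tensor of 1s, process exits with code 0
```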

matthewdouglas added the Bug label on Feb 25, 2025
@matthewdouglas
Member

Hi,
On CPU, only torch.float32 is currently supported. It seems like we do not guard or warn against this.

@ZiyueXu77
Author

I see, thanks for the prompt response! I will add the constraint in my pipeline.
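
As a sketch of what such a pipeline-side constraint might look like (the wrapper name is hypothetical and not part of bitsandbytes; it simply casts non-float32 CPU tensors before quantizing):

```python
import torch
from bitsandbytes.functional import quantize_blockwise

def quantize_blockwise_cpu_safe(t: torch.Tensor):
    # Hypothetical helper: blockwise quantization on CPU currently supports
    # only float32, so cast other dtypes (e.g. float16) up front.
    if t.device.type == "cpu" and t.dtype != torch.float32:
        t = t.to(torch.float32)
    return quantize_blockwise(t)
```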

ZiyueXu77 added a commit to NVIDIA/NVFlare that referenced this issue Feb 27, 2025
Fixes # .

### Description

Account for QA finding and [bug from bitsandbytes](bitsandbytes-foundation/bitsandbytes#1540).
Also add info on supported precisions.

### Types of changes
- [x] Non-breaking change (fix or new feature that would not break
existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing
functionality to change).
- [ ] New tests added to cover the changes.
- [ ] Quick tests passed locally by running `./runtest.sh`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated.
TimDettmers added the High Priority (first issues that will be worked on) and Low Risk (Risk of bugs in transformers and other libraries) labels on Feb 28, 2025
matthewdouglas added this to the v0.47.0 milestone on Mar 5, 2025
@ved1beta
Contributor

Is this issue about adding warnings or adding float16 support?

@matthewdouglas
Member

@ved1beta It's about adding support, which will come with #1628 (targeting the v0.47.0 release).

In #1544 we start to raise a RuntimeError, and that will be in the v0.46.0 release:

RuntimeError: A must be float32 on cpu, got torch.float16
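
For callers that need to run against v0.46.0 (where #1544 raises this error instead of segfaulting), a hedged sketch of handling it by falling back to float32; on versions that add float16 support this path is never taken:

```python
import torch
from bitsandbytes.functional import quantize_blockwise

t = torch.ones([5, 10], dtype=torch.float16)

try:
    quantized, state = quantize_blockwise(t)
except RuntimeError:
    # v0.46.0 rejects non-float32 CPU inputs; retry after casting.
    quantized, state = quantize_blockwise(t.to(torch.float32))
```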

matthewdouglas linked a pull request on May 22, 2025 that will close this issue