Open
Description
Feature request
Currently, bnb only supports block sizes of 64 and above. It would be great if it could also support a block size of 32, as llama.cpp does.
Motivation
A smaller block size gives better output quality after quantization.
Your contribution
It seems this change will not be straightforward: each CUDA warp expects 32 threads, so with a block size of 32 the current implementation would end up with 1 element per thread, which prevents the implementation from packing two 4-bit quants into a single 8-bit value. I currently don't know how to solve this.
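To illustrate the packing constraint: a minimal sketch (in Python, purely for illustration; `pack_nibbles`/`unpack_nibbles` are hypothetical names, not bnb APIs) of how two 4-bit quantization codes occupy one byte. In the current layout each thread quantizes two adjacent elements and writes one packed byte; with a block size of 32, a thread would hold only one 4-bit code and could not emit a full byte on its own without coordinating with a neighboring thread.

```python
# Illustration only: two 4-bit codes share a single byte in 4-bit
# quantized storage. Function names here are hypothetical, not bnb APIs.
def pack_nibbles(hi: int, lo: int) -> int:
    """Pack two 4-bit codes (each in 0..15) into one byte."""
    assert 0 <= hi < 16 and 0 <= lo < 16
    return (hi << 4) | lo

def unpack_nibbles(b: int) -> tuple[int, int]:
    """Recover the two 4-bit codes from a packed byte."""
    return (b >> 4) & 0xF, b & 0xF

packed = pack_nibbles(0xA, 0x3)
print(packed)                   # 163
print(unpack_nibbles(packed))   # (10, 3)
```

Because the byte is the smallest addressable unit, a thread holding a single nibble cannot store its result independently; this is why the one-element-per-thread mapping conflicts with the packed 4-bit format.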