Support quantizing tensors when numel() > INT_MAX #1785

@matthewdouglas

Description

See #1782 for background on this request.

We would like to add support in certain CUDA kernels/ops for overall tensor sizes > INT_MAX, i.e. tensors with more than 2^31 - 1 elements. A sketch of the 64-bit indexing approach follows the priority lists below.

High priority ops:

  • 4bit blockwise quantization and dequantization
  • 4bit GEMV
  • LLM.int8() quantization
  • LLM.int8() matmul and dequantization

Medium priority ops:

  • 8bit dynamic blockwise quantization

Low priority ops:

  • Optimizers
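
For illustration, here is a minimal sketch of the 64-bit indexing pattern these kernels would need. The kernel name, launcher, signatures, and dequantization math are hypothetical placeholders, not the existing bitsandbytes implementations; the point is that flat indices, stride products, and block counts must all be computed in 64-bit once `n` can exceed INT_MAX.

```cuda
#include <algorithm>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical sketch (not the actual bitsandbytes kernel): a grid-stride
// loop with 64-bit indexing so the kernel stays correct when n > INT_MAX.
__global__ void kDequantizeBlockwise64(const unsigned char* __restrict__ in,
                                       const float* __restrict__ absmax,
                                       float* __restrict__ out,
                                       int blocksize, int64_t n)
{
    // Compute the flat index in 64-bit; the 32-bit product
    // blockIdx.x * blockDim.x would overflow past 2^31 - 1 elements.
    for (int64_t i = (int64_t)blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += (int64_t)gridDim.x * blockDim.x)
    {
        // The index into the per-block scale array also needs 64-bit math.
        int64_t block = i / blocksize;
        out[i] = (float)in[i] * absmax[block];  // placeholder dequant math
    }
}

// Host-side launch: the block count is derived from the 64-bit element
// count and clamped to the 2^31 - 1 gridDim.x limit; the grid-stride loop
// above covers whatever a single pass cannot.
void launchDequantize64(const unsigned char* in, const float* absmax,
                        float* out, int blocksize, int64_t n)
{
    const int threads = 256;
    int64_t blocks = std::min<int64_t>((n + threads - 1) / threads, INT32_MAX);
    kDequantizeBlockwise64<<<(unsigned int)blocks, threads>>>(
        in, absmax, out, blocksize, n);
}
```

One possible interim workaround, until the kernels themselves are updated, is to split oversized tensors into sub-INT_MAX chunks on the host and launch per chunk, at the cost of extra kernel launches.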

Metadata

Labels

CUDA: Issues and PRs related to the CUDA backend, excluding installation/support help.
