Labels: CUDA (Issues and PRs related to the CUDA backend, excluding installation/support help.)
Description
See #1782 for background on this request.
We would like to add support in certain CUDA kernels/ops for handling overall tensor sizes > INT_MAX (see the indexing sketch after the priority lists below).
High priority ops:
- 4bit blockwise quantization and dequantization
- 4bit GEMV
- LLM.int8() quantization
- LLM.int8() matmul and dequantization
Medium priority ops:
- 8bit dynamic blockwise quantization
Low priority ops:
- Optimizers
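
To make concrete what handling element counts above INT_MAX involves, here is a minimal sketch of a 64-bit-indexed elementwise CUDA kernel using a grid-stride loop. The kernel name, signature, and launch wrapper are illustrative assumptions, not the actual kernels listed above; the point is only that the element index is `int64_t` rather than `int`.

```cuda
#include <algorithm>
#include <cstdint>
#include <cuda_runtime.h>

// Illustrative elementwise kernel (not a real kernel from this project).
// The element index is 64-bit so tensors with more than INT_MAX total
// elements are addressed correctly; a grid-stride loop keeps the launch
// configuration within ordinary 32-bit grid limits.
__global__ void scale_kernel(const float* __restrict__ in,
                             float* __restrict__ out,
                             float factor,
                             int64_t n) {
  const int64_t stride = static_cast<int64_t>(gridDim.x) * blockDim.x;
  for (int64_t i = static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
       i < n; i += stride) {
    out[i] = in[i] * factor;
  }
}

// Hypothetical host-side launcher: cap the block count and let the
// grid-stride loop cover any remaining elements.
void launch_scale(const float* in, float* out, float factor, int64_t n) {
  const int threads = 256;
  const int blocks = static_cast<int>(
      std::min<int64_t>((n + threads - 1) / threads, 65535));
  scale_kernel<<<blocks, threads>>>(in, out, factor, n);
}
```

The quantization, GEMV, and matmul kernels above would need the same treatment applied to their size parameters and any intermediate index arithmetic that currently uses 32-bit integers.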