Feature request
It would be great if the library could support quantization for more layer types, as torch.ao does; conv2d already seems to be available there. Sadly, torch.ao does not appear to support CUDA as a backend right now. Would it be possible to implement the 8-bit and 4-bit kernels in Triton or CUDA to allow quantization of convolutional layers? A similar issue has been raised here earlier.
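For reference, the kind of 8-bit quantization being requested can be sketched in plain PyTorch: per-output-channel symmetric int8 quantization of a Conv2d weight, with dequantization before the convolution. This is only an illustration of the numerics (the `quantize_per_channel` helper and the error check are made up for this example, not part of any library API); a real implementation would run the convolution directly on the int8 weights in a fused CUDA/Triton kernel.

```python
import torch
import torch.nn.functional as F

def quantize_per_channel(w, n_bits=8):
    # Symmetric per-output-channel quantization of a conv weight.
    # One scale per output channel, computed from that channel's max |w|.
    qmax = 2 ** (n_bits - 1) - 1
    scale = w.abs().amax(dim=(1, 2, 3), keepdim=True) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax).to(torch.int8)
    return q, scale

conv = torch.nn.Conv2d(3, 16, 3, bias=False)
q, scale = quantize_per_channel(conv.weight.data)

# Dequantize and compare against the full-precision convolution.
w_deq = q.float() * scale
x = torch.randn(1, 3, 32, 32)
err = (F.conv2d(x, w_deq) - conv(x)).abs().max()
```

The int8 tensor is 4x smaller than the float32 weight; the remaining question raised above is doing the conv itself on the quantized representation on GPU.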
Motivation
Reduce the memory footprint of modules that use convolutional layers through quantization.
Your contribution
Yes, I am willing to work on implementing convolutional kernels if they can be integrated with this library.