Support for quantization of convolutional layers #1414

@JohnnyRacer

Description

Feature request

It would be great if the library could support quantization for more layer types, as torch.ao does — conv2d quantization already seems to be available there. Unfortunately, torch.ao does not currently appear to support CUDA as a backend. Would it be possible to implement the 8-bit and 4-bit kernels in Triton or CUDA to allow quantization of convolutional layers? A similar issue has been raised here earlier.
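For reference, torch.ao's existing conv2d support goes through the CPU-only post-training static quantization flow (observe, calibrate, convert). A minimal sketch of that flow — the API names are from torch.ao, but the tiny model and shapes here are just illustrative assumptions:

```python
import torch
import torch.nn as nn

class TinyConvNet(nn.Module):
    """Hypothetical toy model used only to demonstrate the flow."""
    def __init__(self):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()      # fp32 -> int8 boundary
        self.conv = nn.Conv2d(3, 8, kernel_size=3)
        self.relu = nn.ReLU()
        self.dequant = torch.ao.quantization.DeQuantStub()  # int8 -> fp32 boundary

    def forward(self, x):
        return self.dequant(self.relu(self.conv(self.quant(x))))

model = TinyConvNet().eval()
# fbgemm is the x86 CPU backend; there is no CUDA backend here, which is
# exactly the gap this feature request is about.
model.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
prepared = torch.ao.quantization.prepare(model)
prepared(torch.randn(1, 3, 32, 32))   # calibration pass: observers record ranges
quantized = torch.ao.quantization.convert(prepared)

out = quantized(torch.randn(1, 3, 32, 32))
print(type(quantized.conv))            # conv replaced by a quantized int8 module
```

After `convert`, the `Conv2d` weight is stored in int8, which is where the memory savings come from; the missing piece for this library would be equivalent 8-bit/4-bit conv kernels that run on GPU.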

Motivation

Reduce the memory footprint of modules that use convolutional layers through quantization.

Your contribution

Yes, I am willing to work on implementing convolutional kernels if it is possible to integrate with this library.
