
Supporting inference with EETQ quantized model #391

@thincal


Feature request

EETQ-quantized models give very good quality in my case, but loading them is quite slow. If the base model has already been quantized with EETQ, LoRAX should load it directly and skip the JIT quantization; currently it fails to find the related layers.
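To make the request concrete, here is a minimal sketch of the check I have in mind. It is not LoRAX code: the function names `is_prequantized_eetq` and `choose_load_path` are hypothetical, and it assumes the checkpoint's `config.json` carries a `quantization_config` entry with `quant_method` set to `"eetq"`, which may not hold for every export tool.

```python
import json
from pathlib import Path
from typing import Optional


def is_prequantized_eetq(model_dir: str) -> bool:
    """Return True if config.json marks the checkpoint as EETQ-quantized.

    Assumption: the checkpoint stores a `quantization_config` dict with
    `quant_method == "eetq"`. Other layouts would need a different check.
    """
    config_path = Path(model_dir) / "config.json"
    if not config_path.is_file():
        return False
    config = json.loads(config_path.read_text())
    quant_cfg = config.get("quantization_config") or {}
    return quant_cfg.get("quant_method") == "eetq"


def choose_load_path(model_dir: str, requested_quantize: Optional[str]) -> str:
    """Pick a loading strategy (hypothetical labels, for illustration only)."""
    if requested_quantize != "eetq":
        return "no_eetq"
    if is_prequantized_eetq(model_dir):
        # Weights are already quantized: load them as-is, skip JIT quantization.
        return "load_prequantized"
    # Full-precision weights: quantize at load time (the current slow path).
    return "jit_quantize"
```

With something like this, the loader could branch once at startup instead of always quantizing at load time.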

Motivation

Speed up the loading of EETQ-quantized models.

Your contribution

I will prepare a PR for review. I will also need some help with parts of the implementation.

Metadata

Labels: enhancement (New feature or request)
