
How to enable Llama3-8B INT4 AWQ models #90


Description

@FlexLaughing

Hi,
I have an AutoAWQ-quantized model (--wbits=4 --groupsize=128) and ran the following command to evaluate perplexity on a GPU:
--model /home/ubuntu/qllm_v0.2.0_Llama3-8B-Chinese-Chat_q4 --epochs 0 --eval_ppl --wbits 4 --abits 16 --lwc --net llama-7b
It fails when parsing the checkpoint at https://github.com/OpenGVLab/OmniQuant/blob/main/quantize/int_linear.py#L26
It looks like the QuantLinear definition does not support the qweight layout produced by AutoAWQ. Could you please check the arguments? Thanks!
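From a quick look, the mismatch seems to be that OmniQuant's QuantLinear wraps a full-precision nn.Linear and fake-quantizes its fp16 weight at runtime, whereas an AutoAWQ checkpoint stores packed int4 tensors (qweight, qzeros, scales). As a rough illustration of what bridging the two would involve, here is a minimal sketch (not OmniQuant or AutoAWQ API) that unpacks an AWQ-style checkpoint back to fp16 so it could populate an ordinary nn.Linear before OmniQuant sees the model. The packing order assumed below (8 nibbles per int32, in order) is a simplification; AutoAWQ's real layout is interleaved differently.

```python
import torch

def unpack_awq_like(qweight: torch.Tensor, qzeros: torch.Tensor,
                    scales: torch.Tensor, group_size: int = 128) -> torch.Tensor:
    """Dequantize an AWQ-style packed int4 weight back to fp16.

    Assumed shapes (hypothetical, for illustration only):
      qweight: (in_features, out_features // 8) int32, 8 x 4-bit values per int32
      qzeros:  (in_features // group_size, out_features // 8) int32
      scales:  (in_features // group_size, out_features) fp16
    Returns a (in_features, out_features) fp16 tensor; transpose it before
    copying into nn.Linear.weight, which expects (out_features, in_features).
    """
    shifts = torch.arange(0, 32, 4, device=qweight.device)       # [0, 4, ..., 28]

    # Unpack 8 nibbles from each int32 of the weight and the zero points.
    w = (qweight.unsqueeze(-1) >> shifts) & 0xF                   # (in, out//8, 8)
    w = w.reshape(qweight.shape[0], -1).to(torch.float16)         # (in, out)
    z = (qzeros.unsqueeze(-1) >> shifts) & 0xF                    # (groups, out//8, 8)
    z = z.reshape(qzeros.shape[0], -1).to(torch.float16)          # (groups, out)

    # Broadcast per-group zeros/scales over the rows belonging to each group.
    groups = torch.arange(w.shape[0], device=w.device) // group_size
    return (w - z[groups]) * scales[groups].to(torch.float16)     # fp16 (in, out)
```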
