
[Question] Two models in different formats are generated after exporting with Optimum #144

@Geministudents

Description

  • The question is written in English (I can translate what you say, but questions written in English are easier for other users to read).

I used this command, and it generated model files in two formats:


optimum-cli export onnx --model {MODEL_ID} --task 'text2text-generation-with-past' --no-post-process {ONNX_BASE_DIR}
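
For reference, with the text2text-generation-with-past task this export typically writes separate encoder and decoder graphs (e.g. encoder_model.onnx, decoder_model.onnx, decoder_with_past_model.onnx) into the output directory. A minimal sketch to list what was exported and how large each file is; the directory path here is a placeholder, not from the issue:

import os

onnx_base_dir = "./nllb_onnx"  # placeholder: substitute your own {ONNX_BASE_DIR}

# List every exported .onnx file with its size, to see which
# components the export produced and how large each one is.
for name in sorted(os.listdir(onnx_base_dir)):
    if name.endswith(".onnx"):
        size_mb = os.path.getsize(os.path.join(onnx_base_dir, name)) / (1024 * 1024)
        print(f"{name}: {size_mb:.1f} MB")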


I understand this situation. I used the following code to quantize the model directly, but I couldn't get a file size as small as NLLB_cache_initializer.onnx (the component of the decoder-without-past that was separated out to generate the KV cache from the encoder part).
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="/Users/zhangboxu/workspace/nllb/nllb_onnx/decoder/decoder_model.onnx",
    model_output="/Users/zhangboxu/workspace/nllb/nllb_onnx/decoder/decoder_model_int8.onnx",
    weight_type=QuantType.QInt8,
)
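
Not part of the original issue, but for comparison one can quantize each exported decoder component the same way and print the before/after sizes; the directory and file names below are assumptions following the layout above:

import os
from onnxruntime.quantization import quantize_dynamic, QuantType

base_dir = "/Users/zhangboxu/workspace/nllb/nllb_onnx/decoder"  # assumed layout
components = ["decoder_model.onnx", "decoder_with_past_model.onnx"]  # assumed names

for name in components:
    src = os.path.join(base_dir, name)
    dst = os.path.join(base_dir, name.replace(".onnx", "_int8.onnx"))
    # Dynamic quantization stores weights as int8; activations remain float
    # and are quantized on the fly at inference time.
    quantize_dynamic(model_input=src, model_output=dst, weight_type=QuantType.QInt8)
    print(f"{name}: {os.path.getsize(src) / 1e6:.1f} MB -> "
          f"{os.path.getsize(dst) / 1e6:.1f} MB")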
