How can I quantize the ONNX export of NLLB so that the result is as small as NLLB_cache_initializer.onnx?
I used this command, which generated two kinds of files:
optimum-cli export onnx --model {MODEL_ID} --task 'text2text-generation-with-past' --no-post-process {ONNX_BASE_DIR}
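As a quick sanity check, here is a minimal sketch (assuming optimum[onnxruntime] is installed) that loads the export directory back and lists the files produced; ONNX_BASE_DIR is a placeholder for the output directory passed to optimum-cli above:

# Sanity check: load the export and list the generated component files.
import os
from optimum.onnxruntime import ORTModelForSeq2SeqLM

ONNX_BASE_DIR = "nllb_onnx"  # placeholder; use the same directory as above
model = ORTModelForSeq2SeqLM.from_pretrained(ONNX_BASE_DIR)
print(os.listdir(ONNX_BASE_DIR))  # typically encoder_model.onnx, decoder_model.onnx, decoder_with_past_model.onnx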
I understand this situation. I used the following code to quantize decoder_model.onnx directly, but I could not get a file as small as NLLB_cache_initializer.onnx (the part of the decoder-without-past that generates the encoder-side KV cache, which was split out into NLLB_cache_initializer.onnx).
## Code
# Dynamic int8 quantization of the exported decoder with onnxruntime.
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="/Users/zhangboxu/workspace/nllb/nllb_onnx/decoder/decoder_model.onnx",
    model_output="/Users/zhangboxu/workspace/nllb/nllb_onnx/decoder/decoder_model_int8.onnx",
    weight_type=QuantType.QInt8,  # weights stored as signed int8
)
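Since quantizing the whole decoder keeps all of its weights, one option for getting a file comparable to NLLB_cache_initializer.onnx would be to split the cache-initializer subgraph out yourself before quantizing. Below is a rough sketch using onnx.utils.extract_model; the tensor names (encoder_hidden_states, present.*.encoder.key/value) and the layer count are assumptions based on typical optimum export naming, so inspect decoder_model.onnx (e.g., in Netron) and adjust them first.

# Sketch only: extract the subgraph that maps encoder_hidden_states to the
# encoder-side KV cache, then dynamically quantize just that piece.
# Tensor names and num_layers are assumptions; verify against your graph.
import onnx
from onnxruntime.quantization import quantize_dynamic, QuantType

DECODER = "/Users/zhangboxu/workspace/nllb/nllb_onnx/decoder/decoder_model.onnx"
INITIALIZER = "/Users/zhangboxu/workspace/nllb/nllb_onnx/decoder/cache_initializer.onnx"  # hypothetical output path

num_layers = 12  # assumption: decoder layer count of the NLLB variant in use
kv_outputs = [
    f"present.{i}.encoder.{kind}"
    for i in range(num_layers)
    for kind in ("key", "value")
]

# Keep only the nodes between encoder_hidden_states and the encoder KV outputs.
onnx.utils.extract_model(
    DECODER,
    INITIALIZER,
    input_names=["encoder_hidden_states"],
    output_names=kv_outputs,
)

# Quantize just the extracted initializer, which should be far smaller
# than quantizing decoder_model.onnx as a whole.
quantize_dynamic(
    model_input=INITIALIZER,
    model_output=INITIALIZER.replace(".onnx", "_int8.onnx"),
    weight_type=QuantType.QInt8,
)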