Exported ONNX model files are much larger than expected, compared to the ones created by an older version #2241

@whitphx

Description

System Info

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.5 LTS
Release:        22.04
Codename:       jammy
$ python -V
Python 3.12.6
$ uv pip freeze
accelerate==1.6.0
certifi==2025.1.31
charset-normalizer==3.4.1
coloredlogs==15.0.1
filelock==3.18.0
flatbuffers==25.2.10
fsspec==2025.3.2
huggingface-hub==0.30.2
humanfriendly==10.0
idna==3.10
jinja2==3.1.6
markupsafe==3.0.2
mpmath==1.3.0
networkx==3.4.2
numpy==2.2.5
nvidia-cublas-cu12==12.6.4.1
nvidia-cuda-cupti-cu12==12.6.80
nvidia-cuda-nvrtc-cu12==12.6.77
nvidia-cuda-runtime-cu12==12.6.77
nvidia-cudnn-cu12==9.5.1.17
nvidia-cufft-cu12==11.3.0.4
nvidia-cufile-cu12==1.11.1.6
nvidia-curand-cu12==10.3.7.77
nvidia-cusolver-cu12==11.7.1.2
nvidia-cusparse-cu12==12.5.4.2
nvidia-cusparselt-cu12==0.6.3
nvidia-nccl-cu12==2.26.2
nvidia-nvjitlink-cu12==12.6.85
nvidia-nvtx-cu12==12.6.77
onnx==1.17.0
onnxruntime==1.20.1
onnxslim==0.1.48
optimum @ git+https://github.com/huggingface/optimum.git@b04feaea78cda58d79b8da67dca3fd0c4ab33435
packaging==25.0
protobuf==6.30.2
psutil==7.0.0
pyyaml==6.0.2
regex==2024.11.6
requests==2.32.3
safetensors==0.5.3
setuptools==79.0.1
sympy==1.13.3
tokenizers==0.21.1
torch==2.7.0
tqdm==4.67.1
transformers==4.49.0
triton==3.3.0
typing-extensions==4.13.2
urllib3==2.4.0

Who can help?

@michaelbenayoun

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction (minimal, reproducible, runnable)

ONNX model files exported with optimum.exporters.onnx.main_export are larger than the ones created with an older version. @xenova suggested this appears to be an issue around the weight-deduplication step.
Reference: https://huggingface.co/Xenova/nllb-200-distilled-600M/discussions/3
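To illustrate what "weight deduplication" means here, the sketch below groups tensors whose raw bytes are identical, which is one simple way to spot shared weights (e.g. tied embeddings) that were written out twice. This is a stdlib-only illustration, not optimum's actual implementation; the `find_duplicate_tensors` helper is my own name, and in practice the `(name, raw_bytes)` pairs would come from something like `(t.name, t.raw_data)` over an ONNX graph's initializers.

```python
import hashlib
from collections import defaultdict

def find_duplicate_tensors(tensors):
    """Group tensor names by a hash of their raw bytes.

    `tensors` is an iterable of (name, raw_bytes) pairs. Returns a list of
    name groups that share byte-identical data, i.e. candidates that a
    deduplication pass should have collapsed into a single initializer.
    """
    groups = defaultdict(list)
    for name, raw in tensors:
        groups[hashlib.sha256(raw).hexdigest()].append(name)
    return [names for names in groups.values() if len(names) > 1]
```

If the exporter's deduplication worked, running a check like this over the exported graph's initializers should return no groups; duplicated embedding/LM-head weights would show up as a multi-name group.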

As an example, I converted https://huggingface.co/facebook/nllb-200-distilled-600M.

  • onnx/decoder_model.onnx that was converted with the older version was 1860454885 bytes (~1.86GB)
  • The newly converted onnx/decoder_model.onnx_data is 2909290496 bytes (~2.91GB), plus onnx/decoder_model.onnx at 430168 bytes (~430kB).
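Since the new exporter splits the model into a small .onnx graph file and an external-data sidecar, comparing sizes fairly means summing both files. A minimal stdlib helper for that (the `onnx_total_size` name and the `.onnx_data` sidecar naming convention shown in the bullets above are the only assumptions):

```python
from pathlib import Path

def onnx_total_size(onnx_path):
    """Total size in bytes of a .onnx file plus its external-data
    sidecar (e.g. decoder_model.onnx + decoder_model.onnx_data)."""
    p = Path(onnx_path)
    total = p.stat().st_size
    sidecar = p.with_suffix(p.suffix + "_data")
    if sidecar.exists():
        total += sidecar.stat().st_size
    return total
```

With the numbers above, the new export totals 2909290496 + 430168 ≈ 2.91 GB against ~1.86 GB for the old single-file export, roughly a 1.56x increase.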

Expected behavior

The newly converted model files should be similar in total size to those produced by the older version.
