stefbil/post-training-optimization-cli-tool


Model Compression and ONNX Conversion

ONNX conversion is currently under investigation due to known issues.

This code allows you to optimize a pretrained VisionEncoderDecoderModel from Hugging Face Transformers.

Prerequisites

  • A CUDA-capable GPU is recommended for loading the Vision Transformer models
  • Python 3.10.6
  • CUDA drivers

Usage

Prepare and activate a Python environment with:

python -m venv optimenv
source optimenv/bin/activate

Install the required packages:

pip install -r requirements.txt

The code can be run with:

python cli.py --load PATH_TO_MODEL --save PATH_TO_SAVE --gpt2

The following arguments are available:

  • --gpt2: Selects the model variant to optimize.
  • --load: Path to pretrained model. Can be a Hugging Face model link or local path.
  • --save: Path to save the compressed model files.

For example:

python cli.py --gpt2 --load nlpconnect/vit-gpt2-image-captioning --save PATH_TO_SAVE

or

python cli.py -gpt2 -l nlpconnect/vit-gpt2-image-captioning -s PATH_TO_SAVE

This will compress the model and save it to PATH_TO_SAVE. It will also convert the model to ONNX format and save it to PATH_TO_SAVE/onnx.
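The flag handling described above could be wired up with argparse roughly as follows. This is a hedged sketch, not the actual cli.py: the flag names and the -l/-s short aliases come from the usage examples, while the help strings and the build_parser function name are assumptions.

```python
import argparse

def build_parser():
    # Sketch of a parser matching the flags documented above (not the real cli.py).
    parser = argparse.ArgumentParser(
        description="Compress a pretrained VisionEncoderDecoderModel"
    )
    parser.add_argument("--gpt2", action="store_true",
                        help="Select the model variant to optimize")
    parser.add_argument("--load", "-l", required=True,
                        help="Hugging Face model ID or local path")
    parser.add_argument("--save", "-s", required=True,
                        help="Directory for the compressed model files")
    return parser

# Example invocation mirroring the usage shown above
args = build_parser().parse_args(
    ["--gpt2", "--load", "nlpconnect/vit-gpt2-image-captioning", "--save", "out"]
)
print(args.gpt2, args.load, args.save)
```

With this layout, both the long form (--load) and the short alias (-l) resolve to the same args.load attribute.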

Model Compression

The compression step first loads the VisionEncoderDecoderModel and casts its weights to fp16 to reduce the model's size.

It then saves the compressed model in the binary .bin format to the specified save path.

The compression typically reduces the model size by 2-4x.
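The load-cast-save sequence above can be sketched with the public Transformers API. This is a minimal illustration, not the actual cli.py: a tiny randomly initialised model is built here so the example runs without downloading weights, whereas the real tool would call VisionEncoderDecoderModel.from_pretrained with the --load path, and the temporary directory stands in for the --save path.

```python
# Sketch of the described compression step (assumed, not the actual cli.py):
# cast a VisionEncoderDecoderModel to fp16 and save it. A tiny random model
# is built so this runs offline; the real tool would instead load the
# pretrained weights given via --load.
import os
import tempfile

import torch
from transformers import (GPT2Config, ViTConfig,
                          VisionEncoderDecoderConfig, VisionEncoderDecoderModel)

config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(
    ViTConfig(hidden_size=32, num_hidden_layers=1, num_attention_heads=2,
              intermediate_size=64, image_size=32, patch_size=16),
    GPT2Config(n_embd=32, n_layer=1, n_head=2),
)
model = VisionEncoderDecoderModel(config=config)

# Cast every floating-point weight from fp32 (4 bytes) to fp16 (2 bytes),
# roughly halving the on-disk size.
model = model.half()

with tempfile.TemporaryDirectory() as save_dir:
    model.save_pretrained(save_dir)  # writes the compressed weight files
    saved_files = os.listdir(save_dir)

print(all(p.dtype == torch.float16 for p in model.parameters()))
```

Note that a plain fp32-to-fp16 cast on its own accounts for about a 2x reduction; further savings would have to come from other techniques.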
