⚡️ This repository provides training recipes for AMD's efficient vision-language models, which are designed to improve VLM inference efficiency.
You can use the following commands to install the necessary packages.
pip3 install -r requirements.txt
pip3 install flash-attn --no-build-isolation
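If flash-attn built correctly, it should be importable from Python. As a quick, optional sanity check (just a suggestion, not part of the official setup):
python3 -c "import flash_attn; print(flash_attn.__version__)"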
You can follow LLaVA-OneVision to prepare the pre-training and fine-tuning data.
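For orientation, LLaVA-style annotation files are typically a JSON list of conversation records. The snippet below writes a purely illustrative entry (the file name, ids, and text are made up); please follow the LLaVA-OneVision instructions for the exact format and dataset mixture.
cat > sample_data.json <<'EOF'
[
  {
    "id": "sample-0001",
    "image": "sample-0001.jpg",
    "conversations": [
      {"from": "human", "value": "<image>\nDescribe this image."},
      {"from": "gpt", "value": "A caption describing the image."}
    ]
  }
]
EOF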
You can run pre-training using scripts/pretrain.sh. The mm projector is pre-trained during this stage. Please set the data_path and image_folder according to your data paths.
bash scripts/pretrain.sh
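As a rough sketch, pointing the script at your data might look like the lines below; the variable names here are assumptions, so check scripts/pretrain.sh for the actual arguments it passes to the trainer.
# Hypothetical excerpt of scripts/pretrain.sh
DATA_PATH=/path/to/pretrain_annotations.json   # pre-training annotation file
IMAGE_FOLDER=/path/to/pretrain_images          # folder containing the referenced images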
You can use the following command to start training the model.
bash scripts/train.sh
You can use the following command to start supervised fine-tuning of the model.
bash scripts/sft.sh
We provide a simple script for inference with a single image input.
python3 llava/test_generate.py --model_path ./checkpoints/ --image_path ./images/dino.png --question "Please describe this image."
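The same flags accept any local checkpoint and image; for example (the checkpoint directory, image path, and question below are placeholders):
python3 llava/test_generate.py --model_path ./checkpoints/my-sft-run --image_path ./images/my_photo.jpg --question "What objects are in this image?"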
You are welcome to download and try this model. To learn more about training, inference, and the insights behind this model, please visit the AMD Hugging Face Model Card to access the code and download the model files. Additionally, AMD has opened a dedicated cloud infrastructure with the latest GPU instances to AI developers; visit the AMD Developer Cloud for access requests and usage details. Furthermore, you can deploy advanced AI models on AMD Ryzen AI PCs and can learn more here.
For any questions, you may reach out to the AMD team at amd_ai_mkt@amd.com.