Hummingbird: A Lightweight, High-Performance Video Generation Model

🔆Introduction

⚡️ Hummingbird is a series of advanced video generation models developed by the AMD AIG team and trained on AMD Instinct™ MI250 GPUs. It includes text-to-video models, image-to-video models, and image/video super-resolution models. With only 0.9B parameters, the Hummingbird model demonstrates exceptional efficiency. For text-to-video tasks, it can generate text-aligned videos in just 1.87 seconds using 4 steps on an MI250 GPU. For image-to-video tasks, it takes only 11 seconds to produce high-quality 4K videos.
Figure 1: AMD Hummingbird-0.9B visual performance comparison with state-of-the-art T2V models on VBench. Example prompts shown in the figure: "A cute happy Corgi playing in park, sunset, pixel"; "A cute happy Corgi playing in park, sunset, animated style"; "A cute raccoon playing guitar in the beach"; "A cute raccoon playing guitar in the forest"; "A quiet beach at dawn and the waves gently lapping"; "A cute teddy bear, dressed in a red silk outfit, stands in a vibrant street, chinese new year"; "A sandcastle being eroded by the incoming tide"; "An astronaut flying in space, in cyberpunk style"; "A cat DJ at a party"; "A 3D model of a 1800s victorian house"; "A drone flying over a snowy forest"; "A ghost ship navigating through a sea under a moon".
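As a rough back-of-envelope from the figures above (assuming the quoted 1.87 s covers all 4 denoising steps evenly, which is an assumption, not a profiled breakdown):

```python
# Back-of-envelope: average per-step latency implied by the quoted
# 4-step, 1.87 s text-to-video figure on an MI250 GPU.
total_seconds, steps = 1.87, 4
per_step = total_seconds / steps
print(round(per_step, 4))  # 0.4675
```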

📝 Change Log

🚀Getting Started

Installation

Conda

conda create -n AMD_Hummingbird python=3.10
conda activate AMD_Hummingbird
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/rocm6.1
pip install -r requirements.txt

For ROCm Flash Attention support, build it from source:

git clone https://github.com/ROCm/flash-attention.git
cd flash-attention
python setup.py install

The build takes about 1.5 hours.

Docker

First, pull the Docker image:

docker pull rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4

Next, start a container; the `--device=/dev/kfd` and `--device=/dev/dri` flags expose the AMD GPU to the container. For example:

docker run \
        -v "$(pwd):/workspace" \
        --device=/dev/kfd \
        --device=/dev/dri \
        -it \
        --network=host \
        --name hummingbird \
        rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4

Once inside the container, install the remaining dependencies with pip:

pip install -r requirements.txt

Example Usage

Text-to-Video

Download the pretrained UNet checkpoint from Hummingbird-Text-to-Video, then run one of the commands below to generate videos:

# for 0.7B model
python inference_command_config_07B.py

# for 0.9B model
python inference_command_config_09B.py

Image-to-Video

Download the Image-to-Video pretrained checkpoint from Hummingbird-Image-to-Video, then run the commands below to generate videos:

cd i2v
sh run_hummingbird.sh

Image/Video Super-Resolution

Download the super-resolution (SR) pretrained checkpoint from Hummingbird-Image-to-Video, then run the commands below to generate high-resolution videos:

cd VSR
sh inference_videos.sh
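The intro quotes 4K output for the image-to-video pipeline. As a hedged sanity check of what the SR stage must deliver (the 4x scale factor and the 960x540 base resolution below are illustrative assumptions, not the repo's actual configuration):

```python
# Hedged arithmetic: reaching 4K (3840x2160) via super-resolution.
# The 4x scale factor and the 960x540 base resolution are assumptions
# for illustration only.
base_w, base_h, scale = 960, 540, 4
print((base_w * scale, base_h * scale))  # (3840, 2160)
```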

💥Pre-training

Data Preparation

# VQA
cd data_pre_process/DOVER
sh run.sh

This produces a quality-score table for all videos; sort by score and remove the low-scoring videos.
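The sort-and-filter step can be sketched as follows. The CSV column names and the 0.5 threshold are assumptions for illustration; they are not the actual output format of `data_pre_process/DOVER/run.sh`.

```python
# Hedged sketch: keep only videos whose quality score meets a threshold,
# listed best-first. Column names and threshold are illustrative.
import csv, io

sample = """video,score
a.mp4,0.81
b.mp4,0.35
c.mp4,0.67
"""

THRESHOLD = 0.5
rows = list(csv.DictReader(io.StringIO(sample)))
rows.sort(key=lambda r: float(r["score"]), reverse=True)
keep = [r["video"] for r in rows if float(r["score"]) >= THRESHOLD]
print(keep)  # ['a.mp4', 'c.mp4']
```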

# Remove Dolly Zoom Videos
cd data_pre_process/VBench
sh run.sh 

Then, based on the motion-smoothness score CSV file, remove the low-scoring videos.
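An alternative to a fixed threshold is to drop the bottom fraction of the ranking; a minimal sketch, where the score values and the 25% cutoff are illustrative assumptions:

```python
# Hedged sketch: remove the lowest-scoring fraction of videos by
# motion-smoothness score. Scores and the 25% cutoff are illustrative.
scores = {"a.mp4": 0.97, "b.mp4": 0.62, "c.mp4": 0.88, "d.mp4": 0.91}
n_drop = max(1, int(0.25 * len(scores)))  # drop the bottom 25%
ranked = sorted(scores, key=scores.get)   # ascending by score
dropped, kept = ranked[:n_drop], ranked[n_drop:]
print(dropped)  # ['b.mp4']
```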

Training

Text-to-video

cd acceleration/t2v-turbo

# for 0.7B model
sh train_07B.sh

# for 0.9B model
sh train_09B.sh

Image/Video Super-Resolution

First, train the RealESRNet model:

cd VSR
# for realesrnet model
sh train_realesrnet.sh

This yields a trained RealESRNet checkpoint, which you can then use to train the RealESRGAN model:

cd VSR
# for realesrgan model
sh train_realesrgan.sh
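The two-stage recipe follows the usual Real-ESRGAN pattern: the RealESRNet stage trains with a pixel-wise loss only, and the RealESRGAN stage adds perceptual and adversarial terms. A minimal sketch of the combined fine-tuning objective; the loss weights below are illustrative assumptions, not this repo's actual configuration:

```python
# Hedged sketch of a GAN-stage training objective: a weighted sum of
# pixel (L1), perceptual, and adversarial losses. Weights are illustrative.
def combined_loss(l1, perceptual, adversarial,
                  w_l1=1.0, w_per=1.0, w_adv=0.1):
    return w_l1 * l1 + w_per * perceptual + w_adv * adversarial

print(round(combined_loss(0.05, 0.2, 0.5), 3))  # 0.3
```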

🤗Resources

Pre-trained models

AMD Blogs

Please refer to the following blogs to get started with using these techniques on AMD GPUs:

❤️Acknowledgement

Our codebase builds on VideoCrafter2, DynamiCrafter, T2V-Turbo, and Real-ESRGAN. Thanks to the authors for sharing their awesome codebases!

📋Citations

If you find our work helpful, feel free to cite our Hummingbird models and give us a star ⭐ :)

@article{isobe2025amd,
  title={AMD-Hummingbird: Towards an Efficient Text-to-Video Model},
  author={Isobe, Takashi and Cui, He and Zhou, Dong and Ge, Mengmeng and Li, Dong and Barsoum, Emad},
  journal={arXiv preprint arXiv:2503.18559},
  year={2025}
}
