Hummingbird: A Lightweight, High-Performance Video Generation Model

🔆Introduction

⚡️ Hummingbird is a series of advanced video generation models developed by the AMD AIG team and trained on AMD Instinct™ MI250 GPUs. It includes text-to-video models, image-to-video models, and image/video super-resolution models. With only 0.9B parameters, the Hummingbird model demonstrates exceptional efficiency. For text-to-video tasks, it can generate text-aligned videos in just 1.87 seconds using 4 steps on an MI250 GPU. For image-to-video tasks, it takes only 11 seconds to produce high-quality 4K videos.
Figure 1: AMD Hummingbird-0.9B visual performance comparison with state-of-the-art T2V models on VBench. Example prompts shown in the figure: "A cute happy Corgi playing in park, sunset, pixel"; "A cute happy Corgi playing in park, sunset, animated style"; "A cute raccoon playing guitar in the beach"; "A cute raccoon playing guitar in the forest"; "A quiet beach at dawn and the waves gently lapping"; "A cute teddy bear, dressed in a red silk outfit, stands in a vibrant street, chinese new year"; "A sandcastle being eroded by the incoming tide"; "An astronaut flying in space, in cyberpunk style"; "A cat DJ at a party"; "A 3D model of a 1800s victorian house"; "A drone flying over a snowy forest"; "A ghost ship navigating through a sea under a moon".
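As a rough back-of-envelope from the figures above (assuming the quoted 1.87 s covers all 4 denoising steps evenly, which is an assumption, not a profiled breakdown):

```python
# Back-of-envelope: average per-step latency implied by the quoted
# 4-step, 1.87 s text-to-video figure on an MI250 GPU.
total_seconds, steps = 1.87, 4
per_step = total_seconds / steps
print(round(per_step, 4))  # 0.4675
```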

📝 Change Log

🚀Getting Started

Installation

Conda

conda create -n AMD_Hummingbird python=3.10
conda activate AMD_Hummingbird
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/rocm6.1
pip install -r requirements.txt

For ROCm Flash Attention support, build it from source:

git clone https://github.com/ROCm/flash-attention.git
cd flash-attention
python setup.py install

The build takes about 1.5 hours.

Docker

First, pull the Docker image:

docker pull rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4

Next, start a container; the `--device=/dev/kfd` and `--device=/dev/dri` flags expose the AMD GPU to the container. For example:

docker run \
        -v "$(pwd):/workspace" \
        --device=/dev/kfd \
        --device=/dev/dri \
        -it \
        --network=host \
        --name hummingbird \
        rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4

Once inside the container, install the remaining dependencies with pip:

pip install -r requirements.txt

Example Usage

Text-to-Video

Download the pretrained UNet checkpoint from Hummingbird-Text-to-Video, then run one of the commands below to generate videos:

# for 0.7B model
python inference_command_config_07B.py

# for 0.9B model
python inference_command_config_09B.py

Image-to-Video

Download the Image-to-Video pretrained checkpoint from Hummingbird-Image-to-Video, then run the commands below to generate videos:

cd i2v
sh run_hummingbird.sh

Image/Video Super-Resolution

Download the super-resolution (SR) pretrained checkpoint from Hummingbird-Image-to-Video, then run the commands below to generate high-resolution videos:

cd VSR
sh inference_videos.sh
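The intro quotes 4K output for the image-to-video pipeline. As a hedged sanity check of what the SR stage must deliver (the 4x scale factor and the 960x540 base resolution below are illustrative assumptions, not the repo's actual configuration):

```python
# Hedged arithmetic: reaching 4K (3840x2160) via super-resolution.
# The 4x scale factor and the 960x540 base resolution are assumptions
# for illustration only.
base_w, base_h, scale = 960, 540, 4
print((base_w * scale, base_h * scale))  # (3840, 2160)
```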

💥Pre-training

Data Preparation

# VQA
cd data_pre_process/DOVER
sh run.sh

This produces a quality-score table for all videos; sort by score and remove the low-scoring videos.
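The sort-and-filter step can be sketched as follows. The CSV column names and the 0.5 threshold are assumptions for illustration; they are not the actual output format of `data_pre_process/DOVER/run.sh`.

```python
# Hedged sketch: keep only videos whose quality score meets a threshold,
# listed best-first. Column names and threshold are illustrative.
import csv, io

sample = """video,score
a.mp4,0.81
b.mp4,0.35
c.mp4,0.67
"""

THRESHOLD = 0.5
rows = list(csv.DictReader(io.StringIO(sample)))
rows.sort(key=lambda r: float(r["score"]), reverse=True)
keep = [r["video"] for r in rows if float(r["score"]) >= THRESHOLD]
print(keep)  # ['a.mp4', 'c.mp4']
```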

# Remove Dolly Zoom Videos
cd data_pre_process/VBench
sh run.sh 

Then, based on the motion-smoothness score CSV file, remove the low-scoring videos.
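An alternative to a fixed threshold is to drop the bottom fraction of the ranking; a minimal sketch, where the score values and the 25% cutoff are illustrative assumptions:

```python
# Hedged sketch: remove the lowest-scoring fraction of videos by
# motion-smoothness score. Scores and the 25% cutoff are illustrative.
scores = {"a.mp4": 0.97, "b.mp4": 0.62, "c.mp4": 0.88, "d.mp4": 0.91}
n_drop = max(1, int(0.25 * len(scores)))  # drop the bottom 25%
ranked = sorted(scores, key=scores.get)   # ascending by score
dropped, kept = ranked[:n_drop], ranked[n_drop:]
print(dropped)  # ['b.mp4']
```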

Training

Text-to-video

cd acceleration/t2v-turbo

# for 0.7B model
sh train_07B.sh

# for 0.9B model
sh train_09B.sh

Image/Video Super-Resolution

First, train the RealESRNet model:

cd VSR
# for realesrnet model
sh train_realesrnet.sh

This yields a trained RealESRNet checkpoint, which you can then use to train the RealESRGAN model:

cd VSR
# for realesrgan model
sh train_realesrgan.sh
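The two-stage recipe follows the usual Real-ESRGAN pattern: the RealESRNet stage trains with a pixel-wise loss only, and the RealESRGAN stage adds perceptual and adversarial terms. A minimal sketch of the combined fine-tuning objective; the loss weights below are illustrative assumptions, not this repo's actual configuration:

```python
# Hedged sketch of a GAN-stage training objective: a weighted sum of
# pixel (L1), perceptual, and adversarial losses. Weights are illustrative.
def combined_loss(l1, perceptual, adversarial,
                  w_l1=1.0, w_per=1.0, w_adv=0.1):
    return w_l1 * l1 + w_per * perceptual + w_adv * adversarial

print(round(combined_loss(0.05, 0.2, 0.5), 3))  # 0.3
```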

🤗Resources

Pre-trained models

AMD Blogs

Please refer to the following blogs to get started with using these techniques on AMD GPUs:

❤️Acknowledgement

Our codebase builds on VideoCrafter2, DynamiCrafter, T2V-Turbo, and Real-ESRGAN. Thanks to the authors for sharing their awesome codebases!

📋Citations

If you find our work helpful, feel free to cite our Hummingbird models and give us a star ⭐ :)

@article{isobe2025amd,
  title={AMD-Hummingbird: Towards an Efficient Text-to-Video Model},
  author={Isobe, Takashi and Cui, He and Zhou, Dong and Ge, Mengmeng and Li, Dong and Barsoum, Emad},
  journal={arXiv preprint arXiv:2503.18559},
  year={2025}
}
