
- [2025.08.03]: 🔥🔥Release Hummingbird Image-to-Video Technical Report!
- [2025.07.30]: 🔥🔥Release pretrained Image-to-Video model and VSR model, and their training and inference code!
- [2025.03.24]: 🔥🔥Release AMD-Hummingbird: Towards an Efficient Text-to-Video Model Paper!
- [2025.02.28]: 🔥🔥Release Hummingbird Text-to-Video Technical Report!
- [2025.02.26]: 🔥🔥Release pretrained Text-to-Video models, training and inference code!
```shell
conda create -n AMD_Hummingbird python=3.10
conda activate AMD_Hummingbird
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/rocm6.1
pip install -r requirements.txt
```
For flash-attention on ROCm, you can install it from the ROCm fork:
```shell
git clone https://github.com/ROCm/flash-attention.git
cd flash-attention
python setup.py install
```
The build takes about 1.5 hours.
First, use `docker pull` to download the image:
```shell
docker pull rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4
```
Second, use `docker run` to start a container from the image, for example:
```shell
docker run \
  -v "$(pwd):/workspace" \
  --device=/dev/kfd \
  --device=/dev/dri \
  -it \
  --network=host \
  --name hummingbird \
  rocm/vllm:rocm6.2_mi300_ubuntu20.04_py3.9_vllm_0.6.4
```
Once inside the container, use `pip` to install the remaining dependencies:
```shell
pip install -r requirements.txt
```
Download the pretrained UNet checkpoint from Hummingbird-Text-to-Video, then run the command below to generate videos:
```shell
# for 0.7B model
python inference_command_config_07B.py
# for 0.9B model
python inference_command_config_09B.py
```
Download the pretrained Image-to-Video checkpoint from Hummingbird-Image-to-Video, then run the command below to generate videos:
```shell
cd i2v
sh run_hummingbird.sh
```
Download the pretrained SR checkpoint from Hummingbird-Image-to-Video, then run the command below to generate high-resolution videos:
```shell
cd VSR
sh inference_videos.sh
```
```shell
# VQA
cd data_pre_process/DOVER
sh run.sh
```
This produces a score table covering the quality of every video; sort by score and remove the low-scoring videos.
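As a sketch, the sort-and-filter step might look like the following Python, assuming a hypothetical two-column CSV (`video,score`) exported from the DOVER score table (the actual DOVER output format may differ):

```python
import csv
from pathlib import Path

def low_scoring_videos(score_csv, keep_ratio=0.8):
    """Return the video names in the bottom (1 - keep_ratio) of quality scores."""
    with open(score_csv, newline="") as f:
        rows = [(r["video"], float(r["score"])) for r in csv.DictReader(f)]
    rows.sort(key=lambda r: r[1], reverse=True)  # best first
    keep = int(len(rows) * keep_ratio)           # number of videos to keep
    return [video for video, _ in rows[keep:]]   # the rest are candidates for removal

# Example: write a small synthetic score table, then list the videos to remove.
sample = "video,score\na.mp4,0.91\nb.mp4,0.42\nc.mp4,0.77\nd.mp4,0.15\ne.mp4,0.66\n"
Path("scores.csv").write_text(sample)
print(low_scoring_videos("scores.csv", keep_ratio=0.6))  # ['b.mp4', 'd.mp4']
```

Keeping a fixed fraction (rather than a fixed threshold) makes the filter robust to score-scale differences across datasets.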
```shell
# Remove Dolly Zoom Videos
cd data_pre_process/VBench
sh run.sh
```
Based on the motion-smoothness score CSV file, remove the low-scoring videos.
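A thresholding pass over that CSV could be sketched like this (the `video,motion_smoothness` column names are assumptions, not the actual VBench output schema):

```python
import csv
from pathlib import Path

def videos_below_threshold(score_csv, threshold=0.9):
    """List videos whose motion-smoothness score falls below `threshold`."""
    with open(score_csv, newline="") as f:
        return [r["video"] for r in csv.DictReader(f)
                if float(r["motion_smoothness"]) < threshold]

# Example with a small synthetic score file.
Path("motion.csv").write_text(
    "video,motion_smoothness\nx.mp4,0.95\ny.mp4,0.81\nz.mp4,0.99\n")
print(videos_below_threshold("motion.csv", threshold=0.9))  # ['y.mp4']
```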
```shell
cd acceleration/t2v-turbo
# for 0.7B model
sh train_07B.sh
# for 0.9B model
sh train_09B.sh
```
First, train the RealESRNet model:
```shell
cd VSR
# for realesrnet model
sh train_realesrnet.sh
```
This produces a trained RealESRNet checkpoint, which you can then use to train the RealESRGAN model:
```shell
cd VSR
# for realesrgan model
sh train_realesrgan.sh
```
- Text-to-Video: Hummingbird-Text-to-Video
- Image-to-Video: Hummingbird-Image-to-Video
- Image/Video Super-Resolution: Hummingbird-SR
Please refer to the following blogs to get started with using these techniques on AMD GPUs:
- PyTorch Fully Sharded Data Parallel (FSDP) on AMD GPUs with ROCm™
- Accelerating Large Language Models with Flash Attention on AMD GPUs
- Accelerate PyTorch Models using torch.compile on AMD GPUs with ROCm™
- Introducing the First AMD 1B Language Models: AMD OLMo
Our codebase builds on VideoCrafter2, DynamiCrafter, T2V-Turbo, and Real-ESRGAN. Thanks to the authors for sharing their awesome codebases!
Feel free to cite our Hummingbird models and give us a star ⭐ if you find our work helpful :)
```bibtex
@article{isobe2025amd,
  title={AMD-Hummingbird: Towards an Efficient Text-to-Video Model},
  author={Isobe, Takashi and Cui, He and Zhou, Dong and Ge, Mengmeng and Li, Dong and Barsoum, Emad},
  journal={arXiv preprint arXiv:2503.18559},
  year={2025}
}
```