Official Implementation of ["Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models"]
This repository contains the official implementation of our paper *Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models*. Please follow the official HunyuanVideo and Wan 2.1 guides to set up the environment (see the Installation section below).
- 🔥 Latest News
- 📀 Installation
- 🚀 Running the Code
- 📊 Quantitative Comparison
- ⚡ Scale to Multi-GPU
- 📝 To-Do List
## 🔥 Latest News
- If you like our project, please give it a star ⭐ on GitHub to stay up to date with the latest releases.
- [2025/04/04] 🎉 Paper submitted to arXiv.
- [2025/04/04] 🔥 Released the open-source code for the latest model.
## 📀 Installation

Follow the official HunyuanVideo and Wan 2.1 environment setup guides, then install the repository's dependencies:

```bash
pip install -r requirements.txt
```
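For reference, a complete setup might look like the sketch below; the environment name and Python version are assumptions, so defer to the upstream guides for the exact requirements.

```bash
# Hypothetical environment setup -- follow the upstream guides for exact versions.
conda create -n feature-cache python=3.10 -y
conda activate feature-cache
pip install -r requirements.txt
```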
## 🚀 Running the Code

### HunyuanVideo

```bash
cd HunyuanVideo
python3 sample_video.py \
    --video-size 360 720 \
    --video-length 129 \
    --infer-steps 50 \
    --prompt "cat walk on grass" \
    --flow-reverse \
    --use-cpu-offload \
    --save-path ./results \
    --seed 42 \
    --model-base "ckpts" \
    --dit-weight "ckpts/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states.pt" \
    --delta_cache
```
### Wan2.1

```bash
cd Wan2.1
python generate.py \
    --task t2v-14B \
    --size 832*480 \
    --frame_num 81 \
    --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
    --delta_cache
```
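Both commands enable our profiling-based feature cache via `--delta_cache`. To convey the intuition, here is a rough conceptual sketch, not the repository's actual code: the function names, the residual-caching choice, and the threshold are all illustrative assumptions. The idea is to profile how strongly each block's output changes across diffusion steps, then reuse cached outputs for the blocks whose profiled change is small.

```python
import torch
import torch.nn as nn

def profile_block_deltas(blocks, latents_per_step):
    """Average relative change of each block's output between adjacent steps."""
    deltas = [[] for _ in blocks]
    prev_outs = [None] * len(blocks)
    for x in latents_per_step:                 # one latent per diffusion step
        h = x
        for i, blk in enumerate(blocks):
            h = blk(h)
            if prev_outs[i] is not None:
                rel = (h - prev_outs[i]).norm() / (prev_outs[i].norm() + 1e-8)
                deltas[i].append(rel.item())
            prev_outs[i] = h.detach()
    return [sum(d) / len(d) for d in deltas]

def cached_forward(blocks, x, cache, avg_deltas, thresh=0.05):
    """One denoising step that reuses cached residuals for stable blocks."""
    h = x
    for i, blk in enumerate(blocks):
        if cache[i] is not None and avg_deltas[i] < thresh:
            h = h + cache[i]                   # reuse the cached residual
        else:
            out = blk(h)
            cache[i] = (out - h).detach()      # refresh the cached residual
            h = out
    return h

# Toy usage: four linear "blocks" stand in for transformer blocks.
blocks = [nn.Linear(64, 64) for _ in range(4)]
with torch.no_grad():
    avg_deltas = profile_block_deltas(blocks, [torch.randn(1, 64) for _ in range(10)])
    cache = [None] * len(blocks)
    out = cached_forward(blocks, torch.randn(1, 64), cache, avg_deltas)
```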
## 📊 Quantitative Comparison

| Method | VBench ↑ | LPIPS ↓ | PSNR ↑ | SSIM ↑ | FID ↓ | Latency (s) ↓ | Speedup ↑ |
|---|---|---|---|---|---|---|---|
| HunyuanVideo (720P, 129 frames) | 0.7703 | -- | -- | -- | -- | 1745 | -- |
| TeaCache (slow) | 0.7700 | 0.1720 | 21.91 | 0.7456 | 77.67 | 1052 | 1.66× |
| TeaCache (fast) | 0.7677 | 0.1830 | 21.60 | 0.7323 | 83.85 | 753 | 2.31× |
| Ours (HunyuanVideo) | 0.7642 | 0.1203 | 26.44 | 0.8445 | 41.10 | 932 | 1.87× |
| Method | VBench ↑ | LPIPS ↓ | PSNR ↑ | SSIM ↑ | FID ↓ | Latency (s) ↓ | Speedup ↑ |
|---|---|---|---|---|---|---|---|
| Wan2.1 (480P, 81 frames) | 0.7582 | -- | -- | -- | -- | 497 | -- |
| TeaCache (threshold 0.2) | 0.7604 | 0.2913 | 16.17 | 0.5685 | 117.61 | 249 | 2.00× |
| Ours (Wan2.1) | 0.7615 | 0.1256 | 22.02 | 0.7899 | 62.56 | 247 | 2.01× |
Tables: quantitative comparison with prior methods under the HunyuanVideo and Wan2.1 baselines.

🔺 Higher is better for VBench, PSNR, SSIM, and Speedup.
🔻 Lower is better for LPIPS, FID, and Latency.

Speedup is the baseline latency divided by the method's latency; for example, our HunyuanVideo result gives 1745 s / 932 s ≈ 1.87×.
## ⚡ Scale to Multi-GPU

Our method scales across multiple GPUs to accelerate both inference and training. By combining model parallelism, NCCL communication, and careful memory management, it achieves significant speedups without compromising quality:
- Increased Throughput 🚀: Distributes computation across multiple GPUs to process more frames in parallel.
- Optimized Memory Usage 🔧: Dynamically allocates memory to prevent bottlenecks.
- Flexible Deployment 💡: Works seamlessly on both single-node and distributed setups.
- NCCL Optimization 🔄: Uses efficient GPU-GPU communication to minimize overhead.
For detailed setup and configurations, please refer to our Multi-GPU Guide. 🚀
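For illustration, a multi-GPU launch might look like the sketch below. The flags are modeled on the upstream Wan2.1 multi-GPU example and are assumptions for this repository; consult the guide above for the exact options.

```bash
# Assumed launch pattern (8 GPUs), modeled on the upstream Wan2.1 example.
torchrun --nproc_per_node=8 generate.py \
    --task t2v-14B \
    --size 832*480 \
    --frame_num 81 \
    --dit_fsdp --t5_fsdp \
    --ulysses_size 8 \
    --prompt "Two anthropomorphic cats in comfy boxing gear and bright gloves fight intensely on a spotlighted stage." \
    --delta_cache
```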
## 📝 To-Do List

- OpenSora2 🏗️ (upcoming support)
- Optimize caching for CogVideoX ⚙️
## 📖 Citation

If you find our work useful, please consider citing:

```bibtex
@misc{ma2025modelrevealscacheprofilingbased,
      title={Model Reveals What to Cache: Profiling-Based Feature Reuse for Video Diffusion Models},
      author={Xuran Ma and Yexin Liu and Yaofu Liu and Xianfeng Wu and Mingzhe Zheng and Zihao Wang and Ser-Nam Lim and Harry Yang},
      year={2025},
      eprint={2504.03140},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2504.03140},
}
```
## 📄 License

This project is licensed under the Apache 2.0 License.