
Commit 8a63aa5

Authored by yiyixuxu, lawrence-cj, github-actions[bot], sayakpaul, and a-r-r-o-w

add sana-sprint (#11074)

* add sana-sprint

---------

Co-authored-by: Junsong Chen <cjs1020440147@icloud.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sayak Paul <spsayakpaul@gmail.com>
Co-authored-by: Aryan <aryan@huggingface.co>

1 parent: 844221a

15 files changed: +1995 -123 lines changed


docs/source/en/_toctree.yml: 2 additions & 0 deletions

@@ -496,6 +496,8 @@
         title: PixArt-Σ
       - local: api/pipelines/sana
         title: Sana
+      - local: api/pipelines/sana_sprint
+        title: Sana Sprint
       - local: api/pipelines/self_attention_guidance
         title: Self-Attention Guidance
       - local: api/pipelines/semantic_stable_diffusion
docs/source/en/api/pipelines/sana_sprint.md (new file): 100 additions & 0 deletions

@@ -0,0 +1,100 @@
<!-- Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->

# SanaSprintPipeline

<div class="flex flex-wrap space-x-1">
  <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
</div>

[SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation](https://huggingface.co/papers/2503.09641) is from NVIDIA, MIT HAN Lab, and Hugging Face, by Junsong Chen, Shuchen Xue, Yuyang Zhao, Jincheng Yu, Sayak Paul, Junyu Chen, Han Cai, Enze Xie, and Song Han.

The abstract from the paper is:

*This paper presents SANA-Sprint, an efficient diffusion model for ultra-fast text-to-image (T2I) generation. SANA-Sprint is built on a pre-trained foundation model and augmented with hybrid distillation, dramatically reducing inference steps from 20 to 1-4. We introduce three key innovations: (1) We propose a training-free approach that transforms a pre-trained flow-matching model for continuous-time consistency distillation (sCM), eliminating costly training from scratch and achieving high training efficiency. Our hybrid distillation strategy combines sCM with latent adversarial distillation (LADD): sCM ensures alignment with the teacher model, while LADD enhances single-step generation fidelity. (2) SANA-Sprint is a unified step-adaptive model that achieves high-quality generation in 1-4 steps, eliminating step-specific training and improving efficiency. (3) We integrate ControlNet with SANA-Sprint for real-time interactive image generation, enabling instant visual feedback for user interaction. SANA-Sprint establishes a new Pareto frontier in speed-quality tradeoffs, achieving state-of-the-art performance with 7.59 FID and 0.74 GenEval in only 1 step — outperforming FLUX-schnell (7.94 FID / 0.71 GenEval) while being 10× faster (0.1s vs 1.1s on H100). It also achieves 0.1s (T2I) and 0.25s (ControlNet) latency for 1024×1024 images on H100, and 0.31s (T2I) on an RTX 4090, showcasing its exceptional efficiency and potential for AI-powered consumer applications (AIPC). Code and pre-trained models will be open-sourced.*

<Tip>

Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.

</Tip>
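
For instance, here is a small sketch of component reuse, under the assumption that the two Sana Sprint checkpoints listed below ship the same text encoder weights (passing an already-instantiated component to `from_pretrained` is the standard diffusers pattern):

```py
import torch
from diffusers import SanaSprintPipeline

pipe_16b = SanaSprintPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers",
    torch_dtype=torch.bfloat16,
)

# reuse the already-loaded text encoder instead of loading a second copy,
# assuming both checkpoints share the same text encoder weights
pipe_06b = SanaSprintPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_Sprint_0.6B_1024px_diffusers",
    text_encoder=pipe_16b.text_encoder,
    torch_dtype=torch.bfloat16,
)
```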
This pipeline was contributed by [lawrence-cj](https://github.com/lawrence-cj), [Shuchen Xue](https://github.com/scxue), and [Enze Xie](https://github.com/xieenze). The original codebase can be found [here](https://github.com/NVlabs/Sana). The original weights can be found under [hf.co/Efficient-Large-Model](https://huggingface.co/Efficient-Large-Model/).

Available models:

| Model | Recommended dtype |
|:-----:|:-----------------:|
| [`Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers`](https://huggingface.co/Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers) | `torch.bfloat16` |
| [`Efficient-Large-Model/Sana_Sprint_0.6B_1024px_diffusers`](https://huggingface.co/Efficient-Large-Model/Sana_Sprint_0.6B_1024px_diffusers) | `torch.bfloat16` |

Refer to [this](https://huggingface.co/collections/Efficient-Large-Model/sana-sprint-67d6810d65235085b3b17c76) collection for more information.

Note: The recommended dtype above is for the transformer weights. The text encoder must stay in `torch.bfloat16`, and the VAE weights must stay in `torch.bfloat16` or `torch.float32` for the model to work correctly. Please refer to the inference example below to see how to load the model with the recommended dtype.

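As a point of reference, the following is a minimal text-to-image sketch that loads the whole pipeline in the recommended dtype (loading everything in `torch.bfloat16` satisfies the constraints above; `num_inference_steps=2` is an illustrative choice within the model's 1-4 step range):

```py
import torch
from diffusers import SanaSprintPipeline

# loading the full pipeline in bfloat16 keeps the transformer, text encoder,
# and VAE in dtypes the model supports (see the note above)
pipeline = SanaSprintPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers",
    torch_dtype=torch.bfloat16,
)
pipeline.to("cuda")

prompt = "a tiny astronaut hatching from an egg on the moon"
image = pipeline(prompt, num_inference_steps=2).images[0]
image.save("sana_sprint.png")
```
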
## Quantization

Quantization helps reduce the memory requirements of very large models by storing model weights in a lower precision data type. However, quantization may have a varying impact on image quality depending on the model.

Refer to the [Quantization](../../quantization/overview) overview to learn more about supported quantization backends and selecting a quantization backend that supports your use case. The example below demonstrates how to load a quantized [`SanaSprintPipeline`] for inference with bitsandbytes.

```py
import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig, SanaTransformer2DModel, SanaSprintPipeline
from transformers import BitsAndBytesConfig, AutoModel

# quantize the text encoder to 8-bit with bitsandbytes (transformers config)
quant_config = BitsAndBytesConfig(load_in_8bit=True)
text_encoder_8bit = AutoModel.from_pretrained(
    "Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers",
    subfolder="text_encoder",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

# quantize the transformer to 8-bit with bitsandbytes (diffusers config)
quant_config = DiffusersBitsAndBytesConfig(load_in_8bit=True)
transformer_8bit = SanaTransformer2DModel.from_pretrained(
    "Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

# assemble the pipeline from the quantized components
pipeline = SanaSprintPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_Sprint_1.6B_1024px_diffusers",
    text_encoder=text_encoder_8bit,
    transformer=transformer_8bit,
    torch_dtype=torch.bfloat16,
    device_map="balanced",
)

prompt = "a tiny astronaut hatching from an egg on the moon"
image = pipeline(prompt).images[0]
image.save("sana.png")
```

## Setting `max_timesteps`

Users can tweak the `max_timesteps` value to experiment with the visual quality of the generated outputs. The default `max_timesteps` value was obtained with an inference-time search process; for more details, check out the paper.

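As a hedged sketch of such an experiment (this assumes `max_timesteps` is accepted directly as a pipeline call argument, and the values swept below, at or under roughly π/2, are illustrative starting points rather than recommended settings):

```py
import torch
from diffusers import SanaSprintPipeline

pipeline = SanaSprintPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_Sprint_0.6B_1024px_diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

prompt = "a tiny astronaut hatching from an egg on the moon"

# sweep a few illustrative values (the default is assumed to be about pi/2)
# and compare the saved outputs side by side
for max_t in (1.2, 1.4, 1.5708):
    image = pipeline(prompt, num_inference_steps=2, max_timesteps=max_t).images[0]
    image.save(f"sana_sprint_max_t_{max_t}.png")
```
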
## SanaSprintPipeline

[[autodoc]] SanaSprintPipeline
  - all
  - __call__

## SanaPipelineOutput

[[autodoc]] pipelines.sana.pipeline_output.SanaPipelineOutput
