
Commit 6ec66e8

feat(diffusers/pipelines): add pipelines and required modules of QwenImage in Diffusers Master (#1288)
* 2025/08/15
* 2025/8/15 17:18 revised
* 2025/8/18 10:22 revised
* 2025/8/18 17:00 revised
* 2025/8/18 19:08 revised
* 2025/8/18 19:13 revised
* 2025/8/19 9:02 revised
* 2025/8/19 9:04 revised
* 2025/8/19 9:12 revised
* 2025/8/19 10:27 revised
* 2025/8/20 9:22 revised
* 2025/8/20 9:247 revised
* 2025/8/20 9:48 revised
* 2025/8/20 9:52 revised
* 2025/8/20 10:15 revised
* 2025/8/20 10:50 revised
* 2025/8/20 11:11 revised
* 2025/8/20 11:27 revised
* 2025/8/20 11:47 revised
* 2025/8/20 14:25 revised
* 2025/8/20 14:26 revised
* 2025/8/21 15:20 revised
* 2025/8/21 15:24 revised
* 2025/8/21 17:08 revised
* 2025/8/21 17:57 revised
* 2025/8/21 19:13 revised
* 2025/8/22 11:32 revised
* 2025/8/22 17:40 revised
* 2025/8/25 10:40 revised
* 2025/8/26 10:30 revised
* 2025/8/26 17:10 revised
* 2025/8/26 17:20 revised
* 2025/8/27 14:08 revised
* 2025/8/27 17:05 revised
* 2025/8/27 17:09 revised
* 2025/8/27 17:23 revised
* 2025/8/29 15:42 revised
* 2025/9/1 09:18 revised
* 2025/9/1 09:40 revised
* 2025/9/2 14:06, img2img infer
* 2025/9/3 8:50, inpaint infer
* 2025/9/3 14:07, img2img test
* 2025/9/3 14:18, img2img test
* 2025/9/3 16:30, inpaint test
* 2025/9/4 14:21, edit bugs
* 2025/9/4 15:58, edit ut
* 2025/9/5 15:07, edit-inpaint pipe
* 2025/9/5 17:40, fix some bugs
* modified qwenimage 2025/9/15 clean no-use notes
* 2025/9/15 seamless_m4t submit add model seamless_m4t
* 2025/9/17 seamless_m4t ut
* 25/9/17 seamless_m4t clean
* 2025/9/17 qwenimage clean
* fix: remove unwanted files
* fix: remove unwanted files
* fix: keep file consistent
* fix: keep file consistent
* fix: keep file consistent
* revised according to gemini
* revised according to gemini
* fix conflicting according to gemini
* fix conflicting according to gemini
* required lines but conflicting
* required lines but conflicting
* fix: md, according to Cui-yshoho
* fix a bug of qwen2_5_vl, some revisions suggested from Cui-yshoho and SamitHuang
* Resolved the conflict regarding qwen2_5_vl masked_scatter-bf16-bug
* Add UTs of transformer, supplement MDs, delete unused code comments
* update md to notice the use of transformers==4.52.1
* fix ci problem
* fix ci problem
* fix ci problem
* fix ci problem
* CHECK: pre-commit run --all-files
* fix ci problem - strange format?
* Trigger CI
* fix ci problem - modeling_reformer
* fix: lora infer - lora_conversion_utils.py
* revise format of some strings
1 parent 557322b commit 6ec66e8

29 files changed (+7718, -22 lines)

docs/diffusers/_toctree.yml

Lines changed: 6 additions & 0 deletions
@@ -287,6 +287,8 @@
     title: PixArtTransformer2DModel
   - local: api/models/prior_transformer
     title: PriorTransformer
+  - local: api/models/qwenimage_transformer2d
+    title: QwenImageTransformer2DModel
   - local: api/models/sana_transformer2d
     title: SanaTransformer2DModel
   - local: api/models/sd3_transformer2d
@@ -339,6 +341,8 @@
     title: AutoencoderKLMagvit
   - local: api/models/autoencoderkl_mochi
     title: AutoencoderKLMochi
+  - local: api/models/autoencoderkl_qwenimage
+    title: AutoencoderKLQwenImage
   - local: api/models/autoencoder_kl_wan
     title: AutoencoderKLWan
   - local: api/models/consistency_decoder_vae
@@ -475,6 +479,8 @@
     title: PixArt-α
   - local: api/pipelines/pixart_sigma
     title: PixArt-Σ
+  - local: api/pipelines/qwenimage
+    title: QwenImage
   - local: api/pipelines/sana
     title: Sana
   - local: api/pipelines/sana_sprint

docs/diffusers/api/loaders/lora.md

Lines changed: 3 additions & 0 deletions
@@ -27,6 +27,7 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi
 - `WanLoraLoaderMixin` provides similar functions for [Wan](../../api/pipelines/wan.md).
 - `SkyReelsV2LoraLoaderMixin` provides similar functions for [SkyReels-V2](../../api/pipelines/skyreels_v2.md).
 - `AmusedLoraLoaderMixin` is for the [AmusedPipeline](../../api/pipelines/amused.md).
+- `QwenImageLoraLoaderMixin` provides similar functions for [QwenImage](../../api/pipelines/qwenimage.md)
 - `LoraBaseMixin` provides a base class with several utility methods to fuse, unfuse, unload, LoRAs and more.

 !!! tip
@@ -60,4 +61,6 @@ LoRA is a fast and lightweight training method that inserts and trains a signifi

 ::: mindone.diffusers.loaders.lora_pipeline.AmusedLoraLoaderMixin

+::: mindone.diffusers.loaders.lora_pipeline.QwenImageLoraLoaderMixin
+
 ::: mindone.diffusers.loaders.lora_base.LoraBaseMixin
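For reference, a minimal sketch (not part of this commit) of how the new `QwenImageLoraLoaderMixin` entry point is typically exercised through a pipeline, assuming the mindone pipelines follow the usual diffusers `load_lora_weights` convention; the LoRA repo id below is a placeholder.

```python
import mindspore

from mindone.diffusers import QwenImagePipeline

# QwenImagePipeline mixes in QwenImageLoraLoaderMixin, so LoRA checkpoints
# can be attached with load_lora_weights().
pipe = QwenImagePipeline.from_pretrained("Qwen/Qwen-Image", mindspore_dtype=mindspore.bfloat16)

# Placeholder id: point this at a real LoRA repo or a local .safetensors file.
pipe.load_lora_weights("your-namespace/qwen-image-lora")
```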

docs/diffusers/api/models/autoencoderkl_qwenimage.md

Lines changed: 26 additions & 0 deletions

@@ -0,0 +1,26 @@ (new file)

<!-- Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderKLQwenImage

The model can be loaded with the following code snippet.

```python
from mindone.diffusers import AutoencoderKLQwenImage

vae = AutoencoderKLQwenImage.from_pretrained("Qwen/QwenImage", subfolder="vae")
```

::: mindone.diffusers.AutoencoderKLQwenImage

::: mindone.diffusers.models.autoencoders.autoencoder_kl.AutoencoderKLOutput

::: mindone.diffusers.models.autoencoders.vae.DecoderOutput

docs/diffusers/api/models/qwenimage_transformer2d.md

Lines changed: 24 additions & 0 deletions

@@ -0,0 +1,24 @@ (new file)

<!-- Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# QwenImageTransformer2DModel

The model can be loaded with the following code snippet.

```python
import mindspore

from mindone.diffusers import QwenImageTransformer2DModel

transformer = QwenImageTransformer2DModel.from_pretrained("Qwen/QwenImage", subfolder="transformer", mindspore_dtype=mindspore.bfloat16)
```

::: mindone.diffusers.QwenImageTransformer2DModel

::: mindone.diffusers.models.modeling_outputs.Transformer2DModelOutput
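As a hedged aside (not part of the committed doc): once the transformer has been loaded on its own, the usual diffusers pattern is to hand it to the pipeline's `from_pretrained` so that component is not loaded twice. `Qwen/Qwen-Image` is the Hub id listed in the pipeline doc below.

```python
import mindspore

from mindone.diffusers import QwenImagePipeline, QwenImageTransformer2DModel

# Load the transformer separately (as in the snippet above) ...
transformer = QwenImageTransformer2DModel.from_pretrained(
    "Qwen/Qwen-Image", subfolder="transformer", mindspore_dtype=mindspore.bfloat16
)

# ... then reuse it when assembling the full pipeline.
pipe = QwenImagePipeline.from_pretrained(
    "Qwen/Qwen-Image", transformer=transformer, mindspore_dtype=mindspore.bfloat16
)
```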

docs/diffusers/api/pipelines/qwenimage.md

Lines changed: 42 additions & 0 deletions

@@ -0,0 +1,42 @@ (new file)

<!-- Copyright 2025 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License. -->

# QwenImage

<div class="flex flex-wrap space-x-1">
  <img alt="LoRA" src="https://img.shields.io/badge/LoRA-d8b4fe?style=flat"/>
</div>

Qwen-Image from the Qwen team is an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing. Experiments show strong general capabilities in both image generation and editing, with exceptional performance in text rendering, especially for Chinese.

Qwen-Image comes in the following variants:

| model type | model id |
|:----------:|:--------:|
| Qwen-Image | [`Qwen/Qwen-Image`](https://huggingface.co/Qwen/Qwen-Image) |
| Qwen-Image-Edit | [`Qwen/Qwen-Image-Edit`](https://huggingface.co/Qwen/Qwen-Image-Edit) |

!!! tip

    Make sure to check out the Schedulers [guide](../../using-diffusers/schedulers) to learn how to explore the tradeoff between scheduler speed and quality, and see the [reuse components across pipelines](../../using-diffusers/loading#reuse-a-pipeline) section to learn how to efficiently load the same components into multiple pipelines.
    In addition, the default version of `transformers` installed with `mindone` is `4.50.0`, but `transformers==4.52.1` is required for Qwen-Image. Please run `pip install transformers==4.52.1` to upgrade if you want to try the Qwen-Image pipelines.


::: mindone.diffusers.QwenImagePipeline

::: mindone.diffusers.pipelines.qwenimage.pipeline_output.QwenImagePipelineOutput

::: mindone.diffusers.QwenImageImg2ImgPipeline

::: mindone.diffusers.QwenImageInpaintPipeline
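For orientation, a minimal text-to-image sketch (not part of the committed doc), assuming `QwenImagePipeline` follows the standard diffusers call convention of returning a pipeline output with an `images` field; the prompt and step count are illustrative only.

```python
# Requires transformers==4.52.1, as noted in the tip above.
import mindspore

from mindone.diffusers import QwenImagePipeline

pipe = QwenImagePipeline.from_pretrained("Qwen/Qwen-Image", mindspore_dtype=mindspore.bfloat16)

# Illustrative prompt and step count.
image = pipe(
    prompt="A coffee shop sign that reads 'Qwen-Image' in neat handwriting",
    num_inference_steps=50,
).images[0]
image.save("qwen_image_t2i.png")
```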

mindone/diffusers/__init__.py

Lines changed: 14 additions & 0 deletions
@@ -65,6 +65,7 @@
         "AutoencoderKLLTXVideo",
         "AutoencoderKLMagvit",
         "AutoencoderKLMochi",
+        "AutoencoderKLQwenImage",
         "AutoencoderKLTemporalDecoder",
         "AutoencoderKLWan",
         "AutoencoderOobleck",
@@ -105,6 +106,7 @@
         "OmniGenTransformer2DModel",
         "PixArtTransformer2DModel",
         "PriorTransformer",
+        "QwenImageTransformer2DModel",
         "SanaControlNetModel",
         "SanaTransformer2DModel",
         "SD3ControlNetModel",
@@ -261,6 +263,11 @@
         "PixArtAlphaPipeline",
         "PixArtSigmaPAGPipeline",
         "PixArtSigmaPipeline",
+        "QwenImageImg2ImgPipeline",
+        "QwenImageInpaintPipeline",
+        "QwenImagePipeline",
+        "QwenImageEditPipeline",
+        "QwenImageEditInpaintPipeline",
         "ReduxImageEncoder",
         "SanaControlNetPipeline",
         "SanaPAGPipeline",
@@ -439,6 +446,7 @@
         AutoencoderKLLTXVideo,
         AutoencoderKLMagvit,
         AutoencoderKLMochi,
+        AutoencoderKLQwenImage,
         AutoencoderKLTemporalDecoder,
         AutoencoderKLWan,
         AutoencoderOobleck,
@@ -479,6 +487,7 @@
         OmniGenTransformer2DModel,
         PixArtTransformer2DModel,
         PriorTransformer,
+        QwenImageTransformer2DModel,
         SanaControlNetModel,
         SanaTransformer2DModel,
         SD3ControlNetModel,
@@ -646,6 +655,11 @@
         PixArtAlphaPipeline,
         PixArtSigmaPAGPipeline,
         PixArtSigmaPipeline,
+        QwenImageEditInpaintPipeline,
+        QwenImageEditPipeline,
+        QwenImageImg2ImgPipeline,
+        QwenImageInpaintPipeline,
+        QwenImagePipeline,
         ReduxImageEncoder,
         SanaControlNetPipeline,
         SanaPAGPipeline,

mindone/diffusers/loaders/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -73,6 +73,7 @@ def text_encoder_attn_modules(text_encoder):
         "CogView4LoraLoaderMixin",
         "Mochi1LoraLoaderMixin",
         "HunyuanVideoLoraLoaderMixin",
+        "QwenImageLoraLoaderMixin",
         "SanaLoraLoaderMixin",
         "Lumina2LoraLoaderMixin",
         "WanLoraLoaderMixin",
@@ -100,6 +101,7 @@ def text_encoder_attn_modules(text_encoder):
         LTXVideoLoraLoaderMixin,
         Lumina2LoraLoaderMixin,
         Mochi1LoraLoaderMixin,
+        QwenImageLoraLoaderMixin,
         SanaLoraLoaderMixin,
         SD3LoraLoaderMixin,
         SkyReelsV2LoraLoaderMixin,

mindone/diffusers/loaders/lora_conversion_utils.py

Lines changed: 2 additions & 2 deletions
@@ -2397,8 +2397,8 @@ def get_alpha_scales(down_weight, alpha_key):
             down_weight = state_dict.pop(k)
             up_weight = state_dict.pop(k.replace(down_key, up_key))
             scale_down, scale_up = get_alpha_scales(down_weight, alpha_key)
-            converted_state_dict[diffusers_down_key] = down_weight * scale_down
-            converted_state_dict[diffusers_up_key] = up_weight * scale_up
+            converted_state_dict[diffusers_down_key] = Parameter(down_weight * scale_down)
+            converted_state_dict[diffusers_up_key] = Parameter(up_weight * scale_up)

     if len(state_dict) > 0:
         raise ValueError(f"`state_dict` should be empty at this point but has {state_dict.keys()=}")
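Background on the change above: the converted values are now wrapped in `mindspore.Parameter`, presumably because MindSpore's parameter-loading path (`load_param_into_net`) expects `Parameter` objects rather than the plain tensors produced by `down_weight * scale_down`. A small standalone illustration of that requirement (a sketch, not taken from the library):

```python
import numpy as np

import mindspore
from mindspore import Parameter, Tensor, nn

net = nn.Dense(4, 4)

# The product below is a plain Tensor, analogous to `down_weight * scale_down`;
# wrapping it in Parameter makes it acceptable to load_param_into_net.
scaled = Tensor(np.ones((4, 4), np.float32)) * 0.5
params = {"weight": Parameter(scaled, name="weight")}
mindspore.load_param_into_net(net, params)
```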
