Commit 99e56e3

Merge branch 'main' into add-wan2.2-animate-pipeline

2 parents 332d3c2 + ac5a1e2
File tree

100 files changed: +9685 −383 lines


docs/source/en/_toctree.yml

Lines changed: 14 additions & 0 deletions
@@ -323,6 +323,8 @@
   title: AllegroTransformer3DModel
 - local: api/models/aura_flow_transformer2d
   title: AuraFlowTransformer2DModel
+- local: api/models/transformer_bria_fibo
+  title: BriaFiboTransformer2DModel
 - local: api/models/bria_transformer
   title: BriaTransformer2DModel
 - local: api/models/chroma_transformer
@@ -347,6 +349,8 @@
   title: HiDreamImageTransformer2DModel
 - local: api/models/hunyuan_transformer2d
   title: HunyuanDiT2DModel
+- local: api/models/hunyuanimage_transformer_2d
+  title: HunyuanImageTransformer2DModel
 - local: api/models/hunyuan_video_transformer_3d
   title: HunyuanVideoTransformer3DModel
 - local: api/models/latte_transformer3d
@@ -411,6 +415,10 @@
   title: AutoencoderKLCogVideoX
 - local: api/models/autoencoderkl_cosmos
   title: AutoencoderKLCosmos
+- local: api/models/autoencoder_kl_hunyuanimage
+  title: AutoencoderKLHunyuanImage
+- local: api/models/autoencoder_kl_hunyuanimage_refiner
+  title: AutoencoderKLHunyuanImageRefiner
 - local: api/models/autoencoder_kl_hunyuan_video
   title: AutoencoderKLHunyuanVideo
 - local: api/models/autoencoderkl_ltx_video
@@ -463,6 +471,8 @@
   title: BLIP-Diffusion
 - local: api/pipelines/bria_3_2
   title: Bria 3.2
+- local: api/pipelines/bria_fibo
+  title: Bria Fibo
 - local: api/pipelines/chroma
   title: Chroma
 - local: api/pipelines/cogview3
@@ -620,10 +630,14 @@
   title: ConsisID
 - local: api/pipelines/framepack
   title: Framepack
+- local: api/pipelines/hunyuanimage21
+  title: HunyuanImage2.1
 - local: api/pipelines/hunyuan_video
   title: HunyuanVideo
 - local: api/pipelines/i2vgenxl
   title: I2VGen-XL
+- local: api/pipelines/kandinsky5_video
+  title: Kandinsky 5.0 Video
 - local: api/pipelines/latte
   title: Latte
 - local: api/pipelines/ltx_video
docs/source/en/api/models/autoencoder_kl_hunyuanimage.md

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderKLHunyuanImage

The 2D variational autoencoder (VAE) model with KL loss used in [HunyuanImage2.1](https://github.com/Tencent-Hunyuan/HunyuanImage-2.1).

The model can be loaded with the following code snippet.

```python
import torch

from diffusers import AutoencoderKLHunyuanImage

vae = AutoencoderKLHunyuanImage.from_pretrained("hunyuanvideo-community/HunyuanImage-2.1-Diffusers", subfolder="vae", torch_dtype=torch.bfloat16)
```

## AutoencoderKLHunyuanImage

[[autodoc]] AutoencoderKLHunyuanImage
- decode
- all

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput
docs/source/en/api/models/autoencoder_kl_hunyuanimage_refiner.md

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# AutoencoderKLHunyuanImageRefiner

The 3D variational autoencoder (VAE) model with KL loss used in [HunyuanImage2.1](https://github.com/Tencent-Hunyuan/HunyuanImage-2.1) for its refiner pipeline.

The model can be loaded with the following code snippet.

```python
import torch

from diffusers import AutoencoderKLHunyuanImageRefiner

vae = AutoencoderKLHunyuanImageRefiner.from_pretrained("hunyuanvideo-community/HunyuanImage-2.1-Refiner-Diffusers", subfolder="vae", torch_dtype=torch.bfloat16)
```

## AutoencoderKLHunyuanImageRefiner

[[autodoc]] AutoencoderKLHunyuanImageRefiner
- decode
- all

## DecoderOutput

[[autodoc]] models.autoencoders.vae.DecoderOutput

docs/source/en/api/models/chroma_transformer.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ specific language governing permissions and limitations under the License.

 # ChromaTransformer2DModel

-A modified flux Transformer model from [Chroma](https://huggingface.co/lodestones/Chroma)
+A modified flux Transformer model from [Chroma](https://huggingface.co/lodestones/Chroma1-HD)

 ## ChromaTransformer2DModel

docs/source/en/api/models/hunyuanimage_transformer_2d.md

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
<!-- Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License. -->

# HunyuanImageTransformer2DModel

A Diffusion Transformer model for [HunyuanImage2.1](https://github.com/Tencent-Hunyuan/HunyuanImage-2.1).

The model can be loaded with the following code snippet.

```python
import torch

from diffusers import HunyuanImageTransformer2DModel

transformer = HunyuanImageTransformer2DModel.from_pretrained("hunyuanvideo-community/HunyuanImage-2.1-Diffusers", subfolder="transformer", torch_dtype=torch.bfloat16)
```

## HunyuanImageTransformer2DModel

[[autodoc]] HunyuanImageTransformer2DModel

## Transformer2DModelOutput

[[autodoc]] models.modeling_outputs.Transformer2DModelOutput
docs/source/en/api/models/transformer_bria_fibo.md

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# BriaFiboTransformer2DModel

A modified Flux Transformer model from [Bria](https://huggingface.co/briaai/FIBO).

## BriaFiboTransformer2DModel

[[autodoc]] BriaFiboTransformer2DModel
docs/source/en/api/pipelines/bria_fibo.md

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
<!--Copyright 2025 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Bria Fibo

Text-to-image models have mastered imagination, but not control. FIBO changes that.

FIBO is trained on structured JSON captions of up to 1,000+ words and is designed to understand and control different visual parameters such as lighting, composition, color, and camera settings, enabling precise and reproducible outputs.

With only 8 billion parameters, FIBO provides a new level of image quality, prompt adherence, and professional control.

FIBO is trained exclusively on structured prompts and will not work with freeform text prompts. You can use the [FIBO-VLM-prompt-to-JSON](https://huggingface.co/briaai/FIBO-VLM-prompt-to-JSON) model or the [FIBO-gemini-prompt-to-JSON](https://huggingface.co/briaai/FIBO-gemini-prompt-to-JSON) model to convert a freeform text prompt into a structured JSON prompt.

Using freeform text prompts directly with FIBO is not recommended, as it will not produce the best results.

You can learn more about FIBO on the [Bria Fibo Hugging Face page](https://huggingface.co/briaai/FIBO).

## Usage

_As the model is gated, before using it with diffusers you first need to go to the [Bria Fibo Hugging Face page](https://huggingface.co/briaai/FIBO), fill in the form, and accept the gate. Once you are in, you need to log in so that your system knows you've accepted the gate._

Use the command below to log in:

```bash
hf auth login
```
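
To make the structured-prompt idea concrete, here is a minimal sketch of assembling a JSON caption by hand. The field names below are illustrative assumptions, not FIBO's documented schema; in practice you would generate the JSON with one of the prompt-to-JSON models linked above.

```python
import json

# Illustrative structured caption. These field names are assumptions made for
# this sketch, not FIBO's actual schema -- see the briaai/FIBO model card.
structured_prompt = {
    "scene": "a lighthouse on a rocky coast at dusk",
    "lighting": {"type": "golden hour", "direction": "backlit"},
    "composition": {"framing": "wide shot", "rule_of_thirds": True},
    "camera": {"lens": "35mm", "aperture": "f/8"},
    "color": {"palette": ["amber", "slate blue"]},
}

# FIBO consumes the caption as serialized JSON rather than freeform text.
prompt = json.dumps(structured_prompt)
```

The serialized `prompt` string could then be passed to `BriaPipeline` in place of a freeform caption.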

## BriaPipeline

[[autodoc]] BriaPipeline
- all
- __call__

docs/source/en/api/pipelines/chroma.md

Lines changed: 7 additions & 6 deletions
@@ -19,20 +19,21 @@ specific language governing permissions and limitations under the License.

 Chroma is a text to image generation model based on Flux.

-Original model checkpoints for Chroma can be found [here](https://huggingface.co/lodestones/Chroma).
+Original model checkpoints for Chroma can be found here:
+* High-resolution finetune: [lodestones/Chroma1-HD](https://huggingface.co/lodestones/Chroma1-HD)
+* Base model: [lodestones/Chroma1-Base](https://huggingface.co/lodestones/Chroma1-Base)
+* Original repo with progress checkpoints: [lodestones/Chroma](https://huggingface.co/lodestones/Chroma) (loading this repo with `from_pretrained` will load a Diffusers-compatible version of the `unlocked-v37` checkpoint)

 > [!TIP]
 > Chroma can use all the same optimizations as Flux.

 ## Inference

-The Diffusers version of Chroma is based on the [`unlocked-v37`](https://huggingface.co/lodestones/Chroma/blob/main/chroma-unlocked-v37.safetensors) version of the original model, which is available in the [Chroma repository](https://huggingface.co/lodestones/Chroma).
-
 ```python
 import torch
 from diffusers import ChromaPipeline

-pipe = ChromaPipeline.from_pretrained("lodestones/Chroma", torch_dtype=torch.bfloat16)
+pipe = ChromaPipeline.from_pretrained("lodestones/Chroma1-HD", torch_dtype=torch.bfloat16)
 pipe.enable_model_cpu_offload()

 prompt = [
@@ -63,10 +64,10 @@ Then run the following example
 import torch
 from diffusers import ChromaTransformer2DModel, ChromaPipeline

-model_id = "lodestones/Chroma"
+model_id = "lodestones/Chroma1-HD"
 dtype = torch.bfloat16

-transformer = ChromaTransformer2DModel.from_single_file("https://huggingface.co/lodestones/Chroma/blob/main/chroma-unlocked-v37.safetensors", torch_dtype=dtype)
+transformer = ChromaTransformer2DModel.from_single_file("https://huggingface.co/lodestones/Chroma1-HD/blob/main/Chroma1-HD.safetensors", torch_dtype=dtype)

 pipe = ChromaPipeline.from_pretrained(model_id, transformer=transformer, torch_dtype=dtype)
 pipe.enable_model_cpu_offload()