[Modular] More Updates for Custom Code Loading #11969

Open · wants to merge 3 commits into main

Conversation

DN6 commented Jul 21, 2025

What does this PR do?

In order to support custom code for any block type, we have to invoke downloading/loading of custom code from the ModularPipelineBlocks object. This means some of the properties/methods defined in PipelineBlock, SequentialPipelineBlocks, etc. have to be moved into the ModularPipelineBlocks class.

Additionally, this PR includes a potential change that consolidates how the block state is created from inputs and intermediate inputs. The change is necessary to support the case where a pipeline input might be created in an intermediate step, e.g. using a segmentation model to create an inpainting mask. With the existing approach, simply setting the output of the mask creation step to the desired input value mask_image wouldn't allow downstream steps to access it in the block state, because get_block_state runs over required inputs first and errors out since mask_image is missing. I propose changing this to check for required values in both inputs and intermediates before erroring out. There could be edge cases here that I might be missing, but it seems safe to consolidate in this way.
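
The consolidated lookup could work roughly like this sketch (a hypothetical helper, not the actual get_block_state code): a required value is accepted from either the intermediates or the user-provided inputs, and we only error out when it appears in neither.

def resolve_required_value(name, inputs, intermediates):
    # values produced by earlier blocks take precedence over user-provided inputs
    if name in intermediates:
        return intermediates[name]
    if name in inputs:
        return inputs[name]
    raise ValueError(f"Required value `{name}` not found in inputs or intermediates")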

Code to test

import torch
from diffusers.modular_pipelines import ModularPipelineBlocks, SequentialPipelineBlocks
from diffusers.modular_pipelines.stable_diffusion_xl import INPAINT_BLOCKS
from diffusers.utils import load_image

# fetch the Florence2 image annotator block that will create our mask
image_annotator_block = ModularPipelineBlocks.from_pretrained(
    "diffusers-internal-dev/florence2-image-annotator",
    trust_remote_code=True,
)

my_blocks = INPAINT_BLOCKS.copy()
# insert the annotation block before the image encoding step
my_blocks.insert("image_annotator", image_annotator_block, 1)

# create the sequential inpainting pipeline from our modified block dict
blocks = SequentialPipelineBlocks.from_blocks_dict(my_blocks)

repo_id = "YiYiXu/modular-loader-t2i-0704"
pipe = blocks.init_pipeline(repo_id)
pipe.load_default_components(torch_dtype=torch.float16, device_map="cuda", trust_remote_code=True)

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true")
image = image.resize((1024, 1024))

batch_size = 1
image = [image] * batch_size

prompt = ["A red car"] * batch_size
annotation_prompt = ["<REFERRING_EXPRESSION_SEGMENTATION>the car"] * batch_size

output = pipe(
    prompt=prompt,
    image=image,
    annotation_task_prompt=annotation_prompt,
    num_inference_steps=35,
    guidance_scale=7.5,
    strength=0.9,
    output_type="pil",
)
output.intermediates["mask_image"][0].save("annotated_mask_image.png")
output.intermediates["images"][0].save("annotated.png")

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

DN6 requested a review from yiyixuxu on Jul 21, 2025

yiyixuxu commented Jul 21, 2025

> Additionally, this PR includes a potential change that consolidates how the block state is created from inputs and intermediate inputs. The change is necessary to support the case where a pipeline input might be created in an intermediate step.

I see. Do you think maybe we should drop the distinction between "inputs" and "intermediate inputs" altogether? All inputs could just be "intermediates", and they could all be modified. https://huggingface.co/docs/diffusers/main/en/modular_diffusers/modular_diffusers_states

Taking the example of mask_image, it seems like we really just need to declare it as an intermediate_input in all downstream blocks: if it is an intermediate input, the downstream blocks will first look to see if a previous block provides it; only if it's not in the intermediate state will they look into the input state for the value provided by the user, which is exactly what we need here.

If we do this for mask_image, we probably need to do it for image too, and the same goes for all other user inputs.
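
A sketch of what this could look like in a downstream block (the class and parameter names here are hypothetical; InputParam is the parameter declaration type used by modular blocks):

from diffusers.modular_pipelines import InputParam, PipelineBlock

# Hypothetical downstream block: declaring mask_image as an intermediate input means it is
# resolved from a previous block's output when available, and from the user input otherwise.
class MaskLatentsBlock(PipelineBlock):
    @property
    def intermediate_inputs(self):
        return [InputParam("mask_image", required=True)]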

We'd need to be a little more mindful when modifying variables, and recommend always using a new variable name, or not adding to block_state, unless the intent is to replace the value.

e.g. writing a block like this means downstream blocks can no longer access the raw image, because it is overwritten with the processed version:

def __call__(...):
    ...
    # overwrites the raw image in the block state with the processed version
    block_state.image = process(block_state.image)
    block_state.image_latent = prepare_latents(block_state.image)
    ...

but this might be fine (depending on whether image is mutable and whether the process function modifies it in place):

def __call__(...):
    ...
    # the processed image stays in a local variable; block_state.image keeps the raw image
    image = process(block_state.image)
    block_state.image_latent = prepare_latents(image)
    ...

On the other hand, the system would be more flexible, and it also simplifies things a bit conceptually (I find it hard to explain the difference between these two states).

Let me know what you think!

DN6 commented Jul 22, 2025

> I see. Do you think maybe we should drop the distinction between "inputs" and "intermediate inputs" altogether? All inputs could just be "intermediates", and they could all be modified. https://huggingface.co/docs/diffusers/main/en/modular_diffusers/modular_diffusers_states

IMO this makes sense, since we more or less put them into the same group when fetching the block state, and they are accessed in the same way inside the block. I can't think of any edge cases where this might lead to issues.

> Taking the example of mask_image, it seems like we really just need to declare it as an intermediate_input in all downstream blocks: if it is an intermediate input, the downstream blocks will first look to see if a previous block provides it; only if it's not in the intermediate state will they look into the input state for the value provided by the user, which is exactly what we need here.

We could do this, but suppose you want to insert, into a set of existing blocks, a custom block that manipulates an input value. You would then have to update all subsequent blocks to point to the intermediate input.

> def __call__(...):
>     ...
>     image = process(block_state.image)
>     block_state.image_latent = prepare_latents(image)
>     ...

I think this works well as a best practice if we want to leave the input unchanged. IMO the less restrictive we are the better, since there isn't a very strong reason to keep the input types separate?

@@ -322,7 +322,7 @@ class ModularPipelineBlocks(ConfigMixin, PushToHubMixin):
     </Tip>
     """

-    config_name = "config.json"
+    config_name = "modular_config.json"

DN6 commented Jul 22, 2025

For consistency with modular_model_index.json. Also, there could be cases where a repo contains model weights/config files as well as a modular pipeline block to load the model. We can avoid config conflicts this way.
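
For illustration, a repo that ships both a model and a custom block could then look something like this (hypothetical layout):

florence2-image-annotator/
├── config.json            # model config, read by the model loader
├── model.safetensors      # model weights
├── block.py               # custom ModularPipelineBlocks code
└── modular_config.json    # block config, no longer clashing with config.json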

yiyixuxu commented Jul 22, 2025

> IMO this makes sense, since we more or less put them into the same group when fetching the block state, and they are accessed in the same way inside the block. I can't think of any edge cases where this might lead to issues.

Sounds good, let's do this then! I think it will simplify the code a lot. Basically, now we'll just have inputs and outputs.
I'm happy to take this one on since I'm more familiar with the code, but let me know if you want to work on it :)

yiyixuxu commented

Another thing: should we remove the concept of a single PipelineBlock? It seems like you moved all the methods unique to PipelineBlock to the base ModularPipelineBlocks, and we just don't need it anymore :)
(I haven't tested it out, but I think it would be great if that's the case.)

DN6 commented Jul 23, 2025

@yiyixuxu I can make the following updates to this PR:

  1. Consolidate intermediate inputs/inputs and outputs
  2. Remove PipelineBlock

Then you can review and refactor as you see fit?

yiyixuxu commented

@DN6 sounds good!
