
Conversation

Ratish1 commented Nov 5, 2025

What does this PR do?

This PR fixes a bug that causes the QwenImage model to crash when using context parallelism with a prompt whose sequence length is not divisible by the world size.

The fix is implemented within the QwenImageTransformer2DModel and consists of three parts:

  1. Input Padding: The text prompt embeddings (encoder_hidden_states) and their attention mask are padded at the start of the forward method so that their length is divisible by the world size.
  2. RoPE Correction: The model now uses the new, padded sequence length when generating the rotary positional embeddings (RoPE), preventing a tensor shape mismatch that was causing a RuntimeError.
  3. Attention Masking: QwenDoubleStreamAttnProcessor2_0 is corrected to build and use a proper additive attention mask, so the padded tokens are ignored by the attention mechanism and the numerical output of the model is preserved (see the sketch below).

A new unit test is also added to simulate a distributed environment. It verifies that the padding logic prevents the crash while ensuring the output is numerically equivalent to the baseline, non-padded run.
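To make the idea concrete, here is a minimal sketch of the padding and additive-mask logic. The helper names (pad_for_context_parallel, to_additive_mask) and the tensor sizes are hypothetical and chosen only for illustration; this is not the actual implementation in the PR.

```python
import torch
import torch.nn.functional as F


def pad_for_context_parallel(encoder_hidden_states, encoder_attention_mask, world_size):
    # Pad the text sequence so its length is divisible by world_size.
    # (Hypothetical helper for illustration; not the actual diffusers code.)
    seq_len = encoder_hidden_states.shape[1]
    pad_len = (-seq_len) % world_size
    if pad_len > 0:
        # Zero-pad along the sequence dimension (dim 1).
        encoder_hidden_states = F.pad(encoder_hidden_states, (0, 0, 0, pad_len))
        # Mark the padded positions as masked out (0 = ignore).
        encoder_attention_mask = F.pad(encoder_attention_mask, (0, pad_len), value=0)
    return encoder_hidden_states, encoder_attention_mask


def to_additive_mask(attention_mask, dtype):
    # Convert a 0/1 key-padding mask into an additive attention bias of shape
    # (batch, 1, 1, seq_len) that broadcasts over heads and query positions.
    return (1.0 - attention_mask[:, None, None, :].to(dtype)) * torch.finfo(dtype).min


# Example: a 77-token prompt padded for world_size=4 becomes 80 tokens
# (batch size and hidden dim are arbitrary here).
hidden = torch.randn(1, 77, 64)
mask = torch.ones(1, 77, dtype=torch.long)
hidden, mask = pad_for_context_parallel(hidden, mask, world_size=4)
bias = to_additive_mask(mask, hidden.dtype)
print(hidden.shape, mask.shape, bias.shape)
# torch.Size([1, 80, 64]) torch.Size([1, 80]) torch.Size([1, 1, 1, 80])
```

Using a large negative bias for the padded key positions drives their attention weights to effectively zero after the softmax, which is why the padded run can stay numerically equivalent to the unpadded baseline.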

Fixes #12568


Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@sayakpaul @yiyixuxu @DN6

Ratish1 changed the title from "fix(qwenimage): Correct context parallelism padding" to "fix(qwenimage): Add padding for context parallelism" on Nov 5, 2025
yiyixuxu (Collaborator) commented Nov 5, 2025

Thanks for the PR! However, we don't want any of this logic to go into the Qwen transformer. Would you be interested in looking into how to support this case (not just Qwen) from the context parallel hooks?
https://github.com/huggingface/diffusers/blob/main/src/diffusers/hooks/context_parallel.py#L204

Ratish1 (Author) commented Nov 5, 2025


Hi @yiyixuxu, yes, I would be interested in supporting this change. Should I revert my changes to the Qwen transformer and implement the padding logic inside the _prepare_cp_input function in context_parallel.py, as you suggested?

I have one follow-up question about the model-specific consequences of this padding. The Qwen transformer calculates rotary positional embeddings (RoPE) based on the original sequence length, so when the hook pads the input tensor, the model still needs to know the new, padded length to avoid a shape mismatch inside the RoPE calculation.

Previously, I fixed this by recalculating the sequence length inside the transformer's forward method based on the padded tensor's shape. With the padding logic now in the hook, what is the preferred way to handle this?

Is it acceptable to keep that small, model-specific piece of logic (recalculating the sequence length for RoPE) inside the Qwen transformer, or is there a more general way to communicate the new padded length from the hook back to the model that I should use instead? Thanks for your help.
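If it helps the discussion, here is a rough sketch of what a generic, model-agnostic version could look like at the hook level. The pad_and_shard helper is purely illustrative (it is not the actual _prepare_cp_input implementation); it only shows that the padded length can be recovered from the padded tensor's shape, which is what the RoPE calculation would need.

```python
import torch
import torch.nn.functional as F


def pad_and_shard(tensor, dim, world_size, rank):
    # Illustrative only (not the actual hook): pad `tensor` along `dim` so it
    # splits evenly across `world_size` ranks, then return this rank's shard
    # together with the padded length.
    length = tensor.shape[dim]
    pad_len = (-length) % world_size
    if pad_len > 0:
        # F.pad specifies padding from the last dimension backwards, so build
        # the pad spec for the requested dim explicitly.
        pad = [0, 0] * (tensor.ndim - 1 - dim) + [0, pad_len]
        tensor = F.pad(tensor, pad)
    shard = tensor.chunk(world_size, dim=dim)[rank]
    return shard, length + pad_len


# The model (or its RoPE helper) could then read the sequence length from the
# padded tensor's shape instead of the original prompt length.
x = torch.randn(1, 77, 64)
shard, padded_len = pad_and_shard(x, dim=1, world_size=4, rank=0)
print(shard.shape, padded_len)  # torch.Size([1, 20, 64]) 80
```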

