Conversation

@Cui-yshoho (Contributor) commented on Oct 24, 2025

What does this PR do?

Adds Kandinsky 5.0 text-to-video support, ported from diffusers master:

  • mindone.diffusers.Kandinsky5Transformer3DModel
  • mindone.diffusers.Kandinsky5T2VPipeline

Usage

import mindspore as ms
from mindone.diffusers import Kandinsky5T2VPipeline
from mindone.diffusers.utils import export_to_video

# Available models:
# ai-forever/Kandinsky-5.0-T2V-Lite-sft-5s-Diffusers
# ai-forever/Kandinsky-5.0-T2V-Lite-nocfg-5s-Diffusers
# ai-forever/Kandinsky-5.0-T2V-Lite-distilled16steps-5s-Diffusers
# ai-forever/Kandinsky-5.0-T2V-Lite-pretrain-5s-Diffusers

model_id = "ai-forever/Kandinsky-5.0-T2V-Lite-sft-5s-Diffusers"
pipe = Kandinsky5T2VPipeline.from_pretrained(model_id, mindspore_dtype=ms.bfloat16)

prompt = "A cat and a dog baking a cake together in a kitchen."
negative_prompt = "Static, 2D cartoon, cartoon, 2d animation, paintings, images, worst quality, low quality, ugly, deformed, walking backwards"

output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=512,
    width=768,
    num_frames=121,
    num_inference_steps=50,
    guidance_scale=5.0,
)[0][0]  # first video from the returned tuple of frame lists

export_to_video(output, "output.mp4", fps=24, quality=9)
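As a sanity check on the settings above (a hypothetical calculation, not part of the pipeline API): the `-5s-` checkpoint names suggest roughly five seconds of video, which 121 frames at the 24 fps export rate matches.

```python
# Verify that num_frames and fps together target the ~5 s checkpoint variants.
num_frames = 121
fps = 24

duration_s = num_frames / fps
print(f"{duration_s:.2f} s")  # ≈ 5.04 s of video at export time
```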

Performance

Experiments were run on Ascend Atlas 800T A2 machines with MindSpore 2.7.0.

| pipeline | mode | speed |
| --- | --- | --- |
| Kandinsky5T2VPipeline | pynative | 14.58 s/it |
| Kandinsky5T2VPipeline | jit | 14.64 s/it |

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the documentation guidelines.
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type / MindSpore version) and performance in the doc? (preferably recorded for data loading, model inference, and training tasks)
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@xxx

@Cui-yshoho Cui-yshoho requested a review from vigo999 as a code owner October 24, 2025 06:43
@gemini-code-assist commented:
Summary of Changes

Hello @Cui-yshoho, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the mindone.diffusers library by integrating the Kandinsky 5.0 Text-to-Video (T2V) generation capabilities. It introduces a new 3D Diffusion Transformer model, a dedicated pipeline for T2V generation, and comprehensive support for LoRA fine-tuning. These additions empower users to generate high-quality videos from text prompts, leveraging advanced text and visual encoding mechanisms, while also enhancing the library's MindSpore compatibility for core neural network operations.

Highlights

  • Kandinsky 5.0 Text-to-Video Pipeline: Introduces the Kandinsky5T2VPipeline for text-to-video generation, enabling users to create videos from text prompts within the diffusers library.
  • Kandinsky5 Transformer3D Model Implementation: Adds the core Kandinsky5Transformer3DModel, a 3D Diffusion Transformer designed for video-like data, including its various sub-components like embeddings, attention mechanisms, and modulation layers.
  • LoRA Support for Kandinsky: Implements KandinskyLoraLoaderMixin to provide Low-Rank Adaptation (LoRA) capabilities for the Kandinsky5Transformer3DModel, allowing for efficient fine-tuning and adaptation.
  • MindSpore Compatibility Enhancements: Introduces a new RMSNorm class and dtype_to_eps utility, improving MindSpore version compatibility and supporting the new model's specific normalization requirements.
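The RMSNorm mentioned in the last highlight follows the standard root-mean-square normalization. A minimal NumPy sketch of the computation (illustrative only — the PR's class is a MindSpore layer, and the names and `eps` default here are assumptions):

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Normalize by the root mean square of the last axis (no mean subtraction,
    # unlike LayerNorm), then apply a learned per-channel scale.
    x32 = x.astype(np.float32)
    rms = np.sqrt(np.mean(x32 ** 2, axis=-1, keepdims=True) + eps)
    return (x32 / rms * weight).astype(x.dtype)

x = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
y = rms_norm(x, np.ones(4, dtype=np.float32))
# With a unit weight, the normalized vector has RMS ≈ 1 over the last axis.
```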

@gemini-code-assist bot left a comment:
Code Review

This pull request introduces the Kandinsky 5 model, including the Kandinsky5Transformer3DModel and Kandinsky5T2VPipeline. The changes are extensive, adding new model and pipeline files, along with necessary updates to initializers, loaders, and documentation. A custom RMSNorm layer is also added for compatibility. The implementation appears to be a solid port, following the existing structure of the diffusers library. My review focuses on ensuring robustness and correctness. I've identified one area for improvement regarding hardcoded data types that could affect the model's flexibility.

Comment on lines +267 to +268
x_ = x.reshape(*x.shape[:-1], -1, 1, 2).to(ms.float32)
x_out = (rope * x_).sum(dim=-1)
return x_out.reshape(*x.shape).to(ms.bfloat16)
Severity: high

The apply_rotary function hardcodes the output dtype to ms.bfloat16 before returning. This can lead to unnecessary precision loss or errors if the model is used with other dtypes like float16 or float32. For instance, if the input x is float32, it would be downcasted to bfloat16 and then upcasted back to float32 at the call site, causing a loss of precision.

To make this function more robust and dtype-agnostic, it's better to remove the hardcoded cast. The intermediate calculations are already performed in float32, and the calling code already handles casting the result back to the original dtype of the input tensor.

Suggested change

      x_ = x.reshape(*x.shape[:-1], -1, 1, 2).to(ms.float32)
      x_out = (rope * x_).sum(dim=-1)
    - return x_out.reshape(*x.shape).to(ms.bfloat16)
    + return x_out.reshape(*x.shape)
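For reviewers unfamiliar with the pattern: rotary embeddings rotate each (even, odd) feature pair by a position-dependent angle, and the point of the suggestion is that the math should run in float32 but the result should return in the caller's dtype. A hypothetical NumPy sketch of that contract (not the PR's exact reshape/sum formulation):

```python
import numpy as np

def apply_rotary(x: np.ndarray, cos: np.ndarray, sin: np.ndarray) -> np.ndarray:
    # Rotate each (even, odd) feature pair; compute in float32 for stability,
    # then cast back to the input dtype instead of a hardcoded one.
    x32 = x.astype(np.float32)
    x1, x2 = x32[..., 0::2], x32[..., 1::2]
    out = np.empty_like(x32)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out.astype(x.dtype)

rng = np.random.RandomState(0)
x = rng.randn(2, 8).astype(np.float32)
theta = np.float32(0.3)
y = apply_rotary(x, np.cos(theta), np.sin(theta))
# A rotation preserves vector norms, and the output dtype matches the input's.
```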

@Cui-yshoho added the "new model" (add new model to mindone) label on Oct 25, 2025