Conversation

@Cui-yshoho (Contributor) commented on Oct 24, 2025

What does this PR do?

Adds Kandinsky 5.0 text-to-video support, ported from diffusers master:

  • mindone.diffusers.Kandinsky5Transformer3DModel
  • mindone.diffusers.Kandinsky5T2VPipeline

Usage

import mindspore as ms
from mindone.diffusers import Kandinsky5T2VPipeline
from mindone.diffusers.utils import export_to_video

# Available models:
# ai-forever/Kandinsky-5.0-T2V-Lite-sft-5s-Diffusers
# ai-forever/Kandinsky-5.0-T2V-Lite-nocfg-5s-Diffusers
# ai-forever/Kandinsky-5.0-T2V-Lite-distilled16steps-5s-Diffusers
# ai-forever/Kandinsky-5.0-T2V-Lite-pretrain-5s-Diffusers

model_id = "ai-forever/Kandinsky-5.0-T2V-Lite-sft-5s-Diffusers"
pipe = Kandinsky5T2VPipeline.from_pretrained(model_id, mindspore_dtype=ms.bfloat16)

prompt = "A cat and a dog baking a cake together in a kitchen."
negative_prompt = "Static, 2D cartoon, cartoon, 2d animation, paintings, images, worst quality, low quality, ugly, deformed, walking backwards"

output = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    height=512,
    width=768,
    num_frames=121,
    num_inference_steps=50,
    guidance_scale=5.0,
)[0][0]  # first video from the returned tuple of frame lists

export_to_video(output, "output.mp4", fps=24, quality=9)
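As a sanity check on the settings above (a hypothetical calculation, not part of the pipeline API): the `-5s-` checkpoint names suggest roughly five seconds of video, which 121 frames at the 24 fps export rate matches.

```python
# Verify that num_frames and fps together target the ~5 s checkpoint variants.
num_frames = 121
fps = 24

duration_s = num_frames / fps
print(f"{duration_s:.2f} s")  # ≈ 5.04 s of video at export time
```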

Performance

Experiments were run on Ascend Atlas 800T A2 machines with MindSpore 2.7.0.

| pipeline | mode | speed |
| --- | --- | --- |
| Kandinsky5T2VPipeline | pynative | 14.58 s/it |
| Kandinsky5T2VPipeline | jit | 14.64 s/it |

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline?
  • Did you make sure to update the documentation with your changes? E.g. record bug fixes or new features in What's New. Here are the documentation guidelines.
  • Did you build and run the code without any errors?
  • Did you report the running environment (NPU type / MindSpore version) and performance in the doc? (preferably recorded for data loading, model inference, and training tasks)
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@xxx

@Cui-yshoho Cui-yshoho requested a review from vigo999 as a code owner October 24, 2025 06:43
@gemini-code-assist commented:
Summary of Changes

Hello @Cui-yshoho, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the mindone.diffusers library by integrating the Kandinsky 5.0 Text-to-Video (T2V) generation capabilities. It introduces a new 3D Diffusion Transformer model, a dedicated pipeline for T2V generation, and comprehensive support for LoRA fine-tuning. These additions empower users to generate high-quality videos from text prompts, leveraging advanced text and visual encoding mechanisms, while also enhancing the library's MindSpore compatibility for core neural network operations.

Highlights

  • Kandinsky 5.0 Text-to-Video Pipeline: Introduces the Kandinsky5T2VPipeline for text-to-video generation, enabling users to create videos from text prompts within the diffusers library.
  • Kandinsky5 Transformer3D Model Implementation: Adds the core Kandinsky5Transformer3DModel, a 3D Diffusion Transformer designed for video-like data, including its various sub-components like embeddings, attention mechanisms, and modulation layers.
  • LoRA Support for Kandinsky: Implements KandinskyLoraLoaderMixin to provide Low-Rank Adaptation (LoRA) capabilities for the Kandinsky5Transformer3DModel, allowing for efficient fine-tuning and adaptation.
  • MindSpore Compatibility Enhancements: Introduces a new RMSNorm class and dtype_to_eps utility, improving MindSpore version compatibility and supporting the new model's specific normalization requirements.
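The RMSNorm mentioned in the last highlight follows the standard root-mean-square normalization. A minimal NumPy sketch of the computation (illustrative only — the PR's class is a MindSpore layer, and the names and `eps` default here are assumptions):

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Normalize by the root mean square of the last axis (no mean subtraction,
    # unlike LayerNorm), then apply a learned per-channel scale.
    x32 = x.astype(np.float32)
    rms = np.sqrt(np.mean(x32 ** 2, axis=-1, keepdims=True) + eps)
    return (x32 / rms * weight).astype(x.dtype)

x = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
y = rms_norm(x, np.ones(4, dtype=np.float32))
# With a unit weight, the normalized vector has RMS ≈ 1 over the last axis.
```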

@gemini-code-assist bot left a comment:
Code Review

This pull request introduces the Kandinsky 5 model, including the Kandinsky5Transformer3DModel and Kandinsky5T2VPipeline. The changes are extensive, adding new model and pipeline files, along with necessary updates to initializers, loaders, and documentation. A custom RMSNorm layer is also added for compatibility. The implementation appears to be a solid port, following the existing structure of the diffusers library. My review focuses on ensuring robustness and correctness. I've identified one area for improvement regarding hardcoded data types that could affect the model's flexibility.

Comment on lines +267 to +268
x_ = x.reshape(*x.shape[:-1], -1, 1, 2).to(ms.float32)
x_out = (rope * x_).sum(dim=-1)
return x_out.reshape(*x.shape).to(ms.bfloat16)
Severity: high

The apply_rotary function hardcodes the output dtype to ms.bfloat16 before returning. This can lead to unnecessary precision loss or errors if the model is used with other dtypes like float16 or float32. For instance, if the input x is float32, it would be downcasted to bfloat16 and then upcasted back to float32 at the call site, causing a loss of precision.

To make this function more robust and dtype-agnostic, it's better to remove the hardcoded cast. The intermediate calculations are already performed in float32, and the calling code already handles casting the result back to the original dtype of the input tensor.

Suggested change

      x_ = x.reshape(*x.shape[:-1], -1, 1, 2).to(ms.float32)
      x_out = (rope * x_).sum(dim=-1)
    - return x_out.reshape(*x.shape).to(ms.bfloat16)
    + return x_out.reshape(*x.shape)
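For reviewers unfamiliar with the pattern: rotary embeddings rotate each (even, odd) feature pair by a position-dependent angle, and the point of the suggestion is that the math should run in float32 but the result should return in the caller's dtype. A hypothetical NumPy sketch of that contract (not the PR's exact reshape/sum formulation):

```python
import numpy as np

def apply_rotary(x: np.ndarray, cos: np.ndarray, sin: np.ndarray) -> np.ndarray:
    # Rotate each (even, odd) feature pair; compute in float32 for stability,
    # then cast back to the input dtype instead of a hardcoded one.
    x32 = x.astype(np.float32)
    x1, x2 = x32[..., 0::2], x32[..., 1::2]
    out = np.empty_like(x32)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out.astype(x.dtype)

rng = np.random.RandomState(0)
x = rng.randn(2, 8).astype(np.float32)
theta = np.float32(0.3)
y = apply_rotary(x, np.cos(theta), np.sin(theta))
# A rotation preserves vector norms, and the output dtype matches the input's.
```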

@Cui-yshoho added the "new model" (add new model to mindone) label on Oct 25, 2025