Collaborator

@hadipash hadipash commented Oct 9, 2024

Changes:

  • Add support for MS 2.7
  • Improve performance

Tests were conducted in dynamic DVM mode, on the MS daily build from 09.04 with CANN 8.0 RC2. Results report the average training step time only (data loading time is excluded):

| Changes | Shape (res x frames x batch) | Time (s) | Change (s) | Comment |
|---|---|---|---|---|
| Original | 720p x 51 x 2 | 30.409 | | |
| | 144p x 204 x 10 | 19.934 | | |
| Switch to repeat_interleave_ext_v2 | 720p x 51 x 2 | 28.913 | -1.496 (-4.9%) | |
| | 144p x 204 x 10 | 19.872 | -0.062 (-0.3%) | |
| Remove SiLU & GELU FP32 upcast | 720p x 51 x 2 | 30.346 | -0.062 (-0.2%) | No performance improvement, will consult with the MS team. |
| | 144p x 204 x 10 | 20.506 | +0.572 (+2.9%) | |
| Convert parameters to BF16 | 720p x 51 x 2 | 28.957 | -1.452 (-4.8%) | |
| | 144p x 204 x 10 | 18.747 | -1.187 (-3.9%) | |
| Remove redundant ops.transpose in VAE | 720p x 51 x 2 | 30.448 | +0.040 (+0.1%) | No changes due to the kernel fusion. Beneficial in KBK & PyNative modes. |
| | 144p x 204 x 10 | 20.103 | +0.168 (+0.8%) | |
| Final improvement | 720p x 51 x 2 | 27.896 | -2.512 (-8.3%) | |
| | 144p x 204 x 10 | 18.804 | -1.130 (-5.7%) | |
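
For reference, the Change column can be recomputed from the step times. The sketch below is not part of the PR: it copies the timings from the "Original" and "Final improvement" rows, and the helper name `change` is made up for illustration. It assumes each change is measured against the "Original" row of the same shape.

```python
# Timings (seconds per training step) copied from the "Original" and
# "Final improvement" rows of the results above.
baseline = {"720p x 51 x 2": 30.409, "144p x 204 x 10": 19.934}
final = {"720p x 51 x 2": 27.896, "144p x 204 x 10": 18.804}


def change(before: float, after: float) -> tuple:
    """Return (absolute change in seconds, relative change in percent)."""
    delta = after - before
    return delta, 100.0 * delta / before


# Negative values mean the step got faster.
results = {shape: change(baseline[shape], final[shape]) for shape in baseline}
for shape, (delta, pct) in results.items():
    print(f"{shape}: {delta:+.3f} s ({pct:+.1f}%)")
```

Values recomputed from the rounded timings can differ from the reported ones in the last digit, presumably because the reported changes were derived from unrounded measurements.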

Collaborator

@zhtmike zhtmike left a comment

Seems there is no code change for "Convert parameters to BF16"?

@hadipash
Collaborator Author

> Seems there is no code change for "Convert parameters to BF16"?

This refers to the network parameters that are explicitly defined with nn.Parameter(), such as self.scale_shift_table. For some reason, any calculation performed on self.scale_shift_table is upcast to the parameter's type (i.e. fp32), and the upcast type then propagates through the rest of the network, even with AMP enabled.
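
The effect can be illustrated with a toy model in plain Python. This is not MindSpore code and not the PR's implementation: `ToyTensor`, `WIDTH`, and `block` are hypothetical names, and the sketch assumes the usual promotion rule that a mixed-precision binary op returns the wider of its two input dtypes.

```python
# Toy model of dtype promotion; NOT MindSpore code. All names are hypothetical.
from dataclasses import dataclass

# Bit widths, used only to decide which dtype "wins" a mixed-precision op.
WIDTH = {"bf16": 16, "fp32": 32}


@dataclass(frozen=True)
class ToyTensor:
    name: str
    dtype: str

    def __add__(self, other: "ToyTensor") -> "ToyTensor":
        # Assumed rule: a binary op returns the wider of the two input dtypes.
        wider = max(self.dtype, other.dtype, key=WIDTH.__getitem__)
        return ToyTensor(f"({self.name} + {other.name})", wider)


def block(x: ToyTensor, scale_shift_table: ToyTensor) -> ToyTensor:
    # Stand-in for a layer that combines activations with an explicit parameter.
    modulated = x + scale_shift_table
    # Later ops inherit whatever dtype the modulation produced.
    return modulated + x


activations = ToyTensor("x", "bf16")
fp32_param = ToyTensor("scale_shift_table", "fp32")
bf16_param = ToyTensor("scale_shift_table", "bf16")

out_fp32_param = block(activations, fp32_param)  # one fp32 parameter upcasts the chain
out_bf16_param = block(activations, bf16_param)  # bf16 parameter keeps the chain in bf16
print(out_fp32_param.dtype)
print(out_bf16_param.dtype)
```

In this toy model a single fp32 parameter is enough to move every downstream result to fp32, which is why converting the explicitly defined parameters to BF16 changes the timings without touching the layer code.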

# Conflicts:
#	examples/opensora_hpcai/opensora/models/layers/blocks.py
#	examples/opensora_hpcai/opensora/utils/model_utils.py
hadipash added 3 commits March 7, 2025 11:28
# Conflicts:
#	examples/opensora_hpcai/scripts/inference.py
# Conflicts:
#	examples/opensora_hpcai/opensora/utils/model_utils.py
@hadipash hadipash changed the title [OpenSora-hpcai] OSv1.2 performance optimization [OpenSora-hpcai] add support for MS 2.7 and OSv1.2 performance optimization Oct 17, 2025
@vigo999 vigo999 added this pull request to the merge queue Oct 18, 2025
Merged via the queue into mindspore-lab:master with commit 48a8dea Oct 18, 2025
3 checks passed
@hadipash hadipash deleted the perf_op branch October 31, 2025 06:47
vigo999 added a commit that referenced this pull request Nov 2, 2025
- Added PR links to model components where specific PRs exist (#1288, #1148)
- Added PR links to examples models that have individual PRs (#1378, #1233, #1363, #1243, #687, #1362, #1227, #1346, #1200, #1369)
- Noted that some components were added as part of broader pipeline implementations
- Improved traceability for specific model additions