Skip to content

Use fallthrough in optimizeInstructions to further optimize (unsigned)x >= 0 ==> i32(0) #7557

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

xuruiyang2002
Copy link
Contributor

Currently, wasm-opt cannot optimize the code as expected: (unsigned)x >= 0 ==> i32(0).

   (i32.ge_u
    (i32.load
     (i32.const 0)
    )
    (block (result i32)
     (i32.store
      (i32.const 0)
      (i32.const 0)
     )
     (i32.const 0)
    )
   )

It should have been optimized after merge-blocks, as the store could have been hoisted. However, the load and store cannot be reordered, preventing wasm-opt from doing so.

This can be addressed in optimizeInstructions by using fallthrough to hoist the constant zero, enabling further optimizations.

Fixes: #7556

@xuruiyang2002
Copy link
Contributor Author

The issue involved (unsigned(x) >= 0 => i32(1)) in this PR is a mirror of the previously fixed issue #7455 (for unsigned(x) < 0 => i32(0)), so I think it makes sense to fix this for a better optimization.

Could you please review if you have a moment? Thank you. @kripken

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks right here, but see the comment in the test - it is good to annotate tests to make it easy to follow things.

@kripken
Copy link
Member

kripken commented Jun 30, 2025

Though, separately, in discussion with @tlively and @osa1 the idea came up to perhaps handle this family of problems in a general way. We do have the Flatten pass for this, which has 2 current downsides:

  1. Flatten does not support recent features like exceptions and GC.
  2. Flatten adds a lot of overhead since it forces many things to locals that do not need to be, at least not for the reason this PR is fixing. That is, this PR has
   (i32.ge_u
    (i32.load
     (i32.const 0)
    )
    (block (result i32)
     (i32.store
      (i32.const 0)
      (i32.const 0)
     )
     (i32.const 0)
    )
   )

merge-blocks can't move the store out, since it can't move it past the load due to effects. That forces us to look at the fallthrough. But, perhaps a "flatten-lite" could do something more careful, and apply ChildLocalizer. ChildLocalizer will only move children to locals when they actually interact, so it would help in this case.

That is, I am suggesting that we add a new pass which basically runs ChildLocalizer on all expressions (with more than one child). That would "partially flatten" out the code, just enough for OptimizeInstructions to handle the case in this PR, and related cases. And we would do this ChildLocalizer work fairly early in the pipeline so that normal optimizations remove the extra locals later, after we try to benefit from them.

Thoughts?

@xuruiyang2002
Copy link
Contributor Author

Does it cost too much to add a new pass which basically runs ChildLocalizer on all expressions?

For my experience on this family of bugs, wasm-opt can handle most cases with effects in expression well. Now what I think the most problematic is the optimization for the if's condition (optimizeBoolean). Because:
(1) It indeed cannot deal with that family well
(2) RemoveUnusedBrs can transform block-br structure to if structure. (See issue #7661)

So maybe we can just request for the help of ChildLocalizer in some key points? Like in the optimization for the if's condition.

@kripken
Copy link
Member

kripken commented Jul 1, 2025

Does it cost too much to add a new pass which basically runs ChildLocalizer on all expressions?

Perhaps, we would need to measure it. But adding another function pass that works in linear time is generally low in cost (all function passes run on a function in sequence, keeping the function in cache, which makes it very fast, unlike global passes that scan the entire wasm - adding a single global pass is usually costly).

(Also, I wonder if adding such a pass could allow us to remove ssa-nomerge that we run in -O3 atm, as there is some overlap there... but that's just a vague thought.)

So maybe we can just request for the help of ChildLocalizer in some key points?

I think there may be many such key points, though. This PR is an example, and many other similar rules could use this, in order to not need the manual fallthrough fix.

@kripken
Copy link
Member

kripken commented Jul 2, 2025

I'll find time next week (holiday weekend starts soon) to experiment with that ChildLocalizer pass. If you're interested to do it though @xuruiyang2002 , feel free to (maybe just comment here if so, so we don't do duplicate work).

@xuruiyang2002
Copy link
Contributor Author

Okay, looking forward to your experiments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cannot optimize complex ge_u 0 out
2 participants