Use fallthrough in optimizeInstructions to further optimize (unsigned)x >= 0 ==> i32(0) #7557

xuruiyang2002 · 2025-04-26T13:37:47Z

Currently, wasm-opt cannot optimize the code as expected: (unsigned)x >= 0 ==> i32(0).

   (i32.ge_u
    (i32.load
     (i32.const 0)
    )
    (block (result i32)
     (i32.store
      (i32.const 0)
      (i32.const 0)
     )
     (i32.const 0)
    )
   )

It should have been optimized after merge-blocks, as the store could have been hoisted. However, the load and store cannot be reordered, preventing wasm-opt from doing so.

This can be addressed in optimizeInstructions by using fallthrough to hoist the constant zero, enabling further optimizations.

Fixes: #7556

xuruiyang2002 · 2025-05-27T09:09:41Z

The issue involved (unsigned(x) >= 0 => i32(1)) in this PR is a mirror of the previously fixed issue #7455 (for unsigned(x) < 0 => i32(0)), so I think it makes sense to fix this for a better optimization.

Could you please review if you have a moment? Thank you. @kripken

kripken

Code looks right here, but see the comment in the test - it is good to annotate tests to make it easy to follow things.

test/lit/passes/optimize-instructions-mvp.wast

kripken · 2025-06-30T17:03:51Z

Though, separately, in discussion with @tlively and @osa1 the idea came up to perhaps handle this family of problems in a general way. We do have the Flatten pass for this, which has 2 current downsides:

Flatten does not support recent features like exceptions and GC.
Flatten adds a lot of overhead since it forces many things to locals that do not need to be, at least not for the reason this PR is fixing. That is, this PR has

   (i32.ge_u
    (i32.load
     (i32.const 0)
    )
    (block (result i32)
     (i32.store
      (i32.const 0)
      (i32.const 0)
     )
     (i32.const 0)
    )
   )

merge-blocks can't move the store out, since it can't move it past the load due to effects. That forces us to look at the fallthrough. But, perhaps a "flatten-lite" could do something more careful, and apply ChildLocalizer. ChildLocalizer will only move children to locals when they actually interact, so it would help in this case.

That is, I am suggesting that we add a new pass which basically runs ChildLocalizer on all expressions (with more than one child). That would "partially flatten" out the code, just enough for OptimizeInstructions to handle the case in this PR, and related cases. And we would do this ChildLocalizer work fairly early in the pipeline so that normal optimizations remove the extra locals later, after we try to benefit from them.

Thoughts?

xuruiyang2002 · 2025-07-01T06:04:42Z

Does it cost too much to add a new pass which basically runs ChildLocalizer on all expressions?

For my experience on this family of bugs, wasm-opt can handle most cases with effects in expression well. Now what I think the most problematic is the optimization for the if's condition (optimizeBoolean). Because:
(1) It indeed cannot deal with that family well
(2) RemoveUnusedBrs can transform block-br structure to if structure. (See issue #7661)

So maybe we can just request for the help of ChildLocalizer in some key points? Like in the optimization for the if's condition.

kripken · 2025-07-01T17:27:18Z

Does it cost too much to add a new pass which basically runs ChildLocalizer on all expressions?

Perhaps, we would need to measure it. But adding another function pass that works in linear time is generally low in cost (all function passes run on a function in sequence, keeping the function in cache, which makes it very fast, unlike global passes that scan the entire wasm - adding a single global pass is usually costly).

(Also, I wonder if adding such a pass could allow us to remove ssa-nomerge that we run in -O3 atm, as there is some overlap there... but that's just a vague thought.)

So maybe we can just request for the help of ChildLocalizer in some key points?

I think there may be many such key points, though. This PR is an example, and many other similar rules could use this, in order to not need the manual fallthrough fix.

kripken · 2025-07-02T18:03:06Z

I'll find time next week (holiday weekend starts soon) to experiment with that ChildLocalizer pass. If you're interested to do it though @xuruiyang2002 , feel free to (maybe just comment here if so, so we don't do duplicate work).

xuruiyang2002 · 2025-07-03T14:33:07Z

Okay, looking forward to your experiments.

xuruiyang2002 added 2 commits April 26, 2025 21:34

opt for WebAssembly#7556

e406be6

test

1cacdf3

kripken reviewed Jun 30, 2025

View reviewed changes

test/lit/passes/optimize-instructions-mvp.wast Show resolved Hide resolved

Add annotatation

3a3b10e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Use fallthrough in optimizeInstructions to further optimize (unsigned)x >= 0 ==> i32(0) #7557

Use fallthrough in optimizeInstructions to further optimize (unsigned)x >= 0 ==> i32(0) #7557

Uh oh!

xuruiyang2002 commented Apr 26, 2025

Uh oh!

xuruiyang2002 commented May 27, 2025

Uh oh!

kripken left a comment

Uh oh!

Uh oh!

kripken commented Jun 30, 2025

Uh oh!

xuruiyang2002 commented Jul 1, 2025

Uh oh!

kripken commented Jul 1, 2025

Uh oh!

kripken commented Jul 2, 2025

Uh oh!

xuruiyang2002 commented Jul 3, 2025

Uh oh!

Uh oh!

Use fallthrough in optimizeInstructions to further optimize (unsigned)x >= 0 ==> i32(0) #7557

Are you sure you want to change the base?

Use fallthrough in optimizeInstructions to further optimize (unsigned)x >= 0 ==> i32(0) #7557

Uh oh!

Conversation

xuruiyang2002 commented Apr 26, 2025

Uh oh!

xuruiyang2002 commented May 27, 2025

Uh oh!

kripken left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

kripken commented Jun 30, 2025

Uh oh!

xuruiyang2002 commented Jul 1, 2025

Uh oh!

kripken commented Jul 1, 2025

Uh oh!

kripken commented Jul 2, 2025

Uh oh!

xuruiyang2002 commented Jul 3, 2025

Uh oh!

Uh oh!