Bitcode and textual IR have different BB predecessor order/LoopVectorizer output is sensitive to predecessor order #147038

@lukel97

Given this reduced bitcode: reduced.bc.zip

Running opt reduced.bc -S -o - shows a block whose predecessors are listed in this order:

    particle_skip.exit:                               ; preds = %if.then35.i, %if.then35.i, %if.then35.i, %2
      %maxloop.15 = phi i32 [ %maxloop.14, %2 ], [ %maxloop.08, %if.then35.i ], [ %maxloop.08, %if.then35.i ], [ %maxloop.08, %if.then35.i ]
      %inc102 = add nuw nsw i32 %p.09, 1
      %exitcond.not = icmp eq i32 %inc102, %0
      br i1 %exitcond.not, label %for.end103.loopexit, label %if.then35.i, !llvm.loop !0

However, if you convert it to textual IR with llvm-dis reduced.bc and then inspect it with opt reduced.ll -S -o -, the computed predecessors are in a different order (the phi's incoming list is unchanged):

    particle_skip.exit:                               ; preds = %2, %if.then35.i, %if.then35.i, %if.then35.i
      %maxloop.15 = phi i32 [ %maxloop.14, %2 ], [ %maxloop.08, %if.then35.i ], [ %maxloop.08, %if.then35.i ], [ %maxloop.08, %if.then35.i ]
      %inc102 = add nuw nsw i32 %p.09, 1
      %exitcond.not = icmp eq i32 %inc102, %0
      br i1 %exitcond.not, label %for.end103.loopexit, label %if.then35.i, !llvm.loop !0

Running the bitcode through opt reduced.bc -o reduced.bc.2 also causes the predecessors to be "fixed" into the same order as in the textual IR above.
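A likely explanation, stated as an assumption rather than something verified against this file: LLVM computes a block's predecessors by walking the BasicBlock's use list, and each new use is added at the head of that list, so the reported order depends on the order in which the branches targeting the block were created. Bitcode can record use-list order, while textual IR does not by default (llvm-dis has a -preserve-ll-uselistorder option that may help confirm this). The phi's incoming list is stored in the instruction itself, which is why it is identical in both dumps. The toy model below is plain Python, not LLVM's API, and the two creation orders are hypothetical; it only illustrates the head-insertion effect:

```python
# Toy model, not LLVM's API: a block's predecessor list is read from its
# use list, and each new use is pushed onto the head of that list.
def predecessors(branch_creation_order):
    """Return the predecessor names a block would report, given the order in
    which the branches targeting it were created."""
    use_list = []
    for pred in branch_creation_order:
        use_list.insert(0, pred)  # head insertion, like LLVM's Value::addUse
    return use_list


# Hypothetical creation orders; the real order depends on the IR reader.
then_first = ["%if.then35.i"] * 3 + ["%2"]
two_first = ["%2"] + ["%if.then35.i"] * 3

print(predecessors(then_first))  # ['%2', '%if.then35.i', '%if.then35.i', '%if.then35.i']
print(predecessors(two_first))   # ['%if.then35.i', '%if.then35.i', '%if.then35.i', '%2']
```

Both calls report the same set of predecessors; only their order differs, which matches the two preds comments.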

This difference in predecessor order is enough to make the loop vectorizer emit different code when run with opt -disable-output -p loop-vectorize -force-tail-folding-style=data-with-evl -prefer-predicate-over-epilogue=predicate-dont-vectorize. We end up with extra recipes because of the way the VPBlendRecipe is constructed from the block's predecessors:

    WIDEN ir<%maxloop.14> = add ir<%maxloop.08>, ir<1>
    BLEND ir<%maxloop.15> = ir<%maxloop.14> ir<%maxloop.08>/vp<%15> ir<%maxloop.08>/vp<%15> ir<%maxloop.08>/vp<%15>
    WIDEN-INTRINSIC vp<%16> = call llvm.vp.merge(ir<true>, ir<%maxloop.15>, ir<%maxloop.08>, vp<%12>)

vs

    EMIT vp<%16> = not vp<%15>
    EMIT vp<%17> = logical-and vp<%14>, vp<%16>
    WIDEN ir<%maxloop.14> = add ir<%maxloop.08>, ir<1>
    BLEND ir<%maxloop.15> = ir<%maxloop.08> ir<%maxloop.08>/vp<%15> ir<%maxloop.08>/vp<%15> ir<%maxloop.14>/vp<%17>
    WIDEN-INTRINSIC vp<%18> = call llvm.vp.merge(ir<true>, ir<%maxloop.15>, ir<%maxloop.08>, vp<%12>)

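The two dumps differ in which incoming value the BLEND takes as its unmasked first operand. As a toy model in plain Python (not VPlan), assuming, as the dumps suggest, that the first operand needs no mask while every later operand needs its edge mask, the mask-building recipes depend on which predecessor comes first. The mask names and the not/logical-and pair are copied from the second dump; EXTRA_OPS, mask_ops_for_blend and with_costliest_first are hypothetical names, and the last is only one possible way to make the choice order-independent:

```python
# Toy model, not LLVM's VPlan API. Each edge mask maps to the extra recipes
# needed to materialize it, copied from the second dump above.
EXTRA_OPS = {
    "vp15": [],  # already available; no extra recipes
    "vp14 & !vp15": ["not vp15", "logical-and vp14, (not vp15)"],
}


def mask_ops_for_blend(incoming):
    """Return the extra mask recipes a blend needs.

    `incoming` is a list of (value, edge_mask) pairs in predecessor order.
    The first pair is the blend's default operand, so its mask is dropped.
    """
    ops = []
    seen = set()
    for _value, mask in incoming[1:]:
        if mask not in seen:
            seen.add(mask)
            ops.extend(EXTRA_OPS[mask])
    return ops


def with_costliest_first(incoming):
    """Hypothetical canonicalization: move the operand whose mask is most
    expensive to build into the unmasked default position."""
    i = max(range(len(incoming)), key=lambda k: len(EXTRA_OPS[incoming[k][1]]))
    return [incoming[i]] + incoming[:i] + incoming[i + 1:]


from_2 = ("%maxloop.14", "vp14 & !vp15")
from_then = ("%maxloop.08", "vp15")

order_a = [from_2, from_then, from_then, from_then]  # %2 first
order_b = [from_then, from_then, from_then, from_2]  # %2 last

print(mask_ops_for_blend(order_a))  # []
print(mask_ops_for_blend(order_b))  # ['not vp15', 'logical-and vp14, (not vp15)']
print(mask_ops_for_blend(with_costliest_first(order_b)))  # []
```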
I'm not sure whether something needs to be fixed here. Should the loop vectorizer produce the same output regardless of predecessor order? Or should the predecessor order have been normalized when reading the bitcode file?
