Description
Given this reduced bitcode: reduced.bc.zip
opt reduced.bc -S -o -
shows that it has a block with predecessors in a specific order:
particle_skip.exit: ; preds = %if.then35.i, %if.then35.i, %if.then35.i, %2
%maxloop.15 = phi i32 [ %maxloop.14, %2 ], [ %maxloop.08, %if.then35.i ], [ %maxloop.08, %if.then35.i ], [ %maxloop.08, %if.then35.i ]
%inc102 = add nuw nsw i32 %p.09, 1
%exitcond.not = icmp eq i32 %inc102, %0
br i1 %exitcond.not, label %for.end103.loopexit, label %if.then35.i, !llvm.loop !0
However, if you convert it to textual IR with llvm-dis reduced.bc
and then inspect it with opt reduced.ll -S -o -
the computed predecessors are in a different order:
particle_skip.exit: ; preds = %2, %if.then35.i, %if.then35.i, %if.then35.i
%maxloop.15 = phi i32 [ %maxloop.14, %2 ], [ %maxloop.08, %if.then35.i ], [ %maxloop.08, %if.then35.i ], [ %maxloop.08, %if.then35.i ]
%inc102 = add nuw nsw i32 %p.09, 1
%exitcond.not = icmp eq i32 %inc102, %0
br i1 %exitcond.not, label %for.end103.loopexit, label %if.then35.i, !llvm.loop !0
Running the bitcode through opt reduced.bc -o reduced.bc.2
also causes the predecessors to be "fixed" into the same order as in the textual IR above.
This difference in predecessor order is enough for the loop vectorizer to emit different code when run with opt -disable-output -p loop-vectorize -force-tail-folding-style=data-with-evl -prefer-predicate-over-epilogue=predicate-dont-vectorize
With one of the two orderings we end up with extra recipes, because of the way the VPBlendPHIRecipe is constructed based on the predecessors:
WIDEN ir<%maxloop.14> = add ir<%maxloop.08>, ir<1>
BLEND ir<%maxloop.15> = ir<%maxloop.14> ir<%maxloop.08>/vp<%15> ir<%maxloop.08>/vp<%15> ir<%maxloop.08>/vp<%15>
WIDEN-INTRINSIC vp<%16> = call llvm.vp.merge(ir<true>, ir<%maxloop.15>, ir<%maxloop.08>, vp<%12>)
vs
EMIT vp<%16> = not vp<%15>
EMIT vp<%17> = logical-and vp<%14>, vp<%16>
WIDEN ir<%maxloop.14> = add ir<%maxloop.08>, ir<1>
BLEND ir<%maxloop.15> = ir<%maxloop.08> ir<%maxloop.08>/vp<%15> ir<%maxloop.08>/vp<%15> ir<%maxloop.14>/vp<%17>
WIDEN-INTRINSIC vp<%18> = call llvm.vp.merge(ir<true>, ir<%maxloop.15>, ir<%maxloop.08>, vp<%12>)
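To make the dependence on ordering concrete, here is a toy Python model of the effect. It is not LLVM code: it only assumes that the first incoming value of the blend carries no mask, which is what both BLEND lines above suggest. The function build_blend and the mask names M_switch and M_block2 are hypothetical placeholders for the edge masks from %if.then35.i and %2.

```python
from typing import List, Tuple

# (incoming value, mask of the edge it arrives on). Mask names are hypothetical.
Incoming = Tuple[str, str]


def build_blend(incomings: List[Incoming]) -> Tuple[List[str], List[str]]:
    """Toy model: the first incoming value is emitted without its mask,
    every later one keeps its mask. Returns the blend operands and the
    distinct masks that still have to be computed."""
    operands: List[str] = []
    used_masks: List[str] = []
    for index, (value, mask) in enumerate(incomings):
        if index == 0:
            # Assumption: the mask of the first incoming value is dropped.
            operands.append(value)
            continue
        operands.append(f"{value}/{mask}")
        if mask not in used_masks:
            used_masks.append(mask)
    return operands, used_masks


# Predecessor order with %2 first.
order_a: List[Incoming] = [("%maxloop.14", "M_block2")] + [("%maxloop.08", "M_switch")] * 3
# Predecessor order with %2 last.
order_b: List[Incoming] = [("%maxloop.08", "M_switch")] * 3 + [("%maxloop.14", "M_block2")]

for name, order in (("order_a", order_a), ("order_b", order_b)):
    operands, masks = build_blend(order)
    print(name, "BLEND", " ".join(operands), "| masks needed:", masks)
```

In this model, putting %2 first means its edge mask is never needed, which has the same shape as the shorter dump. Putting %2 last keeps that mask alive, which would account for the extra not and logical-and recipes in the longer dump.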
I'm not sure if something needs to be fixed here or not. Should the loop vectorizer be made deterministic regardless of predecessor ordering? Or should the predecessor ordering have been fixed up from the bitcode file?