[InstCombine] Decompose constant xor operand if possible #147599
Conversation
Try to transform XOR(A, B+C) into XOR(A,C) + B, where XOR(A,C) is an existing instruction. The transformation is valid under the following conditions:
Check 1 - B and C are disjoint.
Check 2 - XOR(A,C) and B are disjoint.
This transformation is particularly beneficial for GEPs, because disjoint OR operations often map better to addressing modes than XOR. This can enable further optimizations in the GEP offset folding pipeline.
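For illustration, a minimal IR sketch of the intended rewrite, derived from the test added in this PR (my paraphrase, not output from the patch; the constants are chosen so that both checks provably hold):

```llvm
define i64 @xor_chain_sketch(i1 %c) {
  ; Two xors of the same base; 2080 = 32 + 2048 (A = %base, C = 32, B = 2048).
  %base = select i1 %c, i64 0, i64 288
  %addr1 = xor i64 %base, 32
  %addr2 = xor i64 %base, 2080
  ; After the fold, %addr2 would instead be computed from %addr1:
  ;   %addr2 = or disjoint i64 %addr1, 2048
  ; Check 1: 2048 & 32 == 0, so B and C are disjoint.
  ; Check 2: %base is 0 or 288, so %addr1 is 32 or 256 and never has bit 11
  ;          (2048) set, i.e. xor(A,C) and B are disjoint.
  %sum = add i64 %addr1, %addr2
  ret i64 %sum
}
```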
@llvm/pr-subscribers-llvm-transforms
Author: Sumanth Gundapaneni (sgundapa)
Changes: Try to transform XOR(A, B+C) into XOR(A,C) + B, where XOR(A,C) is an existing instruction. This transformation is particularly beneficial for GEPs, because disjoint OR operations often map better to addressing modes than XOR and can enable further optimizations in the GEP offset folding pipeline.
Full diff: https://github.com/llvm/llvm-project/pull/147599.diff
2 Files Affected:
diff --git a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
index 706cb828acc63..5a8a57acf544c 100644
--- a/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
+++ b/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp
@@ -4864,6 +4864,68 @@ Instruction *InstCombinerImpl::foldNot(BinaryOperator &I) {
return nullptr;
}
+/// Try to transform a chain of XORs into a disjoint OR to expose a constant
+/// offset for GEP and other optimizations. This fold transforms:
+/// Current = xor(X, K)
+/// into:
+/// Result = or(xor(X, C), B)
+/// where a dominating `xor(X, C)` exists, K = B|C, B&C=0, and the operands
+/// of the new `or` are proven disjoint. This is beneficial for address
+/// calculations where `or` can be folded into `add`.
+static Instruction *transformChainedXorToOrDisjoint(BinaryOperator &I,
+                                                     InstCombinerImpl &IC) {
+  Value *X;
+  ConstantInt *CurrentC;
+  if (!match(&I, m_Xor(m_Value(X), m_ConstantInt(CurrentC))))
+    return nullptr;
+
+  // Find the best dominating base XOR.
+  BinaryOperator *BestBaseXor = nullptr;
+  APInt SmallestConst = CurrentC->getValue();
+
+  for (User *U : X->users()) {
+    if (U == &I)
+      continue;
+
+    // Look for sibling instruction: xor(X, SiblingC)
+    BinaryOperator *SiblingXor;
+    ConstantInt *SiblingC;
+    if (!match(U, m_CombineAnd(m_BinOp(SiblingXor),
+                               m_Xor(m_Specific(X), m_ConstantInt(SiblingC)))))
+      continue;
+
+    const APInt &SiblingConstVal = SiblingC->getValue();
+
+    // To be a better base, the sibling must have a smaller constant and
+    // must dominate the instruction we are currently trying to transform.
+    if (SiblingConstVal.slt(SmallestConst) &&
+        IC.getDominatorTree().dominates(SiblingXor, &I)) {
+      BestBaseXor = SiblingXor;
+      SmallestConst = SiblingConstVal;
+    }
+  }
+
+  if (!BestBaseXor)
+    return nullptr;
+
+  // We found a suitable base. Validate the transformation via disjointness
+  // checks.
+  const APInt NewConstVal = CurrentC->getValue() - SmallestConst;
+
+  // Check 1: The constant bits must be disjoint. (K = B|C implies B&C=0)
+  if ((NewConstVal & SmallestConst) != 0)
+    return nullptr;
+
+  // Check 2: The base value (xor(X,C)) must be disjoint from the new offset
+  // (B).
+  if (!IC.MaskedValueIsZero(BestBaseXor, NewConstVal, &I))
+    return nullptr;
+
+  // All checks passed. Create the new 'or' instruction.
+  Constant *NewConst = ConstantInt::get(I.getType(), NewConstVal);
+  return BinaryOperator::CreateDisjointOr(BestBaseXor, NewConst);
+}
+
// FIXME: We use commutative matchers (m_c_*) for some, but not all, matches
// here. We should standardize that construct where it is needed or choose some
// other way to ensure that commutated variants of patterns are not missed.
@@ -5201,5 +5263,8 @@ Instruction *InstCombinerImpl::visitXor(BinaryOperator &I) {
if (Instruction *Res = foldBitwiseLogicWithIntrinsics(I, Builder))
return Res;
+ if (Instruction *Transformed = transformChainedXorToOrDisjoint(I, *this))
+ return Transformed;
+
return nullptr;
}
diff --git a/llvm/test/Transforms/InstCombine/xor-to-or.ll b/llvm/test/Transforms/InstCombine/xor-to-or.ll
new file mode 100644
index 0000000000000..87b378338e3f9
--- /dev/null
+++ b/llvm/test/Transforms/InstCombine/xor-to-or.ll
@@ -0,0 +1,31 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt -mtriple=amdgcn-amd-amdhsa -passes=instcombine \
+; RUN: -S < %s | FileCheck %s
+
+define half @xor_to_or_disjoint(i1 %0, ptr %ptr) {
+; CHECK-LABEL: define half @xor_to_or_disjoint(
+; CHECK-SAME: i1 [[TMP0:%.*]], ptr [[PTR:%.*]]) {
+; CHECK-NEXT: [[ENTRY:.*:]]
+; CHECK-NEXT: [[ADDR1:%.*]] = select i1 [[TMP0]], i64 32, i64 256
+; CHECK-NEXT: [[GEP1:%.*]] = getelementptr i8, ptr [[PTR]], i64 [[ADDR1]]
+; CHECK-NEXT: [[TMP1:%.*]] = getelementptr i8, ptr [[PTR]], i64 [[ADDR1]]
+; CHECK-NEXT: [[GEP2:%.*]] = getelementptr i8, ptr [[TMP1]], i64 2048
+; CHECK-NEXT: [[VAL1:%.*]] = load half, ptr [[GEP1]], align 2
+; CHECK-NEXT: [[VAL2:%.*]] = load half, ptr [[GEP2]], align 2
+; CHECK-NEXT: [[RESULT_H:%.*]] = fadd half [[VAL1]], [[VAL2]]
+; CHECK-NEXT: ret half [[RESULT_H]]
+;
+entry:
+ %base = select i1 %0, i64 0, i64 288
+ %addr1 = xor i64 %base, 32
+ %addr2 = xor i64 %base, 2080
+ %gep1 = getelementptr i8, ptr %ptr, i64 %addr1
+ %gep2 = getelementptr i8, ptr %ptr, i64 %addr2
+ %val1 = load half, ptr %gep1
+ %val2 = load half, ptr %gep2
+ %val1.f = fpext half %val1 to float
+ %val2.f = fpext half %val2 to float
+ %sum1.f = fadd float %val1.f, %val2.f
+ %result.h = fptrunc float %sum1.f to half
+ ret half %result.h
+}
Hm, I don't think that this transform is a good fit for InstCombine. Converting x ^ c1 into (x ^ c2) | c1 seems like a less canonical form, as it increases the dependence distance. My baseline expectation would be that InstCombine does the fold in the reverse direction.
As far as I understand, this fold is only actually beneficial if the new form can be folded into a GEP offset. I think you need to only handle that case in particular. But even then I'm not sure that InstCombine is the right place for it (rather than the backend).
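For context, a small hypothetical sketch (function name mine, not from the PR) of the GEP offset folding being referred to: because a disjoint or has add semantics, its constant operand can migrate into a constant-offset GEP, which is not generally possible for xor:

```llvm
define ptr @fold_disjoint_or_into_gep(i64 %idx, ptr %ptr) {
  %addr = or disjoint i64 %idx, 2048
  %gep = getelementptr i8, ptr %ptr, i64 %addr
  ret ptr %gep
}

; ...which offset folding can rewrite as:
;   %base.gep = getelementptr i8, ptr %ptr, i64 %idx
;   %gep = getelementptr i8, ptr %base.gep, i64 2048
```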
@@ -0,0 +1,31 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
; RUN: opt -mtriple=amdgcn-amd-amdhsa -passes=instcombine \
Triple shouldn't be necessary.
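If the triple is dropped as suggested, the RUN line would presumably reduce to the usual target-independent form:

```llvm
; RUN: opt -passes=instcombine -S < %s | FileCheck %s
```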
%val1.f = fpext half %val1 to float
%val2.f = fpext half %val2 to float
%sum1.f = fadd float %val1.f, %val2.f
%result.h = fptrunc float %sum1.f to half
This test contains a lot of unrelated instructions.
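A trimmed variant (my sketch, not part of the patch) that keeps only the xor/GEP/load chain and drops the fpext/fptrunc round trip might look like:

```llvm
define half @xor_to_or_disjoint_min(i1 %c, ptr %ptr) {
entry:
  %base = select i1 %c, i64 0, i64 288
  %addr1 = xor i64 %base, 32
  %addr2 = xor i64 %base, 2080
  %gep1 = getelementptr i8, ptr %ptr, i64 %addr1
  %gep2 = getelementptr i8, ptr %ptr, i64 %addr2
  %val1 = load half, ptr %gep1
  %val2 = load half, ptr %gep2
  %result = fadd half %val1, %val2
  ret half %result
}
```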
%sum1.f = fadd float %val1.f, %val2.f
%result.h = fptrunc float %sum1.f to half
ret half %result.h
}
This transform needs more tests to exercise the pre-conditions of the transform, e.g. the disjointness.
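For example, a negative test along these lines (hypothetical, not in the patch) would exercise the first disjointness check: 80 - 32 = 48 shares bit 5 with 32, so the fold must not fire:

```llvm
define i64 @xor_no_fold_overlapping_bits(i1 %c) {
  %base = select i1 %c, i64 0, i64 288
  %addr1 = xor i64 %base, 32
  %addr2 = xor i64 %base, 80   ; 80 - 32 = 48, and 48 & 32 != 0
  %sum = add i64 %addr1, %addr2
  ret i64 %sum
}
```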
Thanks for reviewing @nikic. I originally attempted to do this in #135788 and was directed to give it a try in InstCombine. Would it make sense if I restricted the transformation to just GEPs? What I am really after is folding these xor computations into GEPs, so that I do not have to recompute the address for each of the loads (referring to the attached test in the patch).