Commit ff37b11

[LegalizeVectorOps][X86] Don't defer BITREVERSE expansion to LegalizeDAG.
By expanding early, the shifts can be custom lowered in LegalizeVectorOps, and a DAG combine is then able to run on them before LegalizeDAG handles the BUILD_VECTORs for the masks used.

v16i8 shift lowering on X86 requires a mask to be applied to a v8i16 shift. The BITREVERSE expansion applied an AND mask before the SHL ops and after the SRL ops so that both shifts could share the same mask constant. It looks like this patch allows a DAG combine to remove the AND mask that X86 lowering adds after the v16i8 SHL, which maintains the mask sharing that the BITREVERSE expansion was trying to achieve. Prior to this patch it looks like we kept the mask after the SHL instead, which required an extra constant pool entry or a PANDN to invert it.

This is dependent on D112248 because RISCV will end up scalarizing the BSWAP portion of the BITREVERSE expansion if we don't disable BSWAP scalarization in LegalizeVectorOps first.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D112254
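For reference, a minimal scalar sketch (in C++, illustrative only, not the LLVM code itself) of the shift-and-mask pattern the expansion uses, where each step applies the same mask constant before the left shift and after the right shift:

    // Hypothetical helper: reverses the bits of one byte with the shared-mask
    // shift pattern described above (0x0F, 0x33, 0x55 are the usual swap masks).
    #include <cstdint>

    uint8_t bitreverse8(uint8_t x) {
      x = (uint8_t)(((x & 0x0F) << 4) | ((x >> 4) & 0x0F)); // swap nibbles
      x = (uint8_t)(((x & 0x33) << 2) | ((x >> 2) & 0x33)); // swap bit pairs
      x = (uint8_t)(((x & 0x55) << 1) | ((x >> 1) & 0x55)); // swap adjacent bits
      return x;
    }

The vector expansion applies the same per-byte steps lane-wise (after the BSWAP portion for wider element types), which is why reusing one mask constant per step matters for the generated constants.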
1 parent c0d6e1b commit ff37b11

4 files changed: +355, -372 lines


llvm/lib/CodeGen/SelectionDAG/LegalizeVectorOps.cpp

Lines changed: 3 additions & 2 deletions
@@ -1162,9 +1162,10 @@ void VectorLegalizer::ExpandBITREVERSE(SDNode *Node,
   if (TLI.isOperationLegalOrCustom(ISD::SHL, VT) &&
       TLI.isOperationLegalOrCustom(ISD::SRL, VT) &&
       TLI.isOperationLegalOrCustomOrPromote(ISD::AND, VT) &&
-      TLI.isOperationLegalOrCustomOrPromote(ISD::OR, VT))
-    // Let LegalizeDAG handle this later.
+      TLI.isOperationLegalOrCustomOrPromote(ISD::OR, VT)) {
+    Results.push_back(TLI.expandBITREVERSE(Node, DAG));
     return;
+  }
 
   // Otherwise unroll.
   SDValue Tmp = DAG.UnrollVectorOp(Node);

llvm/test/CodeGen/X86/bitreverse.ll

Lines changed: 5 additions & 4 deletions
@@ -58,10 +58,11 @@ define <2 x i16> @test_bitreverse_v2i16(<2 x i16> %a) nounwind {
 ; X64-NEXT:    psllw $8, %xmm0
 ; X64-NEXT:    por %xmm1, %xmm0
 ; X64-NEXT:    movdqa %xmm0, %xmm1
-; X64-NEXT:    psllw $4, %xmm1
-; X64-NEXT:    pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm1
-; X64-NEXT:    psrlw $4, %xmm0
-; X64-NEXT:    pand {{\.?LCPI[0-9]+_[0-9]+}}(%rip), %xmm0
+; X64-NEXT:    psrlw $4, %xmm1
+; X64-NEXT:    movdqa {{.*#+}} xmm2 = [15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15]
+; X64-NEXT:    pand %xmm2, %xmm1
+; X64-NEXT:    pand %xmm2, %xmm0
+; X64-NEXT:    psllw $4, %xmm0
 ; X64-NEXT:    por %xmm1, %xmm0
 ; X64-NEXT:    movdqa %xmm0, %xmm1
 ; X64-NEXT:    psrlw $2, %xmm1
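For context on the psrlw/pand pairs in the checks above: SSE2 has no byte-element shift, so X86 emulates a v16i8 logical shift with a v8i16 shift plus a byte mask. A hedged sketch with SSE2 intrinsics (illustrative only; the helper name is made up, not compiler output):

    // Logical right shift of 16 x i8 by 4, emulated as an 8 x i16 shift followed
    // by a mask that clears the bits that crossed in from the neighboring byte,
    // matching the psrlw $4 / pand [15,15,...] pairing checked above.
    #include <emmintrin.h>

    static __m128i srl_v16i8_by_4(__m128i v) {
      __m128i shifted = _mm_srli_epi16(v, 4);  // shift as 8 x i16 lanes
      __m128i mask = _mm_set1_epi8(0x0F);      // keep only bits that stayed in the byte
      return _mm_and_si128(shifted, mask);
    }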

llvm/test/CodeGen/X86/combine-bitreverse.ll

Lines changed: 5 additions & 4 deletions
@@ -50,10 +50,11 @@ define <4 x i32> @test_demandedbits_bitreverse(<4 x i32> %a0) nounwind {
 ; X86-NEXT:    pshufhw {{.*#+}} xmm0 = xmm0[0,1,2,3,7,6,5,4]
 ; X86-NEXT:    packuswb %xmm2, %xmm0
 ; X86-NEXT:    movdqa %xmm0, %xmm1
-; X86-NEXT:    psllw $4, %xmm1
-; X86-NEXT:    pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm1
-; X86-NEXT:    psrlw $4, %xmm0
-; X86-NEXT:    pand {{\.?LCPI[0-9]+_[0-9]+}}, %xmm0
+; X86-NEXT:    psrlw $4, %xmm1
+; X86-NEXT:    movdqa {{.*#+}} xmm2 = [15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15]
+; X86-NEXT:    pand %xmm2, %xmm1
+; X86-NEXT:    pand %xmm2, %xmm0
+; X86-NEXT:    psllw $4, %xmm0
 ; X86-NEXT:    por %xmm1, %xmm0
 ; X86-NEXT:    movdqa %xmm0, %xmm1
 ; X86-NEXT:    psrlw $2, %xmm1
