Commit bdf2737

i386: Improve __int128 argument passing (in ix86_expand_move).
Passing 128-bit integer (TImode) parameters on x86_64 can sometimes
result in surprising code.  Consider the example below (from PR 43644):

    unsigned __int128 foo(unsigned __int128 x, unsigned long long y)
    {
      return x+y;
    }

which currently results in 6 consecutive movq instructions:

    foo:    movq    %rsi, %rax
            movq    %rdi, %rsi
            movq    %rdx, %rcx
            movq    %rax, %rdi
            movq    %rsi, %rax
            movq    %rdi, %rdx
            addq    %rcx, %rax
            adcq    $0, %rdx
            ret

The underlying issue is that during RTL expansion, we generate the
following initial RTL for the x argument:

    (insn 4 3 5 2 (set (reg:TI 85)
            (subreg:TI (reg:DI 86) 0)) "pr43644-2.c":5:1 -1
         (nil))
    (insn 5 4 6 2 (set (subreg:DI (reg:TI 85) 8)
            (reg:DI 87)) "pr43644-2.c":5:1 -1
         (nil))
    (insn 6 5 7 2 (set (reg/v:TI 84 [ x ])
            (reg:TI 85)) "pr43644-2.c":5:1 -1
         (nil))

which by combine/reload becomes

    (insn 25 3 22 2 (set (reg/v:TI 84 [ x ])
            (const_int 0 [0])) "pr43644-2.c":5:1 -1
         (nil))
    (insn 22 25 23 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 0)
            (reg:DI 93)) "pr43644-2.c":5:1 90 {*movdi_internal}
         (expr_list:REG_DEAD (reg:DI 93)
            (nil)))
    (insn 23 22 28 2 (set (subreg:DI (reg/v:TI 84 [ x ]) 8)
            (reg:DI 94)) "pr43644-2.c":5:1 90 {*movdi_internal}
         (expr_list:REG_DEAD (reg:DI 94)
            (nil)))

where the heavy use of SUBREG SET_DESTs creates challenges for both
combine and register allocation.

The improvement proposed here is to avoid these problematic SUBREGs
by adding (two) special cases to ix86_expand_move.  For insn 4, which
sets a TImode destination from a paradoxical SUBREG, to assign the
lowpart, we can use an explicit zero extension (zero_extendditi2 was
added in July 2022), and for insn 5, which sets the highpart of a
TImode register we can use the *insvti_highpart_1 instruction (that
was added in May 2023, after being approved for stage1 in January).
This allows combine to work its magic, merging these insns into a
*concatditi3 and from there into other optimized forms.

So for the test case above, we now generate only a single movq:

    foo:    movq    %rdx, %rax
            xorl    %edx, %edx
            addq    %rdi, %rax
            adcq    %rsi, %rdx
            ret

But there is a little bad news.  This patch causes two (minor) missed
optimization regressions on x86_64: gcc.target/i386/pr82580.c and
gcc.target/i386/pr91681-1.c.  As shown in the test case above, we're
no longer generating adcq $0, but instead using xorl.  For the other
FAIL, register allocation now has more freedom and is (arbitrarily)
choosing a register assignment that doesn't match what the test is
expecting.  These issues are easier to explain and fix once this patch
is in the tree.

The good news is that this approach fixes a number of long-standing
issues that need to be checked in bugzilla, including PR target/110533,
which was opened earlier this week.

2023-07-07  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
        PR target/43644
        PR target/110533
        * config/i386/i386-expand.cc (ix86_expand_move): Convert SETs of
        TImode destinations from paradoxical SUBREGs (setting the lowpart)
        into explicit zero extensions.  Use *insvti_highpart_1 instruction
        to set the highpart of a TImode destination.

gcc/testsuite/ChangeLog
        PR target/43644
        PR target/110533
        * gcc.target/i386/pr110533.c: New test case.
        * gcc.target/i386/pr43644-2.c: Likewise.
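As a reading aid for the single-movq sequence quoted in the commit message, the sketch below redoes the same addition on 64-bit halves. It assumes the usual x86_64 SysV register assignment (x in %rdi:%rsi, y in %rdx, result in %rax:%rdx). The helpers `foo_by_halves` and `check_foo` are hypothetical names for illustration, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension */

/* The function from PR 43644, as quoted in the commit message.  */
static u128 foo(u128 x, unsigned long long y) { return x + y; }

/* Hypothetical helper: the same sum computed on 64-bit halves,
   one statement per instruction of the new sequence.  */
static u128 foo_by_halves(uint64_t x_lo, uint64_t x_hi, uint64_t y)
{
  uint64_t lo = y;                /* movq %rdx, %rax */
  uint64_t hi = 0;                /* xorl %edx, %edx */
  lo += x_lo;                     /* addq %rdi, %rax */
  uint64_t carry = lo < x_lo;     /* carry out of the low-half add */
  hi += x_hi + carry;             /* adcq %rsi, %rdx */
  return ((u128)hi << 64) | lo;
}

/* Hypothetical checker: both forms agree for a given input.  */
static void check_foo(u128 x, uint64_t y)
{
  assert(foo(x, y) == foo_by_halves((uint64_t)x, (uint64_t)(x >> 64), y));
}
```

The xorl corresponds to the zero extension of y to 128 bits, which is why the new code uses adcq %rsi, %rdx rather than the adcq $0 that gcc.target/i386/pr82580.c is described as expecting.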
1 parent f934c57 commit bdf2737

3 files changed: +46 -0 lines changed
gcc/config/i386/i386-expand.cc

Lines changed: 28 additions & 0 deletions
@@ -429,6 +429,16 @@ ix86_expand_move (machine_mode mode, rtx operands[])
 
     default:
       break;
+
+    case SUBREG:
+      /* Transform TImode paradoxical SUBREG into zero_extendditi2.  */
+      if (TARGET_64BIT
+          && mode == TImode
+          && SUBREG_P (op1)
+          && GET_MODE (SUBREG_REG (op1)) == DImode
+          && SUBREG_BYTE (op1) == 0)
+        op1 = gen_rtx_ZERO_EXTEND (TImode, SUBREG_REG (op1));
+      break;
     }
 
   if ((flag_pic || MACHOPIC_INDIRECT)
@@ -532,6 +542,24 @@ ix86_expand_move (machine_mode mode, rtx operands[])
         }
     }
 
+  /* Use *insvti_highpart_1 to set highpart of TImode register.  */
+  if (TARGET_64BIT
+      && mode == DImode
+      && SUBREG_P (op0)
+      && SUBREG_BYTE (op0) == 8
+      && GET_MODE (SUBREG_REG (op0)) == TImode
+      && REG_P (SUBREG_REG (op0))
+      && REG_P (op1))
+    {
+      wide_int mask = wi::mask (64, false, 128);
+      rtx tmp = immed_wide_int_const (mask, TImode);
+      op0 = SUBREG_REG (op0);
+      tmp = gen_rtx_AND (TImode, copy_rtx (op0), tmp);
+      op1 = gen_rtx_ZERO_EXTEND (TImode, op1);
+      op1 = gen_rtx_ASHIFT (TImode, op1, GEN_INT (64));
+      op1 = gen_rtx_IOR (TImode, tmp, op1);
+    }
+
   emit_insn (gen_rtx_SET (op0, op1));
 }
 
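The two RTL shapes built in ix86_expand_move can be modelled at the value level in plain C. This is only a sketch of the arithmetic, assuming `unsigned __int128` support. The names `low64_mask`, `zext_lowpart`, `set_highpart` and `check_concat` are hypothetical and do not exist in GCC.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension */

/* Models wi::mask (64, false, 128): low 64 bits set, rest clear.  */
static u128 low64_mask(void) { return ((u128)1 << 64) - 1; }

/* Models (zero_extend:TI (reg:DI lo)), the replacement for the
   paradoxical SUBREG source.  */
static u128 zext_lowpart(uint64_t lo) { return (u128)lo; }

/* Models (ior:TI (and:TI x mask)
                  (ashift:TI (zero_extend:TI hi) (const_int 64))),
   the replacement for a store to the highpart SUBREG.  */
static u128 set_highpart(u128 x, uint64_t hi)
{
  u128 kept = x & low64_mask();   /* keep the low half, drop old high bits */
  return kept | ((u128)hi << 64); /* place hi in bits 64..127 */
}

/* Hypothetical checker: the two steps together concatenate hi:lo.  */
static void check_concat(uint64_t lo, uint64_t hi)
{
  u128 v = set_highpart(zext_lowpart(lo), hi);
  assert((uint64_t)v == lo && (uint64_t)(v >> 64) == hi);
}
```

Because both steps are ordinary TImode arithmetic rather than SUBREG stores, the pair has the form of a concatenation of two DImode values, which is the shape the commit message says combine merges into *concatditi3.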

gcc/testsuite/gcc.target/i386/pr110533.c

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O0" } */
+
+__attribute__((naked))
+void fn(__int128 a) {
+  asm("ret");
+}
+
+/* { dg-final { scan-assembler-not "mov" } } */

gcc/testsuite/gcc.target/i386/pr43644-2.c

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2" } */
+
+unsigned __int128 foo(unsigned __int128 x, unsigned long long y)
+{
+  return x+y;
+}
+
+/* { dg-final { scan-assembler-times "movq" 1 } } */
