Commit d069ed2

Petr Tesarik authored and Christoph Hellwig committed
swiotlb: optimize get_max_slots()
Use a simple logical shift and increment to calculate the number of slots
taken by the DMA segment boundary.

At least GCC-13 is not able to optimize the expression, producing this
horrible assembly code on x86:

	cmpq	$-1, %rcx
	je	.L364
	addq	$2048, %rcx
	shrq	$11, %rcx
	movq	%rcx, %r13

.L331:
	// rest of the function here...

	// after function epilogue and return:
.L364:
	movabsq	$9007199254740992, %r13
	jmp	.L331

After the optimization, the code looks more reasonable:

	shrq	$11, %r11
	leaq	1(%r11), %rbx

Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
1 parent f94cb36 commit d069ed2

1 file changed: +1 -3 lines changed

kernel/dma/swiotlb.c

Lines changed: 1 addition & 3 deletions
@@ -903,9 +903,7 @@ static inline phys_addr_t slot_addr(phys_addr_t start, phys_addr_t idx)
  */
 static inline unsigned long get_max_slots(unsigned long boundary_mask)
 {
-	if (boundary_mask == ~0UL)
-		return 1UL << (BITS_PER_LONG - IO_TLB_SHIFT);
-	return nr_slots(boundary_mask + 1);
+	return (boundary_mask >> IO_TLB_SHIFT) + 1;
 }
 
 static unsigned int wrap_area_index(struct io_tlb_pool *mem, unsigned int index)
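
The change relies on both forms returning the same value for every segment boundary mask. A minimal user-space sketch of that check follows. It is not kernel code, and it rests on these assumptions: `IO_TLB_SHIFT` is 11 (consistent with the `shrq $11` in the commit message), `nr_slots()` rounds up to whole slots, `uint64_t` stands in for a 64-bit `unsigned long`, and boundary masks have the form 2^n - 1. The names `get_max_slots_old`, `get_max_slots_new`, and `check_boundary_masks` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 2 KiB slots, matching the shift by 11 seen in the assembly. */
#define IO_TLB_SHIFT 11
#define IO_TLB_SIZE  (UINT64_C(1) << IO_TLB_SHIFT)
#define BITS_PER_U64 64

/* Stand-in for the kernel's nr_slots(): assumed to round up to whole slots. */
static uint64_t nr_slots(uint64_t val)
{
	return (val + IO_TLB_SIZE - 1) / IO_TLB_SIZE;
}

/* Form before the commit: special-cases the all-ones mask to avoid overflow. */
static uint64_t get_max_slots_old(uint64_t boundary_mask)
{
	if (boundary_mask == UINT64_MAX)
		return UINT64_C(1) << (BITS_PER_U64 - IO_TLB_SHIFT);
	return nr_slots(boundary_mask + 1);
}

/* Form after the commit: shift first, then increment, so nothing overflows. */
static uint64_t get_max_slots_new(uint64_t boundary_mask)
{
	return (boundary_mask >> IO_TLB_SHIFT) + 1;
}

/* Compare both forms for every mask 2^n - 1, n = 0..64; returns masks checked. */
static int check_boundary_masks(void)
{
	int checked = 0;

	for (int n = 0; n <= 64; n++) {
		uint64_t mask = (n == 64) ? UINT64_MAX : (UINT64_C(1) << n) - 1;

		assert(get_max_slots_old(mask) == get_max_slots_new(mask));
		checked++;
	}
	return checked;
}
```

Calling `check_boundary_masks()` from a test program exercises all 65 masks. The two forms would diverge only for masks where `boundary_mask + IO_TLB_SIZE` wraps around without the mask being all ones, which cannot happen for a mask of the form 2^n - 1.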
