Commit b30fa1b

runtime: improve scan inner loop
On every arch except amd64, it is faster to do x&(x-1) than x^(1<<n). Most archs need 3 instructions for the latter: MOV $1, R; SLL n, R; ANDN R, x. Maybe 4 if there's no ANDN. Most archs need only 2 instructions to do x&(x-1). It takes 3 on x86/amd64 because NEG only works in place. Only amd64 can do x^(1<<n) in a single instruction. (We could on 386 also, but that's currently not implemented.)

Change-Id: I3b74b7a466ab972b20a25dbb21b572baf95c3467
Reviewed-on: https://go-review.googlesource.com/c/go/+/672956
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
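The change depends on both expressions clearing the same bit. When n is the index of the lowest set bit of a nonzero x, toggling bit n with x^(1<<n) and computing x&(x-1) give the same result, because subtracting 1 flips the lowest set bit and every zero below it. In the hunk below, i comes from TrailingZeros of tp.mask, so it names the lowest set bit whenever the mask is nonzero. The following is a minimal standalone sketch of that equivalence. It uses the standard library's math/bits instead of the runtime-internal sys package, and the function names clearLowestXor and clearLowestAnd are hypothetical.

```go
package main

import (
	"fmt"
	"math/bits"
)

// clearLowestXor (hypothetical helper) clears the lowest set bit by toggling
// bit n, where n is that bit's index. This is the form the commit keeps on
// amd64, where it compiles to a single BTCQ. It assumes x != 0: for x == 0,
// TrailingZeros64 returns 64, and 64&63 == 0 would toggle bit 0.
func clearLowestXor(x uint64) uint64 {
	n := bits.TrailingZeros64(x)
	return x ^ (uint64(1) << (n & 63))
}

// clearLowestAnd (hypothetical helper) clears the lowest set bit with
// x&(x-1). Subtracting 1 flips the lowest set bit and all zeros below it,
// so the AND keeps only the higher bits. It does not need the bit index.
func clearLowestAnd(x uint64) uint64 {
	return x & (x - 1)
}

func main() {
	for _, x := range []uint64{1, 0b1011_0000, 0x8000_0000_0000_0000, ^uint64(0)} {
		a, b := clearLowestXor(x), clearLowestAnd(x)
		fmt.Printf("%#x -> xor %#x, and %#x, equal=%v\n", x, a, b, a == b)
	}
}
```

The x&(x-1) form never reads n, which is why it can stay at 2 instructions on most architectures while the shift-based form must first materialize 1<<n in a register.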
1 parent c31a5c5 commit b30fa1b

1 file changed: +7 additions, -2 deletions

src/runtime/mbitmap.go

Lines changed: 7 additions & 2 deletions
```diff
@@ -219,8 +219,13 @@ func (tp typePointers) nextFast() (typePointers, uintptr) {
 	} else {
 		i = sys.TrailingZeros32(uint32(tp.mask))
 	}
-	// BTCQ
-	tp.mask ^= uintptr(1) << (i & (ptrBits - 1))
+	if GOARCH == "amd64" {
+		// BTCQ
+		tp.mask ^= uintptr(1) << (i & (ptrBits - 1))
+	} else {
+		// SUB, AND
+		tp.mask &= tp.mask - 1
+	}
 	// LEAQ (XX)(XX*8)
 	return tp, tp.addr + uintptr(i)*goarch.PtrSize
 }
```
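To see the non-amd64 branch in context, here is a simplified sketch of the pattern nextFast implements one step at a time: find the lowest set bit of the pointer mask, clear it with mask &= mask - 1, and scale the bit index by the pointer size to get a slot address. This is not the runtime's code. The function pointerAddrs, the fixed ptrSize of 8 (assuming a 64-bit target instead of goarch.PtrSize), and the loop that collects every address into a slice are all hypothetical simplifications; the real nextFast returns one address per call.

```go
package main

import (
	"fmt"
	"math/bits"
)

// ptrSize assumes a 64-bit target for this sketch; the runtime uses
// goarch.PtrSize instead.
const ptrSize = 8

// pointerAddrs (hypothetical) returns the address of every word whose bit
// is set in mask, for a run of words starting at addr.
func pointerAddrs(addr uintptr, mask uint64) []uintptr {
	var out []uintptr
	for mask != 0 {
		// Index of the lowest set bit (BSFQ/TZCNT-style).
		i := bits.TrailingZeros64(mask)
		// Clear that bit without reusing i (SUB, AND on most arches).
		mask &= mask - 1
		// Scale the word index to a byte offset from addr.
		out = append(out, addr+uintptr(i)*ptrSize)
	}
	return out
}

func main() {
	// Bits 0, 2, and 5 set: pointer slots at word offsets 0, 2, and 5.
	for _, p := range pointerAddrs(0x1000, 0b10_0101) {
		fmt.Printf("%#x\n", p)
	}
}
```

Because the clear step does not depend on i, on architectures without a single bit-toggle instruction it avoids the extra work of building 1<<i before masking.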