Reduce reference counting overhead with static analysis and deferred reference counts. #390
markshannon
started this conversation in
Ideas
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
From stats gathered from running the test suite, the average object sees 6 increfs and 7 decrefs over its lifetime.
There are about 1.1 increfs and 1.3 decrefs per bytecode instruction executed.
These seem to be split about 55%/45% between the interpreter and all other code ("Interpreter" includes frame and generator operations)
[TO DO: Get stats from standard benchmark suite]
Each incref/decref pair removed saves four memory reads or writes, which is the same as instruction dispatch for 3.11 (we expect to reduce instruction dispatch to three memory accesses for 3.12).
Some of these reference count modifications are necessary as they represent references from the heap.
However, it should be possible to eliminate many counts for references from the stack.
There are a number of places to look that should show improvements.
Deferred reference counts
Suppose our static analysis shows that the local
a
outlives the expressionx = a + b
.We could then, emit bytecode that does not incref
a
when pushing it to the stack, and does not decref it whena + b
has been computed. The reference toa
on the stack would be deferred.Tagged pointers
The frame stack consists of intermixed locals array, stack and some linkage data.
Some of the
PyObject *
pointers are already kept as borrowed references (globals and builtins) for efficiency, but all others are strong references. Using more borrowed references would be more efficient, but is easy to get wrong.By adding a tag bit to the pointer, we can use it to reliably unwind the stack, and sanity check our static analysis.
One other advantage of using a tag, is that we can mark non-references as "borrowed", simplifying frame exit and unwinding code. We no longer need to care whether a word is a pointer or not, as long as it is tagged correctly.
Deferring references
If we can prove that there is a strong reference to an object in a (transitive) caller then, provided we guard against deletion in debuggers, we can defer (make borrowed) any reference to that object.
If we track which references are deferred statically we can eliminate all overhead for that reference. If we can only track it dynamically then we only need check the low-bit of the tagged value, and don't need to dereference it, still saving the memory accesses.
Beta Was this translation helpful? Give feedback.
All reactions