Python 3.12 Goals
WIP
This is a short summary of the major themes of work that the Faster CPython team plans to land in 3.12.
While the speed improvements in 3.11 mainly involved replacing individual opcodes with faster context-specific ones (adaptive opcode specialization), the next big set of improvements will come from optimizing runs of multiple opcodes. To enable this, many of the existing high-level opcodes will be replaced with lower-level opcodes, such as separate opcodes for reference counting and for pushing to and popping from the stack. These simpler opcodes offer more opportunities for optimization, for example by removing redundant reference count operations. They also bring us closer to a set of instructions suitable for machine code generation in the future (both in CPython and in third-party JIT projects).
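As a rough illustration of why lower-level opcodes expose more optimization opportunities, the sketch below runs a toy peephole pass over a list of micro-operations and cancels an `INCREF` that is immediately undone by a `DECREF` of the same operand. The micro-op names (`LOAD`, `INCREF`, `DECREF`, `PUSH`) and the function `remove_redundant_refcounts` are hypothetical and chosen only for this example; they are not CPython's instruction set or its optimizer.

```python
from typing import List, Tuple

# Hypothetical micro-op: (operation name, operand name). Not CPython's real opcodes.
MicroOp = Tuple[str, str]


def remove_redundant_refcounts(ops: List[MicroOp]) -> List[MicroOp]:
    """Drop each INCREF that is directly followed by a DECREF of the same operand."""
    out: List[MicroOp] = []
    for op in ops:
        if out and op[0] == "DECREF" and out[-1] == ("INCREF", op[1]):
            out.pop()  # the INCREF/DECREF pair cancels out
        else:
            out.append(op)
    return out


trace = [
    ("LOAD", "x"),
    ("INCREF", "x"),
    ("DECREF", "x"),
    ("PUSH", "x"),
]
optimized = remove_redundant_refcounts(trace)
print(optimized)
```

With a single high-level opcode that hides its reference counting internally, a pass like this would have nothing to match on; splitting the work into separate steps is what makes the redundancy visible.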
To enable this, the interpreter loop will be generated from a declarative description. This should reduce a class of bugs related to keeping the interpreter loop in sync with some related functions (mark_stacks, stack_effect etc.), and will also allow us to experiment with large changes to the interpreter loop, for example by generating limited-purpose interpreter loops, such as variants with and without tracing hooks.
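The idea of deriving several artifacts from one description can be sketched with a toy stack machine. In the example below, a single table records how many values each instruction pops and pushes along with its behavior, and both a `stack_effect` helper and the `run` loop are driven by that table, so they cannot drift apart. The table layout and the instruction names are assumptions made for this sketch and do not reflect the format of CPython's actual instruction definitions.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical declarative table: name -> (values popped, values pushed, behavior).
Spec = Tuple[int, int, Callable[..., tuple]]

INSTRUCTIONS: Dict[str, Spec] = {
    "ADD": (2, 1, lambda a, b: (a + b,)),
    "MUL": (2, 1, lambda a, b: (a * b,)),
    "DUP": (1, 2, lambda a: (a, a)),
    "POP": (1, 0, lambda a: ()),
}


def stack_effect(name: str) -> int:
    """Net change in stack depth, derived from the table rather than written by hand."""
    popped, pushed, _ = INSTRUCTIONS[name]
    return pushed - popped


def run(program: Sequence[str], stack: Sequence[int]) -> List[int]:
    """Toy interpreter loop driven by the same table."""
    values = list(stack)
    for name in program:
        popped, pushed, behavior = INSTRUCTIONS[name]
        # Pop the operands, restoring their original left-to-right order.
        args = [values.pop() for _ in range(popped)][::-1]
        results = behavior(*args)
        assert len(results) == pushed, f"{name} disagrees with its declared effect"
        values.extend(results)
    return values


print(run(["DUP", "MUL"], [3]))  # squares the value on top of the stack
print(stack_effect("DUP"))
```

A variant loop (say, one that calls a tracing hook before each instruction) could be produced from the same table without touching the instruction definitions.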
Python currently has a single global interpreter lock (GIL) per process, which prevents multi-threaded parallelism. The work described in PEP 684 is to make all global state thread safe and move to one GIL per subinterpreter. Additionally, PEP 554 will make it possible to create subinterpreters from Python (currently a C API-only feature), opening up true multi-threaded parallelism.
We have done an analysis of which bytecodes would benefit the most from specialization and plan to complete the remaining high-benefit ones for 3.12.
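One way to observe specialization on an existing build is the `dis` module, whose `adaptive` keyword (added in Python 3.11) shows the specialized forms of instructions once a function has warmed up. The snippet below is a minimal sketch: the function `add` and the warm-up count of 1000 are arbitrary choices for illustration, and the instruction names that appear depend on the CPython version in use.

```python
import dis
import sys


def add(a, b):
    return a + b


# Call the function repeatedly so the adaptive interpreter has a chance to specialize it.
for _ in range(1000):
    add(1, 2)


def opnames(func):
    """List opcode names, requesting specialized forms where the running Python supports it."""
    kwargs = {"adaptive": True} if sys.version_info >= (3, 11) else {}
    return [ins.opname for ins in dis.get_instructions(func, **kwargs)]


print(opnames(add))
```

On versions with the specializing interpreter, the generic binary operation in `add` may be reported under a more specific name after warm-up; on older versions the list simply shows the ordinary bytecode.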
There are a number of opportunities for decreasing the size of Python object structs. Since these objects are used so frequently, smaller structs benefit not just overall memory usage but cache locality as well. We plan to implement the most promising of these ideas for 3.12.
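To get a feel for current per-object overhead, `sys.getsizeof` reports the size in bytes of an individual object (excluding anything it refers to). The sketch below only measures a few small objects on whatever interpreter runs it; the helper name `object_sizes` and the sample values are assumptions for this example, and the numbers vary by platform and Python version.

```python
import sys


def object_sizes():
    """Return the shallow size in bytes of a few small, commonly created objects."""
    samples = {
        "object()": object(),
        "int 1": 1,
        "float 1.0": 1.0,
        "empty tuple": (),
        "empty list": [],
        "empty dict": {},
    }
    return {label: sys.getsizeof(value) for label, value in samples.items()}


for label, size in object_sizes().items():
    print(f"{label:12} {size} bytes")
```

Comparing this output across Python versions is a simple way to see whether a change to an object struct has taken effect.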
In addition to the above projects, the team is contributing to the overall quality of the CPython codebase:
- Making it easier to write tests for the compiler.
- Proactively monitoring code coverage of the CPython test suite at the C level.
- Improving the pyperformance benchmarking suite to include more representative real-world workloads.
- Assisting with CPython issues and PRs with the "performance" label.
- Increasing our set of standard benchmarking machines and results to include macOS and Windows.
- Continuing to collaborate with major projects that make use of Python internals to help them adapt to changes in the CPython interpreter.