Description
There's a good deal of documentation about thread-safety (correctness), but not much about multithreaded performance. This is a bit different from #149, and I think we should do both.
I'd like the page to help the reader develop a basic mental model for how to write programs that will scale well in the free threaded Python builds. Here are some things that may be worth covering.
Reference count contention
Frequent accesses to the same object can inhibit scaling.
Recommendations:
- Use data that's private to the thread
- Aggregate results at the end of a task. A basic counting example may help here, something like
Good:

```python
import threading

global_counter = 0
global_lock = threading.Lock()

def my_thread(n):
    global global_counter
    # Count into a thread-private variable...
    counter = 0
    for _ in range(n):
        counter += 1
    # ...and fold the result into the shared counter once, at the end,
    # instead of touching the shared object on every iteration.
    with global_lock:
        global_counter += counter
```
Collection (dict, list, set) performance
The builtin dict, list, and set classes are not designed to be concurrent collections. They are thread-safe, but they do not necessarily scale well under multithreaded access.
- Concurrent reads from a shared dictionary: sometimes scale well, but reference count contention on the dictionary itself may be a bottleneck
- Frequent concurrent writes, or mixed reads and writes, to a shared dictionary: avoid. These do not scale well due to contention on the dictionary's internal lock
Recommendation: use ft_utils or some other data structure if you need a concurrent collection
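As a sketch of the "thread-private data, aggregate at the end" approach applied to collections (plain stdlib here, not ft_utils; the function and variable names are illustrative), each task can build a private dict and merge it into the shared one under a lock once per task:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

merged = {}
merged_lock = threading.Lock()

def count_words(chunk):
    # Thread-private dict: no contention while counting.
    local = {}
    for word in chunk:
        local[word] = local.get(word, 0) + 1
    # Merge once per task, in a short critical section.
    with merged_lock:
        for word, n in local.items():
            merged[word] = merged.get(word, 0) + n

chunks = [["a", "b", "a"], ["b", "c"], ["a", "c", "c"]]
with ThreadPoolExecutor(max_workers=3) as pool:
    list(pool.map(count_words, chunks))

assert merged == {"a": 3, "b": 2, "c": 3}
```

The key point is that the shared dictionary (and its lock) is touched once per task rather than once per item.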
Task size?
Your task size has to be big enough that the overhead of dispatching a task to a thread is much smaller than the time it takes to run the task.
Recommendation: make your tasks bigger (include example with concurrent.futures.ThreadPoolExecutor)
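A minimal sketch of batching with concurrent.futures.ThreadPoolExecutor (the chunk size and the stand-in workload are hypothetical; tune for the real task):

```python
from concurrent.futures import ThreadPoolExecutor

def process(item):
    return item * item  # stand-in for real per-item work

# Too fine-grained: one task per item means the dispatch overhead
# dominates when process() is cheap.
# with ThreadPoolExecutor() as pool:
#     results = list(pool.map(process, range(10_000)))

# Better: batch items so each task does enough work to amortize
# the cost of handing it to a thread.
def process_chunk(chunk):
    return [process(item) for item in chunk]

items = list(range(10_000))
chunk_size = 1_000  # illustrative; pick so each task runs "long enough"
chunks = [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

with ThreadPoolExecutor() as pool:
    results = [r for chunk_result in pool.map(process_chunk, chunks)
               for r in chunk_result]

assert results == [i * i for i in range(10_000)]
```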
Gotchas
- random: can be a source of unexpected bottlenecks. The module-level functions (random.random(), random.randint(), ...) are bound methods of a single shared Random instance, so heavy concurrent use contends on that one object.
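One way to sidestep the shared module-level generator is a per-thread random.Random instance; a minimal sketch (the helper name thread_rng is hypothetical):

```python
import random
import threading

# Each thread lazily gets its own generator, so no thread ever
# touches another thread's Random state.
_local = threading.local()

def thread_rng():
    if not hasattr(_local, "rng"):
        _local.rng = random.Random()
    return _local.rng

def sample_task(n):
    rng = thread_rng()  # private to this thread
    return sum(rng.random() for _ in range(n))

total = sample_task(1_000)
assert 0.0 <= total <= 1000.0
```

Within one thread, repeated calls to thread_rng() return the same instance, so seeding and reproducibility still work per-thread.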