Description
Describe the bug
When running Sphinx in multi-process mode, CPU cores are not utilized effectively, particularly during the write phase.
This issue is different from the problem where certain time-consuming operations are performed serially by Sphinx, as reported in #10779.
Steps to Reproduce
See "How to Reproduce" below.
Observations
- During the build process, especially in the write phase, often only one core is active while the others remain idle.
- This behavior suggests that the multi-process mode is not distributing tasks across available cores effectively.
Analysis
It appears that Sphinx (or the underlying multi-processing library it uses) allocates processes and waits for all of them to complete before starting the next batch. As a result, if one process is handling a significantly larger file, the other cores remain idle, waiting for that process to finish.
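The suspected behavior can be illustrated with a toy model. This is only a sketch of the hypothesis above, not Sphinx's actual scheduler: the function names and the task durations are made up for illustration. It compares a scheme that waits for a whole group of tasks to finish before starting the next group against one where each free worker immediately picks up the next task.

```python
import heapq


def batch_barrier_makespan(durations, workers):
    """Total time when tasks run in groups of `workers` and each group
    must finish entirely before the next group starts (hypothesized behavior)."""
    total = 0.0
    for start in range(0, len(durations), workers):
        group = durations[start:start + workers]
        total += max(group)  # the slowest task holds up the whole group
    return total


def dynamic_makespan(durations, workers):
    """Total time when every free worker immediately takes the next task."""
    # Min-heap of the times at which each worker becomes free.
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    for duration in durations:
        earliest = heapq.heappop(free_at)
        heapq.heappush(free_at, earliest + duration)
    return max(free_at)


# Hypothetical write times in seconds: every fourth document is large.
durations = [8.0, 1.0, 1.0, 1.0] * 4
workers = 4

print(batch_barrier_makespan(durations, workers))  # 32.0
print(dynamic_makespan(durations, workers))        # 14.0
```

In this model the barrier scheme leaves three of the four workers idle for most of each group, which would match the single active core seen in the observations.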
Current Workaround
We currently mitigate this issue by adjusting the batch size so that larger files are distributed more evenly across batches. This workaround is unreliable, however, because its effect changes with file order and size. For more details, see #10967.
Recommendations
To address this inefficiency, consider replacing the current Python multi-processing engine with a more efficient alternative:
- MPIRE: A Python package that simplifies parallel execution and improves core utilization.
- Ray: A framework for scaling Python applications from a laptop to a cluster.
Related Issues
- Restructuring multiprocessing #11448 (where I raised this in a comment, but it seems to have been lost among the many other good suggestions)
- Allow setting maxbatch to adjust chunk size for parallel read/write #10967
- Optimize writing step of parallel builds #10779
- efficient 2-stage parallel builder? #7891
How to Reproduce
- Set up a reasonably large Sphinx project.
- Run Sphinx in multi-process mode.
- Use a CPU profiler, such as the htop CLI on Linux, to monitor core usage during the build process.
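If no suitable project is at hand, a synthetic one with deliberately uneven document sizes may make the effect easier to see. The script below is a minimal sketch under that assumption: the helper name `make_skewed_project`, the file names, and the sizes are all hypothetical and should be scaled up for a real measurement.

```python
from pathlib import Path
import tempfile


def make_skewed_project(root, small_docs=40, big_sections=2000):
    """Write a minimal Sphinx source tree containing one very large document
    and many small ones (hypothetical helper, for reproduction only)."""
    src = Path(root) / "source"
    src.mkdir(parents=True, exist_ok=True)
    (src / "conf.py").write_text('project = "parallel-demo"\n', encoding="utf-8")

    names = ["big"] + [f"small_{i:03d}" for i in range(small_docs)]
    sections = {name: 1 for name in names}
    sections["big"] = big_sections  # the one document that dominates build time

    for name in names:
        lines = [name, "=" * len(name), ""]
        for i in range(sections[name]):
            title = f"Section {i}"
            lines += [title, "-" * len(title), "", "Some filler text. " * 20, ""]
        (src / f"{name}.rst").write_text("\n".join(lines), encoding="utf-8")

    # Root document that pulls every generated page into the build.
    index = ["Demo", "====", "", ".. toctree::", ""] + [f"   {n}" for n in names]
    (src / "index.rst").write_text("\n".join(index) + "\n", encoding="utf-8")
    return src


# Small sizes here so the script finishes quickly; increase them for real tests.
src = make_skewed_project(tempfile.mkdtemp(), small_docs=5, big_sections=50)
print(src)
```

The printed directory can then be built in parallel with `sphinx-build -j auto -b html <source> <outdir>` while watching per-core load.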
Environment Information
Any version of Sphinx that supports multiprocessing