
Optimize writing step of parallel builds #10779

@ezio-melotti


Is your feature request related to a problem? Please describe.
During the writing step of the build process, even when using -j auto, some time-consuming operations are performed serially. It might be possible to parallelize them and further improve performance.

Describe the solution you'd like
Last night I went down a rabbit hole and tried to optimize _write_parallel:

def _write_parallel(self, docnames: Sequence[str], nproc: int) -> None:
    def write_process(docs: List[Tuple[str, nodes.document]]) -> None:
        self.app.phase = BuildPhase.WRITING
        for docname, doctree in docs:
            self.write_doc(docname, doctree)

    # warm up caches/compile templates using the first document
    firstname, docnames = docnames[0], docnames[1:]
    self.app.phase = BuildPhase.RESOLVING
    doctree = self.env.get_and_resolve_doctree(firstname, self)
    self.app.phase = BuildPhase.WRITING
    self.write_doc_serialized(firstname, doctree)
    self.write_doc(firstname, doctree)

    tasks = ParallelTasks(nproc)
    chunks = make_chunks(docnames, nproc)

    # create a status_iterator to step progressbar after writing a document
    # (see: ``on_chunk_done()`` function)
    progress = status_iterator(chunks, __('writing output... '), "darkgreen",
                               len(chunks), self.app.verbosity)

    def on_chunk_done(args: List[Tuple[str, NoneType]], result: NoneType) -> None:
        next(progress)

    self.app.phase = BuildPhase.RESOLVING
    for chunk in chunks:
        arg = []
        for docname in chunk:
            doctree = self.env.get_and_resolve_doctree(docname, self)
            self.write_doc_serialized(docname, doctree)
            arg.append((docname, doctree))
        tasks.add_task(write_process, arg, on_chunk_done)

    # make sure all threads have finished
    tasks.join()
    logger.info('')

I tried naively applying the following changes:

diff --git a/sphinx/builders/__init__.py b/sphinx/builders/__init__.py
index 2aede5c24..d27df5fdd 100644
--- a/sphinx/builders/__init__.py
+++ b/sphinx/builders/__init__.py
@@ -569,9 +569,11 @@ class Builder:
                 self.write_doc(docname, doctree)
 
     def _write_parallel(self, docnames: Sequence[str], nproc: int) -> None:
-        def write_process(docs: List[Tuple[str, nodes.document]]) -> None:
-            self.app.phase = BuildPhase.WRITING
-            for docname, doctree in docs:
+        def write_process(docs: List[str]) -> None:
+            for docname in docs:
+                doctree = self.env.get_and_resolve_doctree(docname, self)
+                self.app.phase = BuildPhase.WRITING
+                self.write_doc_serialized(docname, doctree)
                 self.write_doc(docname, doctree)
 
         # warm up caches/compile templates using the first document
@@ -595,12 +597,7 @@ class Builder:
 
         self.app.phase = BuildPhase.RESOLVING
         for chunk in chunks:
-            arg = []
-            for docname in chunk:
-                doctree = self.env.get_and_resolve_doctree(docname, self)
-                self.write_doc_serialized(docname, doctree)
-                arg.append((docname, doctree))
-            tasks.add_task(write_process, arg, on_chunk_done)
+            tasks.add_task(write_process, chunk, on_chunk_done)
 
         # make sure all threads have finished
         tasks.join()

This resulted in a build time reduction of ~25%.

Without patch, using Sphinx v4.5.0 to build the python/cpython docs:

real    2m13,386s
user    4m35,046s
sys     0m7,205s

With patch:

real    1m36,804s
user    4m42,314s
sys     0m5,909s

Despite the performance improvements, this solution is wrong for (at least) a few reasons:

  • while it mostly works, some things (e.g. images) break;
  • write_doc_serialized was explicitly created for executing code that can't be parallelized, so it shouldn't be called in a process executed in parallel.

However, write_doc_serialized was added ~10 years ago in 5cd0841 to "fix parallel build globals problems". At the time, parallelization also relied on threading, whereas now it seems entirely based on multiprocessing. In builders/__init__.py it's defined as an empty method to be overridden, and the html builder overrides it with:

def write_doc_serialized(self, docname: str, doctree: nodes.document) -> None:
    self.imgpath = relative_uri(self.get_target_uri(docname), self.imagedir)
    self.post_process_images(doctree)
    title_node = self.env.longtitles.get(docname)
    title = self.render_partial(title_node)['title'] if title_node else ''
    self.index_page(docname, doctree, title)
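
This override makes the constraint concrete: post_process_images and index_page mutate shared builder state (the set of images to copy, the search indexer), and with multiprocessing any mutation done in a worker only affects that worker's copy of the builder. A minimal standalone sketch of this pitfall (not Sphinx code; Indexer is just a stand-in for such shared state):

import multiprocessing as mp

class Indexer:
    """Stand-in for shared builder state (e.g. the search indexer)."""
    def __init__(self):
        self.docs = []

    def feed(self, docname):
        self.docs.append(docname)

indexer = Indexer()

def worker(docname):
    # Runs in a child process: this mutates the child's copy of `indexer`,
    # and the change is silently lost when the child exits.
    indexer.feed(docname)

if __name__ == '__main__':
    with mp.Pool(2) as pool:
        pool.map(worker, ['intro', 'usage'])
    print(indexer.docs)  # prints [] -- the parent never saw the feeds

This is presumably also why the naive patch "mostly works" but breaks things like images: the parent process never learns what the workers did in write_doc_serialized.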

In addition, I noticed that _read_parallel seems to do some post-processing/merging in its merge() callback:

def merge(docs: List[str], otherenv: bytes) -> None:
    env = pickle.loads(otherenv)
    self.env.merge_info_from(docs, env, self.app)
    next(progress)

tasks = ParallelTasks(nproc)
for chunk in chunks:
    tasks.add_task(read_process, chunk, merge)

At this point, my lack of knowledge of Sphinx internals prevented me from digging deeper, but I was left wondering:

  1. if the fix added in 5cd0841 is still relevant after moving away from threading, or if it can be revisited/reverted (at least partially);
  2. if the issue fixed by the above commit can be addressed with some post-processing similar to the one used in _read_parallel (see the sketch after this list);
  3. if any of the operations currently executed serially by write_doc_serialized can be parallelized.
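
Regarding 2., the ParallelTasks machinery already ships a task's return value back to the parent and hands it to the done-callback (that's how read_process's pickled environment reaches merge() above). A hedged sketch of how _write_parallel's nested helpers might use the same channel; extract_serial_work and merge_serial_work are invented placeholder names, not real Sphinx APIs:

def write_process(docs: List[str]) -> bytes:
    self.app.phase = BuildPhase.WRITING
    collected = []
    for docname in docs:
        doctree = self.env.get_and_resolve_doctree(docname, self)
        # gather whatever write_doc_serialized would have produced
        # (hypothetical helper; must return something picklable)
        collected.append((docname, extract_serial_work(docname, doctree)))
        self.write_doc(docname, doctree)
    # returned to the parent via ParallelTasks, like read_process does
    return pickle.dumps(collected)

def on_chunk_done(args: List[str], result: bytes) -> None:
    # runs in the parent: apply the serial-only work here, mirroring merge()
    for docname, work in pickle.loads(result):
        merge_serial_work(docname, work)  # hypothetical helper
    next(progress)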

Perhaps someone more familiar with the Sphinx internals can take a look and confirm whether there is room for improvement?

Addressing this (especially point 2 above) might also help fix the following issue (cc @tk0miya, since you worked on this code):
