
Optimize writing step of parallel builds #10779

@ezio-melotti


Is your feature request related to a problem? Please describe.
During the writing step of the build process, even when using -j auto, some time-consuming operations are performed serially. It might be possible to parallelize them and further improve performance.

Describe the solution you'd like
Last night I went down a rabbit hole and tried to optimize _write_parallel:

def _write_parallel(self, docnames: Sequence[str], nproc: int) -> None:
    def write_process(docs: List[Tuple[str, nodes.document]]) -> None:
        self.app.phase = BuildPhase.WRITING
        for docname, doctree in docs:
            self.write_doc(docname, doctree)

    # warm up caches/compile templates using the first document
    firstname, docnames = docnames[0], docnames[1:]
    self.app.phase = BuildPhase.RESOLVING
    doctree = self.env.get_and_resolve_doctree(firstname, self)
    self.app.phase = BuildPhase.WRITING
    self.write_doc_serialized(firstname, doctree)
    self.write_doc(firstname, doctree)

    tasks = ParallelTasks(nproc)
    chunks = make_chunks(docnames, nproc)

    # create a status_iterator to step progressbar after writing a document
    # (see: ``on_chunk_done()`` function)
    progress = status_iterator(chunks, __('writing output... '), "darkgreen",
                               len(chunks), self.app.verbosity)

    def on_chunk_done(args: List[Tuple[str, NoneType]], result: NoneType) -> None:
        next(progress)

    self.app.phase = BuildPhase.RESOLVING
    for chunk in chunks:
        arg = []
        for docname in chunk:
            doctree = self.env.get_and_resolve_doctree(docname, self)
            self.write_doc_serialized(docname, doctree)
            arg.append((docname, doctree))
        tasks.add_task(write_process, arg, on_chunk_done)

    # make sure all threads have finished
    tasks.join()
    logger.info('')

I tried naively applying the following changes:

diff --git a/sphinx/builders/__init__.py b/sphinx/builders/__init__.py
index 2aede5c24..d27df5fdd 100644
--- a/sphinx/builders/__init__.py
+++ b/sphinx/builders/__init__.py
@@ -569,9 +569,11 @@ class Builder:
                 self.write_doc(docname, doctree)
 
     def _write_parallel(self, docnames: Sequence[str], nproc: int) -> None:
-        def write_process(docs: List[Tuple[str, nodes.document]]) -> None:
-            self.app.phase = BuildPhase.WRITING
-            for docname, doctree in docs:
+        def write_process(docs: List[str]) -> None:
+            for docname in docs:
+                doctree = self.env.get_and_resolve_doctree(docname, self)
+                self.app.phase = BuildPhase.WRITING
+                self.write_doc_serialized(docname, doctree)
                 self.write_doc(docname, doctree)
 
         # warm up caches/compile templates using the first document
@@ -595,12 +597,7 @@ class Builder:
 
         self.app.phase = BuildPhase.RESOLVING
         for chunk in chunks:
-            arg = []
-            for docname in chunk:
-                doctree = self.env.get_and_resolve_doctree(docname, self)
-                self.write_doc_serialized(docname, doctree)
-                arg.append((docname, doctree))
-            tasks.add_task(write_process, arg, on_chunk_done)
+            tasks.add_task(write_process, chunk, on_chunk_done)
 
         # make sure all threads have finished
         tasks.join()

This resulted in a build time reduction of ~25%.

Without patch, using Sphinx v4.5.0 to build the python/cpython docs:

real    2m13,386s
user    4m35,046s
sys     0m7,205s

With patch:

real    1m36,804s
user    4m42,314s
sys     0m5,909s

Despite the performance improvements, this solution is wrong for (at least) a few reasons:

  • while it mostly works, some things (e.g. images) break;
  • write_doc_serialized was explicitly created for executing code that can't be parallelized, so it shouldn't be called in a process executed in parallel.

However, write_doc_serialized was added ~10 years ago in 5cd0841 to "fix parallel build globals problems". At the time, parallelization also relied on threading, whereas now it seems entirely based on multiprocessing. In builders/__init__.py it's defined as an empty method to be overridden, and the html builder overrides it with:

def write_doc_serialized(self, docname: str, doctree: nodes.document) -> None:
    self.imgpath = relative_uri(self.get_target_uri(docname), self.imagedir)
    self.post_process_images(doctree)
    title_node = self.env.longtitles.get(docname)
    title = self.render_partial(title_node)['title'] if title_node else ''
    self.index_page(docname, doctree, title)
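
This override makes the constraint concrete: post_process_images and index_page mutate shared builder state (the set of images to copy, the search indexer), and with multiprocessing any mutation done in a worker only affects that worker's copy of the builder. A minimal standalone sketch of this pitfall (not Sphinx code; Indexer is just a stand-in for such shared state):

import multiprocessing as mp

class Indexer:
    """Stand-in for shared builder state (e.g. the search indexer)."""
    def __init__(self):
        self.docs = []

    def feed(self, docname):
        self.docs.append(docname)

indexer = Indexer()

def worker(docname):
    # Runs in a child process: this mutates the child's copy of `indexer`,
    # and the change is silently lost when the child exits.
    indexer.feed(docname)

if __name__ == '__main__':
    with mp.Pool(2) as pool:
        pool.map(worker, ['intro', 'usage'])
    print(indexer.docs)  # prints [] -- the parent never saw the feeds

This is presumably also why the naive patch "mostly works" but breaks things like images: the parent process never learns what the workers did in write_doc_serialized.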

In addition, I noticed that _read_parallel seems to do some post-processing/merging in its merge() callback:

def merge(docs: List[str], otherenv: bytes) -> None:
    env = pickle.loads(otherenv)
    self.env.merge_info_from(docs, env, self.app)
    next(progress)

tasks = ParallelTasks(nproc)
for chunk in chunks:
    tasks.add_task(read_process, chunk, merge)

At this point, my lack of knowledge of Sphinx internals prevented me from digging deeper, but I was left wondering:

  1. if the fix added in 5cd0841 is still relevant after moving away from threading, or if it can be revisited/reverted (at least partially);
  2. if the issue fixed by the above commit can be addressed with some post-processing similar to the one used in _read_parallel (see the sketch after this list);
  3. if any of the operations currently executed serially by write_doc_serialized can be parallelized.
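
Regarding 2., the ParallelTasks machinery already ships a task's return value back to the parent and hands it to the done-callback (that's how read_process's pickled environment reaches merge() above). A hedged sketch of how _write_parallel's nested helpers might use the same channel; extract_serial_work and merge_serial_work are invented placeholder names, not real Sphinx APIs:

def write_process(docs: List[str]) -> bytes:
    self.app.phase = BuildPhase.WRITING
    collected = []
    for docname in docs:
        doctree = self.env.get_and_resolve_doctree(docname, self)
        # gather whatever write_doc_serialized would have produced
        # (hypothetical helper; must return something picklable)
        collected.append((docname, extract_serial_work(docname, doctree)))
        self.write_doc(docname, doctree)
    # returned to the parent via ParallelTasks, like read_process does
    return pickle.dumps(collected)

def on_chunk_done(args: List[str], result: bytes) -> None:
    # runs in the parent: apply the serial-only work here, mirroring merge()
    for docname, work in pickle.loads(result):
        merge_serial_work(docname, work)  # hypothetical helper
    next(progress)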

Perhaps someone more familiar with the Sphinx internals can take a look and confirm whether there is room for improvement?

Addressing this (especially point 2 above) might also help fix the following issue (cc @tk0miya, since you worked on this code):
