Performance improvement : Cache pre-pickled documents #12883

@ArthurAttout

Description

We maintain enormous documents in which individual files use .. include:: to pull in hundreds of external .rst files.

This sometimes leads to individual .doctree files exceeding 5 MB. In this scenario, the build is particularly slow (over 5 hours).

After profiling the code, we found repeated calls to pickle.loads() on those 5 MB files. It appears that Sphinx calls pickle.loads() on the same file for each cross-reference it resolves.

While sphinx/environment/__init__.py already caches the raw bytes of each pickled doctree, it would be more efficient to cache the result of pickle.loads() instead.
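To illustrate the idea, here is a minimal, hypothetical sketch (not the actual Sphinx code): a second cache keyed by docname holds the unpickled document object, so pickle.loads() runs at most once per file instead of once per cross-reference. The names _raw_cache, _doc_cache, and get_doctree are invented for this example.

```python
import pickle

# What Sphinx already does (conceptually): cache the raw pickled bytes.
_raw_cache = {}   # docname -> pickled bytes
# The proposed addition: also cache the unpickled document object.
_doc_cache = {}   # docname -> unpickled document

def get_doctree(docname, path):
    """Return the doctree for *docname*, unpickling it at most once."""
    if docname not in _doc_cache:
        if docname not in _raw_cache:
            with open(path, 'rb') as f:
                _raw_cache[docname] = f.read()
        # Expensive for multi-megabyte doctrees; do it only once.
        _doc_cache[docname] = pickle.loads(_raw_cache[docname])
    return _doc_cache[docname]
```

With this shape, every subsequent lookup for the same docname is a dictionary hit rather than a full deserialization of several megabytes.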

Caching the unpickled nodes.document objects instead of the raw bytes reduced the build time from over 5 hours to around 10 minutes (including PDF generation with MiKTeX).

I have not measured the memory overhead of the two caching strategies, but I suspect it is well worth the speedup.

I have opened a pull request with my workaround. Feel free to let me know your thoughts!
