Index prefetch packs in parallel #1843

Closed

tyrielv commented Jun 16, 2025

#1840 made gvfs clone index pack files locally, an expensive task for large repositories. This pull request improves the performance of pack file indexing by:

  1. Launching git index-pack instances asynchronously during prefetch, and
  2. Passing git index-pack a thread count equal to the processor count, overriding the git default of half the processor count.
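
The two changes above can be sketched roughly as follows. This is an illustration in Python, not the code from this pull request: the helper names `index_pack_command` and `index_packs_async` and the `spawn` parameter are hypothetical, and the sketch assumes each pack path becomes available as its download completes. The `--threads` option itself is an existing `git index-pack` option.

```python
import os
import subprocess
from typing import Iterable, List


def index_pack_command(pack_path: str, threads: int) -> List[str]:
    """Build the git index-pack command line for one pack file."""
    return ["git", "index-pack", f"--threads={threads}", pack_path]


def index_packs_async(pack_paths: Iterable[str], spawn=subprocess.Popen) -> List[int]:
    """Start one index-pack process per pack without waiting, then collect exit codes.

    `pack_paths` may be a generator that yields each pack as it finishes
    downloading, so indexing of early packs overlaps with later downloads.
    `spawn` is a hypothetical injection point that makes the sketch testable.
    """
    # Use every logical processor instead of relying on git's own default.
    threads = os.cpu_count() or 1
    procs = []
    for pack_path in pack_paths:
        # Popen returns immediately; the process runs in the background.
        procs.append(spawn(index_pack_command(pack_path, threads)))
    # Only now block until every indexing process has finished.
    return [proc.wait() for proc in procs]
```

Calling `index_packs_async(["big.pack", "small.pack"])` would start both processes before waiting on either; a non-zero exit code in the returned list signals a failed index.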

The prefetch API will usually return one large pack file comprising most of the commits and trees, followed by several smaller pack files representing the new commits and trees since the last maintenance job consolidated the main pack file.

git index-pack begins with a single-threaded task to index the objects in the pack file, followed by a multi-threaded task to resolve the deltas. With these changes, the smaller incremental pack files are typically indexed in full while the primary pack file is still in its single-threaded phase, and the primary pack file's multi-threaded phase is marginally faster.
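
A toy timing model may help make this reasoning concrete. Everything here is an assumption for illustration: the function names are hypothetical, the pack costs are made-up units rather than measurements, and the overlapped case optimistically assumes the processes never contend for cores or disk.

```python
from typing import List, Tuple

# (serial_seconds, parallel_cpu_seconds) for each pack; illustrative values only.
Pack = Tuple[float, float]


def one_pack_time(pack: Pack, threads: int) -> float:
    """Wall time for one pack: serial object indexing, then delta resolution
    spread evenly across `threads` threads."""
    serial, parallel = pack
    return serial + parallel / threads


def sequential_time(packs: List[Pack], threads: int) -> float:
    """Packs indexed one after another."""
    return sum(one_pack_time(p, threads) for p in packs)


def overlapped_time(packs: List[Pack], threads: int) -> float:
    """All packs started together; the slowest one determines the total.
    Optimistic: assumes no contention between the processes."""
    return max(one_pack_time(p, threads) for p in packs)


# One large primary pack followed by two small incremental packs (made-up costs).
packs = [(60.0, 120.0), (2.0, 4.0), (1.0, 2.0)]

for label, seconds in [
    ("sequential, half the cores", sequential_time(packs, 4)),
    ("sequential, all cores", sequential_time(packs, 8)),
    ("overlapped, all cores", overlapped_time(packs, 8)),
]:
    print(f"{label}: {seconds:.2f}")
```

In this model the small packs finish well inside the primary pack's serial phase, so their cost disappears from the total, while the extra threads shorten only the primary pack's delta-resolution phase.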

In testing, these changes reduced clone time for a large repository by 15-30%.

mjcheetham requested review from dscho and mjcheetham June 17, 2025 10:41
tyrielv closed this Jun 18, 2025

tyrielv commented Jun 18, 2025

Closing due to #1844
