Skip to content

Distrust received pack indexes (behind config flag, with perf fixes) #1846

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

tyrielv
Copy link
Contributor

@tyrielv tyrielv commented Jun 23, 2025

This pull request contains 4 things:

  1. Always index locally for prefetch packs #1840 to index packs locally, which was reverted due to unexpected performance issues
  2. Index prefetch packs in parallel #1843 to increase parallelization when indexing packs, which was abandoned due to Always index locally for prefetch packs #1840 being reverted
  3. Added a new configuration "gvfs.trust-pack-indexes", which defaults to true. The new behavior is only used when this is set to false.
    This is intended to be a temporary configuration setting, to be removed (or defaulted to false) once more mitigations have been completed for the performance.
  4. git index-pack is changed from running in the GVFS enlistment to running outside the enlistment. This was the reason for the unexpected performance issues.
    It was expected that the first prefetch on a new clone would take longer due to indexing the pack locally; however users who deleted their prefetch cache (but not the rest of the cache or local loose objects) in order to re-fetch it with local indexing enabled experienced many times longer delays than expected, because git index-pack reads all the existing pack indexes and loose objects and considers them when indexing a pack in order to support validating that all referenced objects exist - even when the command-line options to act on nonexistent references are not enabled. Since we aren't using --validate or its variants, we can run git index-pack outside the enlistment to avoid this issue.

tyrielv added 3 commits June 19, 2025 09:08
The GVFS protocol includes an index file along with pack file in the prefetch
workflow (primarily used on a new clone to fetch all commits and trees).

Currently, GVFS blindly accepts that index file.

This pull request changes GVFS prefetch to discard the index sent by the
server and always create an index locally. This provides improved security
and verification of the pack file, at the expense of performance for the first
clone of a repository on a new drive.
Copy link
Member

@mjcheetham mjcheetham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one question (and one nit pick 😄) about thread count.

/* Git's default thread count is Environment.ProcessorCount / 2, with a maximum of 20.
* Testing shows that we can get a 5% decrease in gvfs clone time for large repositories by using more threads, but
* we won't go over ProcessorCount or 20. */
var threadCount = Math.Min(Environment.ProcessorCount, 20);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any negative impact on clone times for machines that have many fewer available cores compared to threads?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean, if there are cores with more hyperthreading available than the typical 2 threads per core?

In that case it's possible that there is even more room to raise the thread count, but in my experience Environment.ProcessorCount usually is the number of hardware threads rather than hardware cores (eg it will report 16 on an 8-core, 16-thread processor) so this will attempt to use all the hardware threads. The comments around git code for the default number of threads suggest that number of cores is more important than number of threads though, which is why they default to half (ie they assume 2 threads per core) and is probably why I only observed 5% improvement when doubling that. (Most of the overall improvement comes from running the index tasks for smaller pack files in parallel while the big one is still in its single-threaded phase, not from using more threads in the multi-threaded phase)

@mjcheetham mjcheetham requested a review from dscho June 23, 2025 15:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants