-
Notifications
You must be signed in to change notification settings - Fork 101
Rebase to v2.50.0 #763
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: vfs-2.50.0
Are you sure you want to change the base?
Rebase to v2.50.0 #763
Conversation
This is the error:
And it looks like this is a known issue (without resolution as of time of writing). |
400e9a8
to
fc2fed8
Compare
While using the reset --stdin feature on windows path added may have a \r at the end of the path that wasn't getting removed so didn't match the path in the index and wasn't reset. Signed-off-by: Kevin Willford <kewillf@microsoft.com>
It has been a long-standing practice in Git for Windows to append `.windows.<n>`, and in microsoft/git to append `.vfs.0.0`. Let's keep doing that. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Since we really want to be based on a `.vfs.*` tag, let's make sure that there was a new-enough one, i.e. one that agrees with the first three version numbers of the recorded default version. This prevents e.g. v2.22.0.vfs.0.<some-huge-number>.<commit> from being used when the current release train was not yet tagged. It is important to get the first three numbers of the version right because e.g. Scalar makes decisions depending on those (such as assuming that the `git maintenance` built-in is not available, even though it actually _is_ available). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
With this commit, we gather statistics about the sizes of commits, trees, and blobs in the repository, and then present them in the form of "hexbins", i.e. log(16) histograms that show how many objects fall into the 0..15 bytes range, the 16..255 range, the 256..4095 range, etc. For commits, we also show the total count grouped by the number of parents, and for trees we additionally show the total count grouped by number of entries in the form of "qbins", i.e. log(4) histograms. Signed-off-by: Jeff Hostetler <jeffhostetler@github.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This header file will accumulate GVFS-specific definitions. Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Create `struct large_item` and `struct large_item_vec` to capture the n largest commits, trees, and blobs under various scaling dimensions, such as size in bytes, number of commit parents, or number of entries in a tree. Each of these have a command line option to set them independently. Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
This does not do anything yet. The next patches will add various values for that config setting that correspond to the various features offered/required by GVFS. Signed-off-by: Kevin Willford <kewillf@microsoft.com> gvfs: refactor loading the core.gvfs config value This code change makes sure that the config value for core_gvfs is always loaded before checking it. Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Include the pathname of each blob or tree in the large_item_vec to help identify the file or directory associated with the OID and size information. This pathname is computed during the path walk, so it reflects the first observed pathname seen for that OID during the traversal over all of the refs. Since the file or directory could have moved (without being modified), there may be multiple "correct" pathnames for a particular OID. Since we do not control the ref traversal order, we should consider it to be a "suggested pathname" for the OID. Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
This takes a substantial amount of time, and if the user is reasonably sure that the files' integrity is not compromised, that time can be saved. Git no longer verifies the SHA-1 by default, anyway. Signed-off-by: Kevin Willford <kewillf@microsoft.com> Update for 2023-02-27: This feature was upstreamed as the index.skipHash config option. This resulted in some changes to the struct and some of the setup code. In particular, the config reading was moved to prepare_repo_settings(), so the core.gvfs bit check was moved there, too. Signed-off-by: Kevin Willford <kewillf@microsoft.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Prevent the sparse checkout to delete files that were marked with skip-worktree bit and are not in the sparse-checkout file. This is because everything with the skip-worktree bit turned on is being virtualized and will be removed with the change of HEAD. There was only one failing test when running with these changes that was checking to make sure the worktree narrows on checkout which was expected since we would no longer be narrowing the worktree. Update 2022-04-05: temporarily set 'sparse.expectfilesoutsideofpatterns' in test (until we start disabling the "remove present-despite-SKIP_WORKTREE" behavior with 'core.virtualfilesystem' in a later commit). Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Computing `git name-rev` on each commit, tree, and blob in each of the various large_item_vec can be very expensive if there are too many refs, especially if the user doesn't need the result. Lets make it optional. The `--no-name-rev` option can save 50 calls to `git name-rev` since we have 5 large_item_vec's and each defaults to 10 items. Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
This adds hard-coded call to GVFS.hooks.exe before and after each Git command runs. To make sure that this is only called on repositories cloned with GVFS, we test for the tell-tale .gvfs. 2021-10-30: Recent movement of find_hook() to hook.c required moving these changes out of run-command.c to hook.c. Signed-off-by: Ben Peart <Ben.Peart@microsoft.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
While performing a fetch with a virtual file system we know that there will be missing objects and we don't want to download them just because of the reachability of the commits. We also don't want to download a pack file with commits, trees, and blobs since these will be downloaded on demand. This flag will skip the first connectivity check and by returning zero will skip the upload pack. It will also skip the second connectivity check but continue to update the branches to the latest commit ids. Signed-off-by: Kevin Willford <kewillf@microsoft.com>
This backports the `ds/advice-sparse-index-expansion` patches into `microsoft/git` which _just_ missed the v2.46.0 window. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Suggested by Ben Peart. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Ensure all filters and EOL conversions are blocked when running under GVFS so that our projected file sizes will match the actual file size when it is hydrated on the local machine. Signed-off-by: Ben Peart <Ben.Peart@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Verify that the core.hooksPath configuration is repsected by the pre-command hook. Original regression test was written by Alejandro Pauly. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
This pull request aims to correct a pretty big issue when dealing with UNINTERESTING objects in the path-walk API. They somehow were only exposed when trying to perform a push from a shallow clone. This will require rewriting the upstream version so this is avoided from the start, but we can do a forward fix for now. The key issue is that the path-walk API was not walking UNINTERESTING trees at the right time, and the way it was being done was more complicated than it needed to be. This changes some of the way the path-walk API works in the presence of UNINTERSTING commits, but these are good changes to make. I had briefly attempted to remove the use of the `edge_aggressive` option in `struct path_walk_info` in favor of using the `--objects-edge-aggressive` option in the revision struct. When I started down that road, though, I somehow got myself into a bind of things not working correctly. I backed out to this version that is working with our test cases. I tested this using the thin and big pack tests in `p5313` which had the same performance as before this change. The new change is that in a shallow clone we can get the same `git push` improvements. I was hung up on testing this for a long time as I wasn't getting the same results in my shallow clone as in my regular clones. It turns out that I had forgotten to use `--no-reuse-delta` in my test command, so it was picking the deltas that were given by the initial clone instead of picking new ones per the algorithm. 🤦🏻
Introduce a new maintenance task, `cache-local-objects`, that operates on Scalar or VFS for Git repositories with a per-volume, shared object cache (specified by `gvfs.sharedCache`) to migrate packfiles and loose objects from the repository object directory to the shared cache. Older versions of `microsoft/git` incorrectly placed packfiles in the repository object directory instead of the shared cache; this task will help clean up existing clones impacted by that issue. Migration of packfiles involves the following steps for each pack: 1. Hardlink (or copy): a. the .pack file b. the .keep file c. the .rev file 2. Move (or copy + delete) the .idx file 3. Delete/unlink: a. the .pack file b. the .keep file c. the .rev file Moving the index file after the others ensures the pack is not read from the new cache directory until all associated files (rev, keep) exist in the cache directory also. Moving loose objects operates as a move, or copy + delete. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
The --path-walk option in `git pack-objects` is implied by the pack.usePathWalk=true config value. This is intended to help the packfile generation within `git push` specifically. While this config does enable the path-walk feature, it does not lead to the expected levels of compression in the cases it was designed to handle. This is due to the default implication of the --reuse-delta option as well as auto-GC. In the performance tests used to evaluate the --path-walk option, such as those in p5313, the --no-reuse-delta option is used to ensure that deltas are recomputed according to the new object walk. However, it was assumed (I assumed this) that when the objects were loose from client-side operations that better deltas would be computed during this operation. This wasn't confirmed because the test process used data that was fetched from real repositories and thus existed in packed form only. I was able to confirm that this does not reproduce when the objects to push are loose. Careful use of making the pushed commit unreachable and loosening the objects via `git repack -Ad` helps to confirm my suspicions here. Independent of this change, I'm pushing for these pipeline agents to set `gc.auto=0` before creating their Git objects. In the current setup, the repo is adding objects and then incrementally repacking them and ending up with bad cross-path deltas. This approach can help scenarios where that makes sense, but will not cover all of our users without them choosing to opt-in to background maintenance (and even then, an incremental repack could cost them efficiency). In order to make sure we are getting the intended compression in `git push`, this change enforces the spawned `git pack-objects` process to use `--no-reuse-delta`. As far as I can tell, the main motivation for implying the --reuse-delta option by default is two-fold: 1. The code in send-pack.c that executes 'git pack-objects' is ignorant of whether the current process is a client pushing to a remote or a remote sending a fetch or clone to a client. 2. For servers, it is critical that they trust the previously computed deltas whenever possible, or they could overload their CPU resources. There's also the side that most servers use repacking logic that will replace any bad deltas that are sent by clients (or at least, that's the hope; we've seen that repacks can also pick bad deltas). This commit also adds a test case that demonstrates that `git -c pack.usePathWalk=true push` now avoids reusing deltas. To do this, the test case constructs a pack with a horrendously inefficient delta object, then verifies that the pack on the receiving side of the `push` fails to have such an inefficient delta. The test case would probably be a lot more readable if hex numbers were used instead of octal numbers, but alas, `printf "\x<hex>"` is not portable, only `printf "\<octal>"` is. For example, dash's built-in `printf` function simply prints `\x` verbatim while bash's built-in happily converts this construct to the corresponding byte. Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Git v2.48.0 has become even more stringent about leaks. Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Add the `cache-local-objects` maintenance task to the list of tasks run by the `scalar run` command. It's often easier for users to run the shorter `scalar run` command than the equivalent `git maintenance` command. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
The --path-walk option in 'git pack-objects' is implied by the pack.usePathWalk=true config value. This is intended to help the packfile generation within 'git push' specifically. While this config does enable the path-walk feature, it does not lead the expected levels of compression in the cases it was designed to handle. This is due to the default implication of the --reuse-delta option as well as auto-GC. In the performance tests used to evaluate the --path-walk option, such as those in p5313, the --no-reuse-delta option is used to ensure that deltas are recomputed according to the new object walk. However, it was assumed (I assumed this) that when the objects were loose from client-side operations that better deltas would be computed during this operation. This wasn't confirmed because the test process used data that was fetched from real repositories and thus existed in packed form only. I was able to confirm that this does not reproduce when the objects to push are loose. Careful use of making the pushed commit unreachable and loosening the objects via 'git repack -Ad' helps to confirm my suspicions here. Independent of this change, I'm pushing for these pipeline agents to set 'gc.auto=0' before creating their Git objects. In the current setup, the repo is adding objects and then incrementally repacking them and ending up with bad cross-path deltas. This approach can help scenarios where that makes sense, but will not cover all of our users without them choosing to opt-in to background maintenance (and even then, an incremental repack could cost them efficiency). In order to make sure we are getting the intended compression in 'git push', this change makes the --path-walk option imply --no-reuse-delta when the --reuse-delta option is not provided. As far as I can tell, the main motivation for implying the --reuse-delta option by default is two-fold: 1. The code in send-pack.c that executes 'git pack-objects' is ignorant of whether the current process is a client pushing to a remote or a remote sending a fetch or clone to a client. 2. For servers, it is critical that they trust the previously computed deltas whenever possible, or they could overload their CPU resources. There's also the side that most servers use repacking logic that will replace any bad deltas that are sent by clients (or at least, that's the hope; we've seen that repacks can also pick bad deltas). The --path-walk option at the moment is not compatible with reachability bitmaps, so is not planned to be used by Git servers. Thus, we can reasonably assume (for now) that the --path-walk option is assuming a client-side scenario, either a push or a repack. The repack option will be explicit about the --reuse-delta option or not. One thing to be careful about is background maintenance, which uses a list of objects instead of refs, so we condition this on the case where the --path-walk option will be effective by checking that the --revs option was provided. Alternative options considered included: * Adding _another_ config ('pack.reuseDelta=false') to opt-in to this choice. However, we already have pack.usePathWalk=true as an opt-in to "do the right thing to make my data small" as far as our internal users are concerned. * Modify the chain between builtin/push.c, transport.c, and builtin/send-pack.c to communicate that we are in "push" mode, not within a fetch or clone. However, this seemed like overkill. It may be beneficial in the future to pass through a mode like this, but it does not meet the bar for the immediate need. Reviewers, please see git-for-windows#5171 for the baseline implementation of this feature within Git for Windows and thus microsoft/git. This feature is still under review upstream.
Introduce a new maintenance task, `cache-local-objects`, that operates on Scalar or VFS for Git repositories with a per-volume, shared object cache (specified by `gvfs.sharedCache`) to migrate packfiles and loose objects from the repository object directory to the shared cache. Older versions of `microsoft/git` incorrectly placed packfiles in the repository object directory instead of the shared cache; this task will help clean up existing clones impacted by that issue. Fixes #716
Add the ability to block built-in commands based on if the `core.gvfs` setting has the `GVFS_USE_VIRTUAL_FILESYSTEM` bit set. This allows us to selectively block commands that use the GVFS protocol, but don't use VFS for Git (for example repos cloned via `scalar clone` against Azure DevOps). Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Loosen the blocking of the `repack` command from all "GVFS repos" (those that have `core.gvfs` set) to only those that actually use the virtual file system (VFS for Git only). This allows for `repack` to be used in Scalar clones. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Loosen the blocking of the `fsck` command from all "GVFS repos" (those that have `core.gvfs` set) to only those that actually use the virtual file system (VFS for Git only). This allows for `fsck` to be used in Scalar clones. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Loosen the blocking of the `prune` command from all "GVFS repos" (those that have `core.gvfs` set) to only those that actually use the virtual file system (VFS for Git only). This allows for `prune` to be used in Scalar clones. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
The microsoft/git fork includes pre- and post-command hooks, with the initial intention of using these for VFS for Git. In that environment, these are important hooks to avoid concurrent issues when the virtualization is incomplete. However, in the Office monorepo the post-command hook is used in a different way. A custom hook is used to update the sparse-checkout, if necessary. To avoid this hook from being incredibly slow on every Git command, this hook checks for the existence of a "sentinel file" that is written by a custom post-index-change hook and no-ops if that file does not exist. However, even this "no-op" is 200ms due to the use of two scripts (one simple script in .git/hooks/ does some environment checking and then calls a script from the working directory which actually contains the logic). Add a new config option, 'postCommand.strategy', that will allow for multiple possible strategies in the future. For now, the one we are adding is 'worktree-change' which states that we should write a sentinel file instead of running the 'post-index-change' hook and then skip the 'post-command' hook if the proper sentinel file doesn't exist. If it does exist, then delete it and run the hook. This behavior is _only_ triggered, however, if a part of the index changes that is within the sparse checkout; If only parts of the index change that are not even checked out on disk, the hook is still skipped. I originally planned to put this into the repo-settings, but this caused the repo settings to load in more cases than they did previously. When there is an invalid boolean config option, this causes failure in new places. This was caught by t3007. This behavior is tested in t0401-post-command-hook.sh. Signed-off-by: Derrick Stolee <stolee@gmail.com>
Replace the special casing of the `worktree` command being blocked on VFS-enabled repos with the new `BLOCK_ON_VFS_ENABLED` flag. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Emit a warning message when the `gvfs.sharedCache` option is set that the `repack` command will not perform repacking on the shared cache. In the future we can teach `repack` to operate on the shared cache, at which point we can drop this commit. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
This helps t0401 pass while under SANITIZE=leak. Signed-off-by: Derrick Stolee <stolee@gmail.com>
Currently when the `core.gvfs` setting is set, several commands are outright blocked from running. Some of these commands, namely `repack` are actually OK to run in a Scalar clone, even if it uses the GVFS protocol (for Azure DevOps). Introduce a different blocking mechanism to only block commands when the virtual filesystem is being used, rather than as a broad block on any `core.gvfs` setting.
This new test demonstrates some behavior where a valid packfile is being rejected by the Git client due to the order in which it is resolving REF_DELTAs. The thin packfile has a REF_DELTA chain A->B->C where C is not included in the packfile. However, the client repository contains both C and B already. Thus, 'git index-pack' is able to resolve A before resolving B. When resolving B, it then attempts to resolve any other REF_DELTAs that are pointing to B as a base. This "revisits" A and complains as if there is a cycle, but it did not actually detect a cycle. A fix will arrive in the next change. Signed-off-by: Derrick Stolee <stolee@gmail.com>
The microsoft/git fork includes pre- and post-command hooks, with the initial intention of using these for VFS for Git. In that environment, these are important hooks to avoid concurrent issues when the virtualization is incomplete. However, in the Office monorepo the post-command hook is used in a different way. A custom hook is used to update the sparse-checkout, if necessary. To avoid this hook from being incredibly slow on every Git command, this hook checks for the existence of a "sentinel file" that is written by a custom post-index-change hook and no-ops if that file does not exist. However, even this "no-op" is 200ms due to the use of two scripts (one simple script in .git/hooks/ does some environment checking and then calls a script from the working directory which actually contains the logic). Add a new config option, 'postCommand.strategy', that will allow for multiple possible strategies in the future. For now, the one we are adding is 'post-index-change' which states that we should write a sentinel file instead of running the 'post-index-change' hook and then skip the 'post-command' hook if the proper sentinel file doesn't exist. (If it does exist, then delete it and run the hook.) --- This fork contains changes specific to monorepo scenarios. If you are an external contributor, then please detail your reason for submitting to this fork: * [ ] This is an early version of work already under review upstream. * [ ] This change only applies to interactions with Azure DevOps and the GVFS Protocol. * [ ] This change only applies to the virtualization hook and VFS for Git. * [x] This change only applies to custom bits in the microsoft/git fork.
Just like we just did in the backport from my upstream contribution, let's convert the `curl_easy_setopt()` calls in `gvfs-helper.c` that still passed `int` constants to pass `long` instead. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This is an early version of work already under review upstream: gitgitgadget#1906
On Linux, the following command would cause the terminal to be stuck waiting: git fetch origin foobar The issue would be that the fetch would fail with the error fatal: couldn't find remote ref foobar but the underlying git-gvfs-helper process wouldn't die. The `subprocess_exit_handler()` method would close its stdin and stdout, but that wouldn't be enough to cause the process to end, even though the `packet_read_line_gently()` call that is run in `do_sub_cmd__server()` in a loop should return -1 and the process should the terminate peacefully. While it is unclear why this does not happen, there may be other conditions where the `gvfs-helper` process would not terminate. Let's ensure that the gvfs-helper-client process cleans up the gvfs-helper server processes that it spawned upon exit. Reported-by: Stuart Wilcox Humilde <stuartwilcox@microsoft.com> Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This topic branch has backports of cURL compile fixes in the `osx-gcc` job, plus a bonus `gvfs-helper` follow-up fix. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
On Linux, the following command would cause the terminal to be stuck waiting: ``` git fetch origin foobar ``` The issue would be that the fetch would fail with the error ``` fatal: couldn't find remote ref foobar ``` but the underlying `git-gvfs-helper` process wouldn't die. The `subprocess_exit_handler()` method would close its stdin and stdout, but that wouldn't be enough to cause the process to end. This PR addresses that by skipping the `finish_command()` call of the `clean_on_exit_handler` and instead lets `cleanup_children()` send a SIGTERM to terminate those spawned child processes.
fc2fed8
to
d836430
Compare
git range-diff --creation-factor=95 v2.49.0.windows.1..microsoft/clean/vfs-2.49.0 v2.50.0.windows.1..
This is admittedly a large range-diff... Part of the reason is that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look at the range-diff. I appreciate all the work you did, @dscho, navigating the upstream refactors. I'm particularly looking forward to seeing this release as Office monorepo users will benefit from the new sparse index integration with git add -p
.
This rebases
microsoft/git
's branch thicket on top of Git for Windows v2.50.0. Please find the range-diff below.