Permit repack command in Scalar clones #732


Merged: mjcheetham merged 6 commits into microsoft:vfs-2.49.0 on Mar 31, 2025

Conversation

mjcheetham (Member) commented:

Currently, when the `core.gvfs` setting is set, several commands are outright blocked from running. Some of these commands, namely `repack`, are actually OK to run in a Scalar clone, even if the clone uses the GVFS protocol (for Azure DevOps).

Introduce a different blocking mechanism that only blocks commands when the virtual filesystem is being used, rather than as a broad block on any `core.gvfs` setting.
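To illustrate the distinction, here is a hypothetical sketch; `core.gvfs` is a bitmask, and the numeric values below are assumed stand-ins, not the real bit assignments:

# Hypothetical illustration only; the core.gvfs values are assumed.
git init demo
cd demo
git commit --allow-empty -m seed

git config core.gvfs 4    # assumed: GVFS-protocol bits set, VFS bit not set
git repack                # permitted with this change (Scalar-style clone)

git config core.gvfs 12   # assumed: same bits plus the virtual-filesystem bit
git repack                # still refused while the virtual filesystem is in use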

dscho previously approved these changes Mar 6, 2025

@dscho (Member) left a comment:


Nice!

@dscho (Member) commented Mar 6, 2025:

For the record: I believe that the original block to prevent `git repack` upon `core.gvfs` is rooted in the fact that VFS for Git deceives Git into considering all Git objects to be present locally, which would have caused, say, `git repack -adf` to download all missing objects, turning any VFS for Git-enabled clone into a full clone (which is obviously undesirable).

Scalar, on the other hand, initializes the clone as a partial clone, where Git is fully aware that it lacks some Git objects locally and does not try to fetch them during a `git repack`. Hence, that block makes little sense in Scalar clones and this PR is the right thing to do. @derrickstolee, as the smart brain behind Scalar you are much more familiar with this space than I am: do you agree, or did I make a mental mistake in this reasoning?
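The same property can be seen in a plain partial clone; a minimal sketch (the URL is a placeholder):

# In a blobless partial clone, Git records the missing objects as
# promisor objects, and `git repack -adf` leaves them alone rather than
# fetching them.
git clone --filter=blob:none https://example.com/big-repo.git demo
cd demo
du -sh .git/objects    # size before repacking
git repack -adf        # rewrites only the locally present objects
du -sh .git/objects    # roughly unchanged; no mass download of blobs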

@derrickstolee left a comment:

This seems to make sense, but it may also be good to add a warning in `git repack`, when `gvfs.sharedCache` is enabled, that it won't shrink the shared cache. What instructions should we add for doing that full repack? Typically I tell people to try `git maintenance run --task=incremental-repack`, as the maintenance builtin knows how to adjust for the alternate and that task doesn't care about refs. `git repack` with `GIT_OBJECT_DIRECTORY=<cache>` may struggle if there are local objects in the `.git/objects` dir that have not been migrated to the cache.
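For reference, the two approaches mentioned above would look like this (a sketch; the cache path is a placeholder):

# Preferred: the maintenance builtin adjusts for the alternate itself.
git maintenance run --task=incremental-repack

# Riskier: pointing repack directly at the cache misses any local objects
# still in .git/objects that have not been migrated to the cache.
GIT_OBJECT_DIRECTORY=/path/to/shared-cache git repack -d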

@@ -724,7 +724,7 @@ static struct cmd_struct commands[] = {
 	{ "verify-tag", cmd_verify_tag, RUN_SETUP },
 	{ "version", cmd_version },
 	{ "whatchanged", cmd_whatchanged, RUN_SETUP },
-	{ "worktree", cmd_worktree, RUN_SETUP },
+	{ "worktree", cmd_worktree, RUN_SETUP | BLOCK_ON_VFS_ENABLED },


This is a much simpler way to do this!

test_must_fail git repack
'

test_expect_success 'test repack succeeds with VFS bit disabled' '


Could you add a test with a Scalar clone that has `gvfs.sharedCache` enabled? I'm curious as to whether `git repack` will operate on that object directory (my guess is not, because it's an alternate, which Git treats as read-only).

It would be good to be up front about "`git repack` is no longer blocked, but it also won't do what you expect".

Member:

> Could you add a test with a Scalar clone that has `gvfs.sharedCache` enabled? I'm curious as to whether `git repack` will operate on that object directory (my guess is not, because it's an alternate, which Git treats as read-only).

We do have this in `git maintenance run`:

git/builtin/gc.c

Lines 1822 to 1831 in 5ad6dd6

	/*
	 * To enable the VFS for Git/Scalar shared object cache, use
	 * the gvfs.sharedcache config option to redirect the
	 * maintenance to that location.
	 */
	if (!git_config_get_value("gvfs.sharedcache", &tmp_obj_dir) &&
	    tmp_obj_dir) {
		shared_object_dir = xstrdup(tmp_obj_dir);
		setenv(DB_ENVIRONMENT, shared_object_dir, 1);
	}

I think this is part of what is verified in

git/t/t7900-maintenance.sh

Lines 1126 to 1155 in 5ad6dd6

test_expect_success 'cache-local-objects task success' '
	test_when_finished "rm -rf repo cache" &&
	mkdir -p cache/pack &&
	git init repo &&
	(
		cd repo &&
		test_commit something &&
		git config set gvfs.sharedcache ../cache &&
		git config set maintenance.gc.enabled false &&
		git config set maintenance.cache-local-objects.enabled true &&
		git config set maintenance.cache-local-objects.auto 1 &&
		test_import_packfile &&
		test_get_packdir_files "*.pack" "*.idx" "*.keep" "*.rev" \
			>src.txt &&
		test_get_loose_object_files >>src.txt &&
		sed "s/.git\\/objects\\//..\\/cache\\//" src.txt >dst.txt &&
		git maintenance run &&
		while IFS= read -r f; do
			test_path_is_missing $f || exit 1
		done <src.txt &&
		while IFS= read -r f; do
			test_path_exists $f || exit 1
		done <dst.txt
	)
'

mjcheetham (Member Author):

The `cache-local-objects` maintenance task migrates local loose objects and packfiles to the shared cache. The `incremental-repack` task is also redirected to the cache directory, so together the maintenance tasks will repack the packs in the cache dir.

The `repack` builtin, however, will only act on the local objects. Would it be worth either:
a) teaching `repack` to redirect to the shared object dir instead (if set, like the `incremental-repack` task), or
b) printing a warning, perhaps recommending a run of `git maintenance run --task=incremental-repack` instead?

mjcheetham (Member Author):

> I'm curious as to whether `git repack` will operate on that object directory (my guess is not, because it's an alternate, which Git treats as read-only).

There is an example of a similar scenario in `repack` already: if an alternate has been configured, the bitmaps are not written, since "some objects are not being packed".

git/builtin/repack.c

Lines 1292 to 1302 in 5ad6dd6

	if (write_bitmaps && po_args.local && has_alt_odb(the_repository)) {
		/*
		 * When asked to do a local repack, but we have
		 * packfiles that are inherited from an alternate, then
		 * we cannot guarantee that the multi-pack-index would
		 * have full coverage of all objects. We thus disable
		 * writing bitmaps in that case.
		 */
		warning(_("disabling bitmap writing, as some objects are not being packed"));
		write_bitmaps = 0;
	}
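Per the quoted check, that warning can be reproduced with a local repack in a repository that borrows objects from an alternate; a sketch with hypothetical paths:

git init --bare store.git
git init repo
echo "$PWD/store.git/objects" >repo/.git/objects/info/alternates
git -C repo commit --allow-empty -m seed
git -C repo repack -adl --write-bitmap-index
# warning: disabling bitmap writing, as some objects are not being packed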

Member:

Hmm. If I put myself into a user's shoes, I would much prefer `git repack` to take care of the shared object directory (and potentially migrate objects there as part of the repack) when I work in a `gvfs.sharedCache`-enabled repository.

mjcheetham (Member Author):

Something like this? (for option b)

diff --git a/builtin/repack.c b/builtin/repack.c
index 01db1b8f529..5a44171f8e8 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -24,6 +24,7 @@
 #include "pack-bitmap.h"
 #include "refs.h"
 #include "list-objects-filter-options.h"
+#include "gvfs.h"
 
 #define ALL_INTO_ONE 1
 #define LOOSEN_UNREACHABLE 2
@@ -1177,6 +1178,7 @@ int cmd_repack(int argc,
 	struct tempfile *refs_snapshot = NULL;
 	int i, ext, ret;
 	int show_progress;
+	const char *tmp_obj_dir = NULL;
 
 	/* variables to be filled by option parsing */
 	int delete_redundant = 0;
@@ -1301,6 +1303,10 @@ int cmd_repack(int argc,
 		write_bitmaps = 0;
 	}
 
+	if (gvfs_config_is_set(NULL) &&
+	    !git_config_get_value("gvfs.sharedcache", &tmp_obj_dir))
+		warning(_("shared object cache will not be repacked; use `git maintenance run --task=incremental-repack` instead"));
+
 	if (write_midx && write_bitmaps) {
 		struct strbuf path = STRBUF_INIT;
 

Member:

That would be the option with the warning. However, as a user I'd much prefer the command to not warn me about something I probably don't want, but to do what I probably want instead, i.e. the other option you suggested ;-)


I think users who bother to run `git repack` themselves really want their object data to be reworked. I'm thinking specifically of cases like `git repack -adf`, which I use to rewrite all objects into a single packfile. We have historically avoided doing that on the shared object cache, but it should work reasonably well due to the integration with the partial clone machinery. It's just going to be slow, and users should expect that. We're not choosing to run it in the background because of that cost.

@mjcheetham (Member Author) commented:

> Scalar, on the other hand, initializes the clone as a partial clone, where Git is fully aware that it lacks some Git objects locally and does not try to fetch them during a `git repack`.

I guess we could also permit the `fsck` and `prune` commands in Scalar clones, given that these are partial clone aware? cc @derrickstolee

dscho previously approved these changes Mar 21, 2025

@dscho (Member) left a comment:


I guess this should now be retargeted to vfs-2.49.0, right? The changes look correct to me.

@derrickstolee left a comment:

I'm also prepared to approve this when it is retargeted. We can consider extensions to let folks repack their shared object cache as a follow-up.

Add the ability to block built-in commands based on whether the `core.gvfs` setting has the `GVFS_USE_VIRTUAL_FILESYSTEM` bit set. This allows us to selectively block commands that use the GVFS protocol but don't use VFS for Git (for example, repos cloned via `scalar clone` against Azure DevOps).

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Loosen the blocking of the `repack` command from all "GVFS repos" (those
that have `core.gvfs` set) to only those that actually use the virtual
file system (VFS for Git only). This allows for `repack` to be used in
Scalar clones.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Loosen the blocking of the `fsck` command from all "GVFS repos" (those
that have `core.gvfs` set) to only those that actually use the virtual
file system (VFS for Git only). This allows for `fsck` to be used in
Scalar clones.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Loosen the blocking of the `prune` command from all "GVFS repos" (those
that have `core.gvfs` set) to only those that actually use the virtual
file system (VFS for Git only). This allows for `prune` to be used in
Scalar clones.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Replace the special casing of the `worktree` command being blocked on
VFS-enabled repos with the new `BLOCK_ON_VFS_ENABLED` flag.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
@mjcheetham mjcheetham changed the base branch from vfs-2.48.1 to vfs-2.49.0 March 26, 2025 10:33
@mjcheetham mjcheetham dismissed dscho’s stale review March 26, 2025 10:33

The base branch was changed.

@mjcheetham (Member Author) commented:

Changes

  • Retargeted to vfs-2.49.0
  • Unblock prune and fsck on Scalar repos
  • Emit a warning when gvfs.sharedCache is set during repack

dscho previously approved these changes Mar 26, 2025

@dscho (Member) left a comment:

Thank you!

After looking at the shared cache repacking idea briefly, I agree that it will be much safer to leave this for a separate PR.

@@ -1301,6 +1303,10 @@ int cmd_repack(int argc,
 		write_bitmaps = 0;
 	}
 
+	if (gvfs_config_is_set(NULL) &&
+	    !git_config_get_value("gvfs.sharedcache", &tmp_obj_dir))
+		warning(_("shared object cache is configured but will not be repacked"));
Member:

Let's turn this warning into an informational message saying that the shared cache is being repacked instead (and then actually do that) in a follow-up PR (and add a test to verify that it works as intended). Unfortunately, it won't be as easy as:

-		warning(_("shared object cache is configured but will not be repacked"));
+		setenv(DB_ENVIRONMENT, tmp_obj_dir, 1);

because subsequent lines call `repo_get_object_directory()` (at three later call sites), and that function merely returns the already-initialized `repo->objects->odb->path`.

Apart from that, the `pack-objects` command that is spawned would also need to know about this, and the cruft-pack/delete-redundant part would probably also need to be updated.

Most importantly, though, we really need to ensure that no objects are deleted that are reachable in any Scalar repository sharing that sharedcache repository, even if those objects are not reachable in the sharedcache repository itself...
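A toy setup showing that hazard (a sketch; paths are hypothetical): two repositories borrow from one shared object store via alternates, so reachability has to be considered across all of them before anything is deleted.

git init --bare shared-cache.git
git init repo-a
git init repo-b
echo "$PWD/shared-cache.git/objects" >repo-a/.git/objects/info/alternates
echo "$PWD/shared-cache.git/objects" >repo-b/.git/objects/info/alternates
# An object that lives only in shared-cache.git may be reachable from
# repo-b's refs alone; a prune driven from repo-a (or from the cache
# itself) must not delete it just because repo-a's refs do not reach it.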

derrickstolee previously approved these changes Mar 26, 2025

@derrickstolee left a comment:

LGTM. Thanks for starting this effort!

Emit a warning message when the `gvfs.sharedCache` option is set, noting that the `repack` command will not perform repacking on the shared cache.

In the future we can teach `repack` to operate on the shared cache, at
which point we can drop this commit.

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
@mjcheetham mjcheetham merged commit 1915438 into microsoft:vfs-2.49.0 Mar 31, 2025
63 checks passed
@mjcheetham mjcheetham deleted the vfs-block branch March 31, 2025 14:24
dscho pushed commits that referenced this pull request on Jun 6, Jun 11, Jun 13, and Jun 16, 2025.