Permit `repack` command in Scalar clones #732
Conversation
Nice!

For the record: I believe that the original block to prevent […]. Scalar, on the other hand, initializes the clone as a partial clone, where Git is fully aware that it lacks some Git objects locally and does not try to fetch them during a `repack`.
This seems to make sense, but also it may be good to add a warning in `git repack` when `gvfs.sharedCache` is enabled that it won't shrink the shared cache. What instructions should we add to do that full repack? Typically, I tell people to try `git maintenance run --task=incremental-repack`, as the maintenance builtin knows how to adjust for the alternate and that task doesn't care about refs. `git repack` with `GIT_OBJECT_DIRECTORY=<cache>` may struggle if there are local objects in the `.git/objects` dir that have not migrated to the cache.
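As a rough, command-level illustration of the two approaches mentioned above (a sketch only; the cache path is a placeholder, not something from this PR):

```sh
# Preferred: let the maintenance builtin handle it, since it knows how to
# redirect to the gvfs.sharedCache alternate and does not depend on refs.
git maintenance run --task=incremental-repack

# Manual alternative: point repack at the shared cache directly. This may
# struggle if .git/objects still holds local objects that have not been
# migrated into the cache (the path below is a placeholder).
GIT_OBJECT_DIRECTORY=/path/to/shared-cache git repack -a -d
```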
```diff
@@ -724,7 +724,7 @@ static struct cmd_struct commands[] = {
 	{ "verify-tag", cmd_verify_tag, RUN_SETUP },
 	{ "version", cmd_version },
 	{ "whatchanged", cmd_whatchanged, RUN_SETUP },
-	{ "worktree", cmd_worktree, RUN_SETUP },
+	{ "worktree", cmd_worktree, RUN_SETUP | BLOCK_ON_VFS_ENABLED },
```
This is a much simpler way to do this!
```sh
	test_must_fail git repack
'

test_expect_success 'test repack succeeds with VFS bit disabled' '
```
Could you add a test with a Scalar clone that has a `gvfs.sharedCache` enabled? I'm curious as to whether `git repack` will operate on that object directory (my guess is not, because it's an alternate, which Git treats as read-only).

It would be good to be up front about "`git repack` is no longer blocked, but it also won't do what you expect".
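A minimal sketch of such a test, modeled on the `cache-local-objects` test quoted later in this thread (the test name, the `-ad` options, and the expected outcome are assumptions, not part of this PR):

```sh
test_expect_success 'repack leaves the gvfs.sharedcache alternate untouched' '
	test_when_finished "rm -rf repo cache" &&
	mkdir -p cache/pack &&
	git init repo &&
	(
		cd repo &&
		test_commit something &&
		git config set gvfs.sharedcache ../cache &&
		git config set maintenance.gc.enabled false &&
		git config set maintenance.cache-local-objects.enabled true &&
		test_import_packfile &&
		git maintenance run &&
		ls ../cache/pack >before.txt &&
		git repack -ad &&
		ls ../cache/pack >after.txt &&
		# Assumed outcome: repack rewrites only .git/objects; packs
		# already migrated into the shared cache stay as they were.
		test_cmp before.txt after.txt
	)
'
```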
> Could you add a test with a Scalar clone that has a `gvfs.sharedCache` enabled? I'm curious as to whether `git repack` will operate on that object directory (my guess is not, because it's an alternate, which Git treats as read-only).

We do have this in `git maintenance run`:

Lines 1822 to 1831 in 5ad6dd6:

```c
	/*
	 * To enable the VFS for Git/Scalar shared object cache, use
	 * the gvfs.sharedcache config option to redirect the
	 * maintenance to that location.
	 */
	if (!git_config_get_value("gvfs.sharedcache", &tmp_obj_dir) &&
	    tmp_obj_dir) {
		shared_object_dir = xstrdup(tmp_obj_dir);
		setenv(DB_ENVIRONMENT, shared_object_dir, 1);
	}
```
I think this is part of what is verified in Lines 1126 to 1155 in 5ad6dd6:

```sh
test_expect_success 'cache-local-objects task success' '
	test_when_finished "rm -rf repo cache" &&
	mkdir -p cache/pack &&
	git init repo &&
	(
		cd repo &&
		test_commit something &&
		git config set gvfs.sharedcache ../cache &&
		git config set maintenance.gc.enabled false &&
		git config set maintenance.cache-local-objects.enabled true &&
		git config set maintenance.cache-local-objects.auto 1 &&
		test_import_packfile &&
		test_get_packdir_files "*.pack" "*.idx" "*.keep" "*.rev" \
			>src.txt &&
		test_get_loose_object_files >>src.txt &&
		sed "s/.git\\/objects\\//..\\/cache\\//" src.txt >dst.txt &&
		git maintenance run &&
		while IFS= read -r f; do
			test_path_is_missing $f || exit 1
		done <src.txt &&
		while IFS= read -r f; do
			test_path_exists $f || exit 1
		done <dst.txt
	)
'
```
The `cache-local-objects` maintenance task migrates local loose objects and packfiles to the shared cache. Also, the `incremental-repack` task is redirected to the cache directory, so together the maintenance tasks will repack the packs in the cache dir.

The `repack` built-in, however, will only act on the local objects. Would it be worth either:

a) teaching `repack` to redirect to the shared object dir instead (if set, like the `incremental-repack` task), or
b) printing the warning, perhaps recommending a run of `git maintenance run --task=incremental-repack` instead?
> I'm curious as to whether `git repack` will operate on that object directory (my guess is not, because it's an alternate, which Git treats as read-only).

There is an example of a similar scenario in `repack` already, whereby if an alternate has been configured the bitmaps are not written since "some objects are not being packed".

Lines 1292 to 1302 in 5ad6dd6:

```c
	if (write_bitmaps && po_args.local && has_alt_odb(the_repository)) {
		/*
		 * When asked to do a local repack, but we have
		 * packfiles that are inherited from an alternate, then
		 * we cannot guarantee that the multi-pack-index would
		 * have full coverage of all objects. We thus disable
		 * writing bitmaps in that case.
		 */
		warning(_("disabling bitmap writing, as some objects are not being packed"));
		write_bitmaps = 0;
	}
```
Hmm. If I put myself into a user's shoes, I would much rather prefer `git repack` to take care of the shared object directory (and potentially migrate objects there as part of the `repack`) if I work on a `gvfs.sharedCache`-enabled repository.
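For a rough, command-level sense of what that redirect would amount to (a sketch only, not what this PR implements; `DB_ENVIRONMENT` corresponds to the `GIT_OBJECT_DIRECTORY` environment variable):

```sh
# Roughly what "teach repack about gvfs.sharedcache" would do internally:
# point the object database at the configured shared cache before repacking.
GIT_OBJECT_DIRECTORY="$(git config gvfs.sharedcache)" git repack -a -d

# Caveat from earlier in the thread: objects still sitting in .git/objects
# that have not been migrated into the cache may not be handled well.
```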
Something like this? (for option b)

```diff
diff --git a/builtin/repack.c b/builtin/repack.c
index 01db1b8f529..5a44171f8e8 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -24,6 +24,7 @@
 #include "pack-bitmap.h"
 #include "refs.h"
 #include "list-objects-filter-options.h"
+#include "gvfs.h"
 
 #define ALL_INTO_ONE 1
 #define LOOSEN_UNREACHABLE 2
@@ -1177,6 +1178,7 @@ int cmd_repack(int argc,
 	struct tempfile *refs_snapshot = NULL;
 	int i, ext, ret;
 	int show_progress;
+	const char *tmp_obj_dir = NULL;
 
 	/* variables to be filled by option parsing */
 	int delete_redundant = 0;
@@ -1301,6 +1303,10 @@ int cmd_repack(int argc,
 		write_bitmaps = 0;
 	}
 
+	if (gvfs_config_is_set(NULL) &&
+	    !git_config_get_value("gvfs.sharedcache", &tmp_obj_dir))
+		warning(_("shared object cache will not be repacked; use `git maintenance run --task=incremental-repack` instead"));
+
 	if (write_midx && write_bitmaps) {
 		struct strbuf path = STRBUF_INIT;
```
That would be the option with the warning. However, as a user I'd much prefer the command to not warn me about something I probably don't want, but to do what I probably want instead, i.e. the other option you suggested ;-)
I think users who are bothering to run `git repack` themselves really want their object data to be reworked. I'm thinking specifically of cases like `git repack -adf` that I use to rewrite all objects into a single packfile. We have historically avoided doing that on the shared object cache, but it should work somewhat well due to the integration with the partial clone machinery. It's just going to be slow and users should expect that. We're not choosing to run it in the background because of that cost.

I guess we could also permit the […]
I guess this should not be retargeted to `vfs-2.49.0`, right? The changes look correct to me.
I'm also prepared to approve this when it is retargeted. We can consider extensions to let folks repack their shared object cache as a follow-up.
Add the ability to block built-in commands based on whether the `core.gvfs` setting has the `GVFS_USE_VIRTUAL_FILESYSTEM` bit set. This allows us to permit commands in repos that use the GVFS protocol but don't use VFS for Git (for example, repos cloned via `scalar clone` against Azure DevOps). Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Loosen the blocking of the `repack` command from all "GVFS repos" (those that have `core.gvfs` set) to only those that actually use the virtual file system (VFS for Git only). This allows for `repack` to be used in Scalar clones. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Loosen the blocking of the `fsck` command from all "GVFS repos" (those that have `core.gvfs` set) to only those that actually use the virtual file system (VFS for Git only). This allows for `fsck` to be used in Scalar clones. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Loosen the blocking of the `prune` command from all "GVFS repos" (those that have `core.gvfs` set) to only those that actually use the virtual file system (VFS for Git only). This allows for `prune` to be used in Scalar clones. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Replace the special casing of the `worktree` command being blocked on VFS-enabled repos with the new `BLOCK_ON_VFS_ENABLED` flag. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Thank you!
After looking at the shared cache repacking idea briefly, I agree that it will be much safer to leave this for a separate PR.
```diff
@@ -1301,6 +1303,10 @@ int cmd_repack(int argc,
 		write_bitmaps = 0;
 	}
 
+	if (gvfs_config_is_set(NULL) &&
+	    !git_config_get_value("gvfs.sharedcache", &tmp_obj_dir))
+		warning(_("shared object cache is configured but will not be repacked"));
```
Let's turn this warning into an informational message that the shared cache is repacked instead (and then do that) in a follow-up PR (and add a test to verify that it worked as intended). Unfortunately, it won't be as easy as:

```diff
-		warning(_("shared object cache is configured but will not be repacked"));
+		setenv(DB_ENVIRONMENT, tmp_obj_dir, 1);
```

because subsequent lines call `repo_get_object_directory()` (this line, this one and this one), and that function merely returns the already-initialized `repo->objects->odb->path`.

Other than that, the `pack-objects` command is spawned, which also would need to know about this, and the cruft pack/delete redundant part probably also would need to be updated.

Most importantly, though, we really need to ensure that no objects are deleted that are reachable in any Scalar repository that shares that `sharedcache` repository, even if the objects should not be reachable in that `sharedcache` repository itself...
LGTM. Thanks for starting this effort!
Emit a warning message when the `gvfs.sharedCache` option is set that the `repack` command will not perform repacking on the shared cache. In the future we can teach `repack` to operate on the shared cache, at which point we can drop this commit. Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
Currently when the `core.gvfs` setting is set, several commands are outright blocked from running. Some of these commands, namely `repack`, are actually OK to run in a Scalar clone, even if it uses the GVFS protocol (for Azure DevOps). Introduce a different blocking mechanism to only block commands when the virtual filesystem is being used, rather than as a broad block on any `core.gvfs` setting.