Skip to content

gvfs-helper-client: clean up server process #756

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 13, 2025

Conversation

derrickstolee
Copy link

@derrickstolee derrickstolee commented May 23, 2025

On Linux, the following command would cause the terminal to be stuck waiting:

  git fetch origin foobar

The issue would be that the fetch would fail with the error

  fatal: couldn't find remote ref foobar

but the underlying git-gvfs-helper process wouldn't die. The subprocess_exit_handler() method would close its stdin and stdout, but that wouldn't be enough to cause the process to end.

The current approach creates our own atexit() handler by sending a SIGINT signal to the child process. This has worked in my end-to-end test.

Perhaps another approach would be to change the way git-gvfs-helper server gets messages over stdin to better terminate when the pipe closes. However, I debugged to the point that I could see the git-gvfs-helper process waiting on a basic read() from the standard library, so it's not mishandling an EOF signal or anything.

@derrickstolee derrickstolee self-assigned this May 23, 2025
@derrickstolee derrickstolee marked this pull request as ready for review May 23, 2025 19:00
@derrickstolee
Copy link
Author

Promoting to a full PR after trying to explore the option of being better bout stdin closing. In fact, the git-gvfs-helper process is waiting on stdin in the read() method and isn't getting a response. So there's nothing to be done in that direction.

@derrickstolee derrickstolee requested review from dscho and mjcheetham May 23, 2025 19:01
@@ -48,6 +58,8 @@ static int gh_client__start_fn(struct subprocess_entry *subprocess)

struct gh_server__process *entry = (struct gh_server__process *)subprocess;

server_process_pid = subprocess->process.pid;
atexit(cleanup_server_process_on_exit);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it may be interesting to find out why the process does not finish when stdin closes (looks like the packet_read_line_gently() call that should return -1 does not? Or maybe the problem is that child processes continue to exist as zombie processes on Linux until a waitpid() of their parent process releases them?), an atexit() handler definitely takes care of cleaning up the server process.

However, there may be more than one gvfs-helper process to clean up. @derrickstolee could you test whether b8d292f (pushed to microsoft/git as gvfs-helper-linux-term) addresses the problem you observed, too?

@derrickstolee derrickstolee force-pushed the gvfs-helper-linux-term branch from 5d57c6d to 68f5d49 Compare May 27, 2025 13:17
@@ -48,6 +50,8 @@ static int gh_client__start_fn(struct subprocess_entry *subprocess)

struct gh_server__process *entry = (struct gh_server__process *)subprocess;

atexit(stop_gh_server_subprocesses);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should move this all the way into the cmd_main() function, to be 100% sure that it is registered, and registered one time only?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could work, or I could add a static int registered = 0; into this method. This is definitely on the critical path for starting a process.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I do not understand is: the previous location was in the code path that initializes the hashmap, and the new location is in the gh_client__start_fn() function which is passed as a callback a couple lines further down from the previous location. So how is it possible that it was not called early enough in the previous location but it is early enough in the new location?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also confused by this, since I specifically traced that the hashing location was being called but somehow wasn't sticking. I'll look into it some more later this week.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running the gvfs-helper test script with SANITIZE=leak is a good reproducer for the issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll play with this, but wonder why the current linux-leaks job seems not to be bothered (and succeeds without this here PR).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current iteration puts the atexit() in cmd_main(). If you put it in the "right" place, then the leak check will fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not reproduce the hang with SANITIZE=leak, but with this diff:

diff --git a/t/t9210-scalar.sh b/t/t9210-scalar.sh
index 9bfd3390d4be..50c8aaedacee 100755
--- a/t/t9210-scalar.sh
+++ b/t/t9210-scalar.sh
@@ -426,6 +426,10 @@ test_expect_success '`scalar clone` with GVFS-enabled server' '
 	)
 '
 
+test_expect_success 'fetch <non-existent> does not hang in gvfs-helper' '
+	test_must_fail git -C using-gvfs/src fetch origin does-not-exist
+'
+
 test_expect_success '`scalar clone --no-gvfs-protocol` skips gvfs/config' '
 	# the fake cache server requires fake authentication &&
 	git config --global core.askPass true &&

I plan on making this new test case a part of this PR.

Copy link
Member

@dscho dscho Jun 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@derrickstolee I finally figured out what is going wrong.

The subprocess_start() function registers a clean_on_exit_handler for all subprocesses. This handler is called by the cleanup_children() function which is run via run-command.c's own atexit() handler.

This latter atexit() handler is registered way before the gvfs-helper.c code is run, and therefore does the wait-and-hang thing before the atexit() handler I wanted to introduce had a chance.

My current approach is completely different: It overrides the clean_on_exit_handler for the gvfs-helper sub-processes. Therefore we do not need an extra atexit() handler at all, and hence do not run into atexit() handler ordering issues, either.

@derrickstolee derrickstolee force-pushed the gvfs-helper-linux-term branch from 0a798ea to 344fd32 Compare May 30, 2025 16:12
@dscho dscho force-pushed the gvfs-helper-linux-term branch 2 times, most recently from 5bafc74 to 68dc64e Compare June 9, 2025 15:16
@dscho
Copy link
Member

dscho commented Jun 9, 2025

Whoopsie. I needed to take care of the new leaks...

@dscho
Copy link
Member

dscho commented Jun 9, 2025

Sadly I cannot reproduce the linux-asan-ubsan failures locally. This will require another epic debugging session, but not before Wednesday.

On Linux, the following command would cause the terminal to be stuck
waiting:

  git fetch origin foobar

The issue would be that the fetch would fail with the error

  fatal: couldn't find remote ref foobar

but the underlying git-gvfs-helper process wouldn't die. The
`subprocess_exit_handler()` method would close its stdin and stdout, but
that wouldn't be enough to cause the process to end, even though the
`packet_read_line_gently()` call that is run in `do_sub_cmd__server()`
in a loop should return -1 and the process should the terminate
peacefully.

While it is unclear why this does not happen, there may be other
conditions where the `gvfs-helper` process would not terminate. Let's
ensure that the gvfs-helper-client process cleans up the gvfs-helper
server processes that it spawned upon exit.

Reported-by: Stuart Wilcox Humilde <stuartwilcox@microsoft.com>
Co-authored-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho dscho force-pushed the gvfs-helper-linux-term branch from 68dc64e to 4c15724 Compare June 12, 2025 15:29
@dscho
Copy link
Member

dscho commented Jun 12, 2025

Sadly I cannot reproduce the linux-asan-ubsan failures locally.

It's really strange that I wasn't able to do this locally; action-tmate helped me realize that the loop in gh_client__subprocess_exit_handler() would have free()d structures that are still needed by cleanup_children() to iterate over the child processes, which was the reason for those failures.

So I went back to the drawing board and realized that the only problem I need to address is that subprocess_exit_handler() calls finish_command() before returning to cleanup_children() has a chance to send the signal to the child process.

So all I had to do was to make a near-identical copy of subprocess_exit_handler() that does all the same things except wait for the child process to finish, then use that in the gvfs-helper-client code.

Copy link
Author

@derrickstolee derrickstolee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd approve, but this is my PR!

@dscho dscho merged commit 6e041f8 into microsoft:vfs-2.49.0 Jun 13, 2025
62 checks passed
dscho added a commit that referenced this pull request Jun 13, 2025
On Linux, the following command would cause the terminal to be stuck
waiting:

```
  git fetch origin foobar
```

The issue would be that the fetch would fail with the error

```
  fatal: couldn't find remote ref foobar
```

but the underlying `git-gvfs-helper` process wouldn't die. The
`subprocess_exit_handler()` method would close its stdin and stdout, but
that wouldn't be enough to cause the process to end.

This PR addresses that by skipping the `finish_command()` call of the
`clean_on_exit_handler` and instead lets `cleanup_children()` send a
SIGTERM to terminate those spawned child processes.
dscho added a commit that referenced this pull request Jun 16, 2025
On Linux, the following command would cause the terminal to be stuck
waiting:

```
  git fetch origin foobar
```

The issue would be that the fetch would fail with the error

```
  fatal: couldn't find remote ref foobar
```

but the underlying `git-gvfs-helper` process wouldn't die. The
`subprocess_exit_handler()` method would close its stdin and stdout, but
that wouldn't be enough to cause the process to end.

This PR addresses that by skipping the `finish_command()` call of the
`clean_on_exit_handler` and instead lets `cleanup_children()` send a
SIGTERM to terminate those spawned child processes.
dscho added a commit that referenced this pull request Jun 16, 2025
On Linux, the following command would cause the terminal to be stuck
waiting:

```
  git fetch origin foobar
```

The issue would be that the fetch would fail with the error

```
  fatal: couldn't find remote ref foobar
```

but the underlying `git-gvfs-helper` process wouldn't die. The
`subprocess_exit_handler()` method would close its stdin and stdout, but
that wouldn't be enough to cause the process to end.

This PR addresses that by skipping the `finish_command()` call of the
`clean_on_exit_handler` and instead lets `cleanup_children()` send a
SIGTERM to terminate those spawned child processes.
dscho added a commit that referenced this pull request Jun 16, 2025
On Linux, the following command would cause the terminal to be stuck
waiting:

```
  git fetch origin foobar
```

The issue would be that the fetch would fail with the error

```
  fatal: couldn't find remote ref foobar
```

but the underlying `git-gvfs-helper` process wouldn't die. The
`subprocess_exit_handler()` method would close its stdin and stdout, but
that wouldn't be enough to cause the process to end.

This PR addresses that by skipping the `finish_command()` call of the
`clean_on_exit_handler` and instead lets `cleanup_children()` send a
SIGTERM to terminate those spawned child processes.
dscho added a commit that referenced this pull request Jun 16, 2025
On Linux, the following command would cause the terminal to be stuck
waiting:

```
  git fetch origin foobar
```

The issue would be that the fetch would fail with the error

```
  fatal: couldn't find remote ref foobar
```

but the underlying `git-gvfs-helper` process wouldn't die. The
`subprocess_exit_handler()` method would close its stdin and stdout, but
that wouldn't be enough to cause the process to end.

This PR addresses that by skipping the `finish_command()` call of the
`clean_on_exit_handler` and instead lets `cleanup_children()` send a
SIGTERM to terminate those spawned child processes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants