Skip to content

Fix very long t5799-gvfs-helper.sh runtimes in our CI runs #760

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 5, 2025

Conversation

dscho
Copy link
Member

@dscho dscho commented May 27, 2025

I have noticed for over a year now that some jobs in microsoft/git's CI take quite a long time: well over an hour.

Finally I have a fix for that. All those build minutes were quite wastefully spent on... sleeping.

The whole story is written down in the commit message, if anyone is interested.

@dscho dscho requested review from derrickstolee and mjcheetham May 27, 2025 11:05
@dscho dscho self-assigned this May 27, 2025
@dscho dscho marked this pull request as draft May 27, 2025 11:32
derrickstolee
derrickstolee previously approved these changes May 27, 2025
Copy link

@derrickstolee derrickstolee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for digging into this! I've also struggled with these slow CI times.

@dscho dscho force-pushed the fix-long-running-t5799-on-older-linuxes branch from 9192cb1 to 457a019 Compare June 3, 2025 20:50
@dscho dscho requested a review from derrickstolee June 3, 2025 22:42
@dscho dscho marked this pull request as ready for review June 5, 2025 13:51
@dscho dscho marked this pull request as draft June 5, 2025 13:52
Some jobs of Git's CI/PR workflow run in Ubuntu 20.04. It would appear
that that Ubuntu's version of libcurl, 7.68.0-1ubuntu2.25, behaves in an
unexpected way: when running with `CURLOPT_FAILONERROR` enabled (which
Git sets in `get_active_slot()`), the HTTP headers are not parsed in the
failure code path.

This has ramifications: `gvfs-helper` _wants_ to parse the `Retry-After`
header, among other things, when encountering, say, a 503 (which is
precisely what t5799.29 (integration: implicit-get: http_503: diff 2
commits) tests for, and likewise t5799.30 (integration: implicit-get:
cache_http_503,no-fallback: diff 2 commits).

If this `Retry-After` header is parsed as expected, in both of these
test cases the `gvfs-helper` (which is called implicitly by `git diff`
to retrieve the prerequisite Git objects) will back off six times, each
time waiting 2 seconds (and printing "Waiting on hard throttle (sec):
100% (2/2), done." if run interactively), then try the thing once more,
and then fail as expected.

If this `Retry-After` header is _not_ parsed, `gvfs-helper` will take a
_different_ failure code path. The symptom is the error message "Waiting
to retry after network error (sec):  100% (8/8)", waiting 8 seconds,
followed by several similar error messages that double the waiting time
until 256 seconds, then dropping back to 8 seconds, doubling several
times until 128 seconds, and then giving up. The end result looks like
the expected failure, but it didn't take a mere 2x24 seconds in total to
fail for good, but 2x~12½ minutes!

As a consequence, the t5799 script alone regularly wasted over an hour
in microsoft/git's CI jobs running on Ubuntu 20.04.

The reason for this is curl/curl@ab525c059eb
(http: have CURLOPT_FAILONERROR fail after all headers, 2021-01-04)
which is curl-7_75_0~136, i.e. not part of libcurl 7.68.0-1ubuntu2.25.
This change allows libcurl to parse the HTTP headers even in the failure
code path, even when `CURLOPT_FAILONERROR` is enabled.

It would be tempting to remedy this problem here in `gvfs-helper` by
disabling `CURLOPT_FAILONERROR`; The code in `gvfs-helper` seems already
well-prepared to handle all of the errors based on the HTTP result code,
even without libcurl returning `CURLE_HTTP_RETURNED_ERROR`. However, in
my tests, doing this unilaterally results in occasional (read: flaky)
501s in t5799.25(HTTP POST Auth on Origin Server). So let's do this only
when the cURL version is really old (thanks, Debian!).

This results in a slight change of behavior: When authentication fails,
the connection is now kept open. Testament to this behavioral change is
t5799.24(HTTP GET Auth on Origin Server) which wanted to verify that
there are two connections when authentication is required, but now there
is only one (the rest of the test case verifies that fetching the loose
object worked, i.e. that the authentication succeeded). Let's just skip
that verification in case of an older cURL version.

I briefly considered an alternative, where the affected test cases of
t5799 would be skipped depending on this prerequisite:

  test_lazy_prereq CURL_7_75_OR_NEWER '
    test 7.75.0 = "$(printf '%s\n' 7.75.0 $(curl --version |
        sed -n '1s/^curl \([^ ]*\).*/\1/p') |
      sort -n -t . |
      head -n 1)"
  '

But that is not only one ugly prereq definition, it would also lose test
coverage for Ubuntu 20.04 (and guarantee a bad experience of
microsoft/git's users on that operating system in case of 503s). Since
https://github.com/microsoft/git/blob/v2.49.0.vfs.0.3/.github/workflows/build-git-installers.yml#L559
explicitly targets Ubuntu 20.04, I rejected that alternative.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@dscho dscho force-pushed the fix-long-running-t5799-on-older-linuxes branch from 457a019 to f27f40f Compare June 5, 2025 14:05
@dscho
Copy link
Member Author

dscho commented Jun 5, 2025

I re-thought things and decided to make the cURL version check a runtime one instead of a compile time one.

@dscho dscho marked this pull request as ready for review June 5, 2025 14:17
@dscho dscho merged commit 6523961 into vfs-2.49.0 Jun 5, 2025
116 of 119 checks passed
@dscho dscho deleted the fix-long-running-t5799-on-older-linuxes branch June 5, 2025 15:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants