Rewrite release CI #11072
Conversation
Perhaps it would be good to first establish what the goals of this PR are, as it seems to be doing a few different things. I think, as a first step, migrating to GitHub CI should match the configurations and artifacts produced by the existing release CI. Once that is achieved, further platforms and packaging tweaks can be explored in a more controlled manner. It seems risky to migrate to a different CI platform, modify the supported platforms, and modify packaging options all in one commit. (Not that I am expressing a judgement about the value of also achieving those things as individual goals.)
The GitLab configuration contains EOLed distros and lacks many of the artifacts that the current PR provides. Matching GitLab would mostly be a regression. My impression is that the current release CI is not properly maintained, which is what this PR is trying to solve.
Which EOL distros are you referring to exactly? It would certainly be good to remove those. Adding new platforms for release configurations seems to be an additive task, which can be addressed in follow-up discussions if cabal maintainers decide they wish to produce binaries for new platforms. What additional artifacts are produced? How are those different from the existing release artifacts? Just so we are clear here, Julian: I agree with the premise of this patch. Perhaps, if you consider the other changes necessary/desirable as well, another way forward would be to produce a detailed list of what is the same/different compared to before, so that each change can be considered individually during review.
There are some updates I should do to this PR as well, but here is the inventory:
Our strategy going forward is as follows:
We end up with the following build environments:
This proposal seems to be a different approach to building artifacts than the cabal project has used in the past; could you open an issue to discuss this proposal with the other maintainers? It seems to be a different topic from this PR. It would be good to build fewer binaries overall; perhaps this approach could be made easier by adding some features or options to
Indeed, your strategy seems very attractive and quite radical. Is it applicable to a different major project, such as GHC, or ghcup itself, or something else major? I'd feel more comfortable if I could contact existing users of such a strategy, and even more comfortable if we could piggy-back on GHC, as we do currently, using CI very similar to what GHC uses and taking advantage of the professionalism of @chreekat and the good will of the Haskell Foundation in the management of the release CI. Did GHC consider this strategy?
@Mikolaj I find your response incredibly disrespectful as a contributor, because you're insinuating "we don't trust your expertise on this matter". I'm being paid by IOG to do this type of work and we will do it anyway, with or without upstream. But as we've been asked by @bgamari and others during ZuriHac whether we intend to contribute back upstream, here is the answer: yes, we do. And here is our proof. In turn, this is a great opportunity for the cabal team to demonstrate that they're willing to collaborate with other teams outside of their cabal core. Also, as a side note: the Haskell Foundation itself apparently trusts my expertise on this matter, as they are funding my proposal (and I am the person managing the Haskell Foundation GitHub runners). That doesn't mean our goals align, and I'm willing to make adjustments to this PR, but I can't work with you unless the commentary remains professional and technical.
Edited: I don't feel I can productively contribute to reviewing this PR, so let me thank @mpickering for discussing its merits up to this point and let me ask @mpickering and other cabal maintainers and developers to carry on.
Right, I separately chatted with Julian and Mikolaj. It does seem like there are some unresolved tensions from past interactions. Thank you for your gracious reflection, Mikolaj, that you would find engaging with this PR difficult. With @hasufell I discussed the ways that we could move this PR forward, and that we should agree on a shared plan of work. Julian is quite keen to make the CI line up more with what's required for ghcup, which is understandable since ghcup is the primary way in which cabal binaries are distributed. Cabal maintainers are keen not to have a system which relies on one person (Julian) to maintain, and wish to understand the new system carefully so they can use it without external help in future.
Does that match your impression from our conversation, @hasufell?
@mpickering that sounds reasonable. Tomorrow I will start:
Well, I think that the bus factor will be much better with the move to GitHub, since most cabal contributors already have exposure to GitHub CI and it doesn't require intimate knowledge of runners (except for FreeBSD, but I'm confident that @geekosaur also has the required expertise to handle it).
I too would like to ask for an issue with the goals clearly stated, and possibly trade-offs and resource considerations.
@ffaf1 I can do that, but please note that the motivation and merits have been discussed for years, and I would like to assume that by now Cabal developers are well aware. There have even been synchronous calls about this topic, and multiple Cabal developers/contributors have expressed positive interest in this approach. So yes, I can open an issue with motivation and explanation, but I won't be participating in argumentative discussion, nor will I try to prove why GitHub is a better decision for release CI in the context of Cabal. This is up to you to decide. This is work that we're using over at stable-haskell, and I'm happy to upstream it and invest additional effort into making it upstreamable.
Thanks for making the ticket, Julian; that's just a normal part of contributing to cabal. Having a place to discuss the issue which isn't on the PR is useful for the longevity of recording these discussions. I'll get back to looking at this next week, when I'm back at work, if you have made the changes we discussed.
The only thing left in this PR is executing on the "only 2 bindists for Linux" plan in a separate commit.
  branches:
    required: true
    type: string
This isn't strictly necessary. We're passing a string that represents a JSON array here, and it is parsed further down in the matrices:
strategy:
  fail-fast: false
  matrix:
    branch: ${{ fromJSON(inputs.branches) }}
This allows downstream forks to implement rebase-queues trivially (triggering the release build for N branches at once and waiting for the result), staying in sync with upstream. I believe this doesn't complicate the workflow much, although upstream may not have a use case for it right now.
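For illustration only (the workflow file name and branch list below are assumptions, not taken from this PR, and this presumes the workflow exposes a workflow_dispatch trigger), a fork could kick off the release build for several branches at once via the GitHub CLI:
# hypothetical invocation: pass the branch set as a JSON array string
gh workflow run release.yml -f branches='["master", "my-rebase-queue"]'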
  # starting with GHC 9.10.x, we also need to pass the 'install_extra' target
  ghc_targets:
    type: string
    default: "install_bin install_lib update_package_db"
This relies on the new ghcup --install-targets feature.
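A rough sketch of what the installation step boils down to (the exact ghcup invocation here is my assumption; only the --install-targets flag and the default target list come from this PR, and GHC_VERSION is a placeholder):
# illustrative only: install just the Makefile targets needed for the release build
ghcup install ghc "$GHC_VERSION" --install-targets "install_bin install_lib update_package_db"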
How much time does this save? It seems a little simpler and more robust to me to just install everything. Release CI isn't going to run very often.
Release CI is going to run every night. That's to catch issues early (EOLed distros, etc.).
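For context, a nightly run in GitHub Actions is just a schedule trigger; the cron time below is an assumption for illustration, not something specified in this PR:
on:
  schedule:
    # illustrative nightly trigger
    - cron: '0 2 * * *'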
For me, avoiding this extra logic of selecting internal Makefile targets would be better, because it will make upgrading GHC versions easier (for the reasons you describe in the comment). That sounds like potentially quite a difficult thing to debug if you are not very familiar with packaging and have to update the CI.
GHC upstream shouldn't frequently change the Makefile targets; it's an end-user-visible interface. This is also in part codified here: https://gitlab.haskell.org/ghc/ghc-hq/-/blob/main/release-management.mkd?ref_type=heads#21-release-artifacts
How much time does this save?
It cuts installation time in half: on my machine, from 111.51 seconds to 55.71 seconds.
tool-output:
  runs-on: ubuntu-latest
  outputs:
    apt_tools: ${{ steps.gen_output.outputs.apt_tools }}
    apt_tools_ncurses6: ${{ steps.gen_output.outputs.apt_tools_ncurses6 }}
    rpm_tools: ${{ steps.gen_output.outputs.rpm_tools }}
    apk_tools: ${{ steps.gen_output.outputs.apk_tools }}
    xbps_tools: ${{ steps.gen_output.outputs.xbps_tools }}
  env:
    APT: "groff-base libnuma-dev zlib1g-dev libgmp-dev libgmp10 libssl-dev liblzma-dev libbz2-dev git wget lsb-release software-properties-common gnupg2 apt-transport-https gcc autoconf automake build-essential curl ghc gzip libffi-dev libncurses-dev patchelf"
    RPM: "groff-base autoconf automake binutils bzip2 coreutils curl elfutils-devel elfutils-libs findutils gcc gcc-c++ git gmp gmp-devel jq lbzip2 make ncurses ncurses-compat-libs ncurses-devel openssh-clients patch perl pxz python3 sqlite sudo wget which xz zlib-devel patchelf"
    APK: "groff binutils-gold curl gcc g++ gmp-dev libc-dev libffi-dev make musl-dev ncurses-dev perl tar xz autoconf automake bzip2 coreutils elfutils-dev findutils git jq bzip2-dev patch python3 sqlite sudo wget which zlib-dev patchelf zlib zlib-dev zlib-static"
    XBPS: "groff ncurses-libtinfo-libs autoconf automake binutils bzip2 coreutils curl elfutils-devel elfutils findutils gcc gmp gmp-devel jq lbzip2 make ncurses ncurses-devel openssh patch perl python3 sqlite sudo wget which xz tar zlib-devel patchelf gzip"
  steps:
    - name: Generate output
      id: gen_output
      run: |
        echo apt_tools="${{ env.APT }} libncurses5 libtinfo5" >> "$GITHUB_OUTPUT"
        echo apt_tools_ncurses6="${{ env.APT }} libncurses6 libtinfo6" >> "$GITHUB_OUTPUT"
        echo rpm_tools="${{ env.RPM }}" >> "$GITHUB_OUTPUT"
        echo apk_tools="${{ env.APK }}" >> "$GITHUB_OUTPUT"
        echo xbps_tools="${{ env.XBPS }}" >> "$GITHUB_OUTPUT"
This is a bit of trickery to avoid duplicating those long package lists in the platform matrices: environment variables aren't allowed in the matrix definition, but job outputs are.
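A sketch of how a downstream job can consume these outputs in its matrix (the job name, images, and matrix keys below are illustrative assumptions, not taken from this PR):
build-linux:
  needs: tool-output
  runs-on: ubuntu-latest
  container: ${{ matrix.image }}
  strategy:
    matrix:
      include:
        # the 'needs' context is available inside matrix definitions, unlike 'env'
        - image: debian:12
          tools: ${{ needs.tool-output.outputs.apt_tools_ncurses6 }}
        - image: fedora:40
          tools: ${{ needs.tool-output.outputs.rpm_tools }}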
- name: Install GHCup
  uses: haskell/ghcup-setup@v1
  with:
    cabal: ${{ env.CABAL_VERSION }}
We use the https://github.com/marketplace/actions/ghcup-setup action because it is simpler and more predictable than the haskell setup action, and it also allows setting the complete ghcup config declaratively.
👍
    type: string
  ghc:
    type: string
    default: 9.6.7
I actually had test failures with 9.10.2, but I can try again some time later.
My opinion is that 9.6.7 is a more robust release for producing binaries.
We decided to build the release binaries with 9.10.2 (in #11032); can you please update this variable to use that release?
Sure, but I won't be investigating test failures.
As expected, we have test failures:
- FreeBSD:
  - cabal-testsuite/PackageTests/NewBuild/Semaphore/cabal.test.hs
- various Linuxes:
  - cabal-testsuite/PackageTests/MultiRepl/EnabledBadClosure/cabal.test.hs
I do not have bandwidth to investigate these, since using 9.10.2 for releases has no value for us.
Thanks for noting the cases which are failing. It's important that this PR aligns with the other decisions taken by cabal developers, so I'll ask in Matrix if anyone has the time to investigate these failures. I suspect these failures have nothing to do with building with ghc-9.10.2, but rather stem from the fact that the same variable also specifies the version of GHC which cabal-install is tested with. (So whether building with 9.6.7 or 9.10.2, you would get the same test failures if you tested against 9.10.2.)
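If it ever becomes important to decouple the two, one hypothetical option (sketch only: the 'ghc' input exists in this PR, 'test-ghc' does not) would be separate inputs for the build compiler and the test compiler:
  ghc:        # compiler used to build the cabal-install binary (exists in this PR)
    type: string
  test-ghc:   # compiler the cabal-testsuite runs against (hypothetical input)
    type: string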
# TODO: we want to avoid building here... we should just
# be using the previously built 'cabal-tests' binary
cabal run ${ADD_CABAL_ARGS} cabal-testsuite:cabal-tests -- \
  --with-cabal "$(pwd)/out/cabal" \
  --intree-cabal-lib "$(pwd)" \
  --test-tmp "$(pwd)/testdb" \
  --skip-setup-tests \
  -j "$(nproc)"
I spent some time trying to figure out whether we can trivially make this more portable, but it appears non-trivial. Also see #11048.
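For reference, the end state that the TODO in the snippet describes would presumably look something like the sketch below; the out/cabal-tests path is my assumption, and as discussed this is not trivially achievable with the current architecture:
# hypothetical: run a previously built test runner directly instead of
# rebuilding it via 'cabal run'; the binary path is an assumption
"$(pwd)/out/cabal-tests" \
  --with-cabal "$(pwd)/out/cabal" \
  --intree-cabal-lib "$(pwd)" \
  --test-tmp "$(pwd)/testdb" \
  --skip-setup-tests \
  -j "$(nproc)"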
Thanks for looking at that, it's not a very easy problem to solve with the current architecture.
@hasufell Can you please move the commit which uses the new packaging story you propose to a separate PR? Then I can review the release CI script which is outlined in this comment.
I have responded to the comments you left, Julian; that was a helpful place to start.
This is a rewrite and an extension of #9437
The main motivation is to:
The major difference from the previous PR is that we run the "integration tests" of cabal-install (because that is the main part that diverges from the validate pipeline in terms of configuration and build flags).