Skip to content

Rewrite release CI #11072

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from
Draft

Rewrite release CI #11072

wants to merge 6 commits into from

Conversation

hasufell
Copy link
Member

This is a rewrite and an extension of #9437

The main motivation is to:

  1. have a reliable release CI that runs (mostly) on unmanaged runners
  2. have access to GitHubs CI infrastructure for faster and more reliable execution
  3. unify the CI so that we only need to maintain one place/flavor of CI (ease of maintenance, less knowledge required for contributors etc.)

The major difference to the previous PR is that we run the "integration tests" of cabal-install (because that is the main part that diverges from the validate pipeline in terms of configuration and build flags).

@mpickering
Copy link
Collaborator

Perhaps it would be good to first establish what the goals with this PR is as it seems to be doing a few different things.

I think as a first step, migrating to github CI should match the configurations and artifacts produced by the existing release CI. Afterwards, once that is achieved then further platforms and packaging tweaks can be explored in a more controlled manner.

It seems risky to both migrate to a different CI platform, modify the supported platforms and also modify packaging options all in one commit. (Not that I am expressing a judgement about the value of also achieving those things as individual goals)

@hasufell
Copy link
Member Author

I think as a first step, migrating to github CI should match the configurations and artifacts produced by the existing release CI

The gitlab configuration contains EOLed distros and lacks many of the artifacts that the current PR provides. Matching gitlab would be mostly a regression.

My impression is that the current release CI is not properly maintained, which is what this PR is trying to solve.

@mpickering
Copy link
Collaborator

Which EOL distros are you referring to exactly? It would be good to remove those certainly.

Adding new platforms for release configurations seems to be an additive task which can be achieved in follow-up discussions if cabal maintainers decide they wish to produce binaries for new platforms.

What additional artifacts are produced? How are those different from the existing release artifacts?

Just so we are clear here Julian, I agree with the premise of this patch. Perhaps if you consider the other changes as necessary/desirable as well, another way forward would be to produce a detailed list of what is the same/different to before, so each one can be considered individually during a review.

@hasufell
Copy link
Member Author

hasufell commented Jul 15, 2025

Which EOL distros are you referring to exactly?

There are some updates I should do to this PR as well, but here is the inventory:

Tarball GHCup Cabal Remark
x86_64-linux-deb10 - + EOL
x86_64-linux-deb11 + +
x86_64-linux-deb12 + +
x86_64-linux-fedora33 + + EOL
x86_64-linux-fedora36 - + EOL
x86_64-linux-fedora37 + - EOL
x86_64-linux-fedora38 - + EOL
x86_64-linux-rocky8 + +
x86_64-linux-rocky9 + -
x86_64-linux-ubuntu20_04 + +
x86_64-linux-ubuntu22_04 + +
x86_64-linux-ubuntu24_04 + +
x86_64-linux-mint20_3 + - EOL
x86_64-linux-mint21_3 + -
x86_64-linux-mint22_1 - -
x86_64-linux-alpine3_12 + + EOL, static
x86_64-linux-alpine3_19 + - static
x86_64-linux-alpine3_20 - + static
x86_64-linux-void-glibc + - rolling
x86_64-freebsd14_2 + -
x86_64-darwin + +
x86_64-windows + +
aarch64-linux-deb10 - + EOL
aarch64-linux-deb11 + +
aarch64-linux-deb12 - +
aarch64-linux-alpine3_18 - + EOL, static
aarch64-linux-alpine3_21 + - static
aarch64-darwin + +
i386-linux-deb10 - + EOL
i386-linux-alpine3_12 - + EOL, static
i386-linux-alpine3_20 - + static
i386-linux-alpine3_21 + - static

@hasufell
Copy link
Member Author

Our strategy going forward is as follows:

  • we want to reduce the linux builds to 2:
    • one fully statically linked against musl
    • one dynamically linked against the oldest glibc that's not EOL, but statically against other libraries (gmp, ncurses etc)... this works via pointing --extra-lib-dirs to a dir which includes only static library archives (cabal only seems to link against gmp and I have a working example binary)
  • then all known glibc based distros use the glibc one and alpine/unkown distros can use the static musl one

We end up with the following build environments:

Build env Remark
x86_64-linux-rocky8 glibc
x86_64-linux-alpine3_22 musl
x86_64-freebsd14_2
x86_64-darwin
x86_64-windows
aarch64-linux-rocky8 glibc
aarch64-linux-alpine3_22 musl
aarch64-darwin
i386-linux-deb11 glibc
i386-linux-alpine3_22 musl

@mpickering
Copy link
Collaborator

Our strategy going forward is as follows:

* we want to reduce the linux builds to 2:
  
  * one fully statically linked against musl
  * one dynamically linked against the oldest glibc that's not EOL, but statically against other libraries (gmp, ncurses etc)... this works via pointing `--extra-lib-dirs` to a dir which includes only static library archives (cabal only seems to link against gmp and I have a working example binary)

* then all known glibc based distros use the glibc one and alpine/unkown distros can use the static musl one

This proposal seems to be a different approach to building artifacts than the cabal project has used in the past, could you open an issue to discuss this proposal with the other maintainers? It seems to be a different topic to this MR.

It would be good to build less binaries overall, perhaps this approach could be made easier by adding some features of options to cabal-install if it works out well.

@Mikolaj
Copy link
Member

Mikolaj commented Jul 16, 2025

Our strategy going forward is as follows:

Indeed, your strategy seems very attractive and quite radical. Is it applicable to a different major project, such as GHC? or ghcup itself or something else major? I'd feel more comfortable if I could contant existing users of such a strategy and even more comfortable, if we could piggy-back on GHC, as we do currently, using CI very similar to what GHC uses and taking advantage of the professionalism of @chreekat and good will of the Haskell Foundation in the management of the release CI. Did GHC consider this strategy?

@hasufell
Copy link
Member Author

if we could piggy-back on GHC, as we do currently, using CI very similar to what GHC uses and taking advantage of the professionalism of @chreekat and good will of the Haskell Foundation in the management of the release CI

@Mikolaj I find your response incredibly disrespectful as a contributor, because you're insinuating "we don't trust your expertise on this matter".

I'm being paid by IOG to do this type of work and we will do it anyway, with or without upstream. But as we've been asked by @bgamari and others during ZuriHac whether we intend to contribute back upstream, here is the answer: yes we do. And here is our proof. In turn, this is a great opportunity for the cabal team to demonstrate that they're willing to collaborate with other teams outside of their core "cabal".

Also as a side note: the Haskell Foundation itself apparently trusts my expertise on this matter as they are funding my proposal (and I am the person managing the Haskell Foundation GitHub runners).

That doesn't mean our goals align and I'm willing to make adjustments to this PR, but I can't work with you, unless the commentary remains professional and technical.

@Mikolaj
Copy link
Member

Mikolaj commented Jul 17, 2025

Edited: I don't feel I can productively contribute to reviewing this PR, so let me thank @mpickering for discussing its merits up to this point and let ma ask @mpickering and other cabal maintainers and developers to carry on.

@mpickering
Copy link
Collaborator

Right, I separately chatted with Julian and Mikolaj. It does seem like there are some unresolved tensions from past interactions. Thank you for your gracious reflection Mikolaj that you would find engaging with this PR difficult.

With @hasufell I discussed the ways that we could move this PR forward, and that we should agree a shared plan of work.

Julian is quite keen to make the CI line-up more with what's required for ghcup, which is understandable since ghcup is the primary way which cabal binaries are distributed.

Cabal maintainers are keen to not have a system which relies on one person (Julian) to maintain, and wish to understand the new system carefully so they can use it without external help in future.

  • Julian said to me what as a first step he could make the platforms in this PR align more, that seems like an uncontroversial way to make progress. But if maintaining platform compatibility is difficult for other reasons, we are not opposed to also updating distributed platforms at the same time.
  • Matt can then take a careful look at the patch, with the mindful goal of something which can be maintained without Julian's assistance if necessary.
  • The new approach to making binaries sounds interesting, but should be discussed on a ticket and via an experimental job before committing to go down that track for the actual releases. Julian will make a ticket where this can be discussed more broadly.

Does that match your impression from our conversation @hasufell ?

@hasufell
Copy link
Member Author

@mpickering that sounds resonable.

Tomorrow I will start:

  • splitting the PR into smaller chunks (e.g. the platform updates in a separate commit)
  • moving the release stuff to a reusable workflow, so it can be triggered from other places
  • open a ticket about reducing linux builds and how to achieve it

with the mindful goal of something which can be maintained without Julian's assistance if necessary.

Well, I think that the bus factor will be much better with the move to GitHub, since most cabal contributors already have exposure to GitHub CI and it doesn't require intimate knowledge of runners (except for FreeBSD, but I'm confident that @geekosaur also has the required expertise to handle it).

@ffaf1
Copy link
Collaborator

ffaf1 commented Jul 17, 2025

From CONTRIBUTING.md.

A pull request fixes a problem that is described in an issue. Make sure to file an issue before opening a pull request. In the issue you can illustrate your proposed design, UX considerations, tradeoffs etc. and work them out with other contributors. The PR itself is for implementation.

I would too like to ask for an issue with goals clearly stated for sure, and possibly trade-offs and resources considerations.

@hasufell
Copy link
Member Author

hasufell commented Jul 18, 2025

@ffaf1 I can do that, but please note that the motivation and merits have been discussed for years and I would like to assume by now Cabal developers are well aware. There even have been synchronous calls about this topic and multiple Cabal developers/contributors have expressed positive interest in this approach.

So yes, I can open an issue with motivation and explanation, but I won't be participating in argumentative discussion, nor will I try to prove why GitHub is a better decision for release CI in the context of Cabal. This is up to you to decide.

This is work that we're using over at stable-haskell and I'm happy to upstream it and invest additional effort into making it upstreamable.

@hasufell hasufell mentioned this pull request Jul 18, 2025
@hasufell hasufell linked an issue Jul 18, 2025 that may be closed by this pull request
@hasufell
Copy link
Member Author

#11078

@hasufell hasufell marked this pull request as draft July 18, 2025 04:06
@mpickering
Copy link
Collaborator

Thanks for making the ticket Julian, that's just a normal part of contributing to cabal. Having a place to discuss the issue which isn't on the PR is useful for the longevity of recording these discussions.

I'll get back to looking at this next week when I'm back at work if you have made the changes we discussed.

@hasufell
Copy link
Member Author

Only thing left in this PR is executing on the "only 2 bindists for linux" plan in a separate commit.

Comment on lines +6 to +8
branches:
required: true
type: string
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't strictly necessary. We're passing a string that represents a json array here and it's parsed further down in the matrices:

    strategy:
      fail-fast: false
      matrix:
        branch: ${{ fromJSON(inputs.branches) }}

This allows downstream forks to implement rebase-queues trivially (triggering the release build for N branches at once and waiting for the result), staying in sync with upstream. I believe this doesn't complicate the workflow much, although upstream may not have a use case for it right now.

# starting with GHC 9.10.x, we also need to pass the 'install_extra' target
ghc_targets:
type: string
default: "install_bin install_lib update_package_db"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This relies on the new ghcup --install-targets feature.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How much time does this save? It seems a little simpler and more robust to me to just install everything. Release CI isn't going to run very often.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Release CI is going to run every night. That's to catch issues early (EOLed distros, etc.).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, avoiding this extra logic of selecting makefile internal targets would be better because it will make upgrading GHC versions easier (for reasons like you write in the comment). That sounds like potentially quite a difficult thing to debug if you are not very familiar with packaging and have to update CI.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GHC upstream shouldn't frequently change the Makefile targets. It's an end user visible interface.

This is also in part codified here: https://gitlab.haskell.org/ghc/ghc-hq/-/blob/main/release-management.mkd?ref_type=heads#21-release-artifacts

How much time does this save?

It cuts installation time in half. On my machine that's from 111.51 seconds to 55.71.

Comment on lines +29 to +53
tool-output:
runs-on: ubuntu-latest
outputs:
apt_tools: ${{ steps.gen_output.outputs.apt_tools }}
apt_tools_ncurses6: ${{ steps.gen_output.outputs.apt_tools_ncurses6 }}
rpm_tools: ${{ steps.gen_output.outputs.rpm_tools }}
apk_tools: ${{ steps.gen_output.outputs.apk_tools }}
xbps_tools: ${{ steps.gen_output.outputs.xbps_tools }}
env:
APT: "groff-base libnuma-dev zlib1g-dev libgmp-dev libgmp10 libssl-dev liblzma-dev libbz2-dev git wget lsb-release software-properties-common gnupg2 apt-transport-https gcc autoconf automake build-essential curl ghc gzip libffi-dev libncurses-dev patchelf"
RPM: "groff-base autoconf automake binutils bzip2 coreutils curl elfutils-devel elfutils-libs findutils gcc gcc-c++ git gmp gmp-devel jq lbzip2 make ncurses ncurses-compat-libs ncurses-devel openssh-clients patch perl pxz python3 sqlite sudo wget which xz zlib-devel patchelf"
APK: "groff binutils-gold curl gcc g++ gmp-dev libc-dev libffi-dev make musl-dev ncurses-dev perl tar xz autoconf automake bzip2 coreutils elfutils-dev findutils git jq bzip2-dev patch python3 sqlite sudo wget which zlib-dev patchelf zlib zlib-dev zlib-static"
XBPS: "groff ncurses-libtinfo-libs autoconf automake binutils bzip2 coreutils curl elfutils-devel elfutils findutils gcc gmp gmp-devel jq lbzip2 make ncurses ncurses-devel openssh patch perl python3 sqlite sudo wget which xz tar zlib-devel patchelf gzip"
steps:
- name: Generate output
id: gen_output
run: |
echo apt_tools="${{ env.APT }} libncurses5 libtinfo5" >> "$GITHUB_OUTPUT"
echo apt_tools_ncurses6="${{ env.APT }} libncurses6 libtinfo6" >> "$GITHUB_OUTPUT"
echo rpm_tools="${{ env.RPM }}" >> "$GITHUB_OUTPUT"
echo apk_tools="${{ env.APK }}" >> "$GITHUB_OUTPUT"
echo xbps_tools="${{ env.XBPS }}" >> "$GITHUB_OUTPUT"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit of trickery to avoid duplication of those long package lists in the platform matrices, because environment variables aren't allowed in the matrix definition. But outputs are.

Comment on lines +163 to +162
- name: Install GHCup
uses: haskell/ghcup-setup@v1
with:
cabal: ${{ env.CABAL_VERSION }}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use the https://github.com/marketplace/actions/ghcup-setup action, because it is simpler and more predictable than the haskell setup action and also allows to set the complete ghcup config declaratively.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

type: string
ghc:
type: string
default: 9.6.7
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually had test failures with 9.10.2, but I can try again some time later.

My opinion is that 9.6.7 is a more robust release for producing binaries.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided to build the release binaries with 9.10.2 (in #11032), can you please update this variable to use that release?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but I won't be investigating test failures.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As expected, we have test failures:

  • FreeBSD
    • cabal-testsuite/PackageTests/NewBuild/Semaphore/cabal.test.hs
  • various Linuxes:
    • cabal-testsuite/PackageTests/MultiRepl/EnabledBadClosure/cabal.test.hs

I do not have bandwidth to investigate these, since using 9.10.2 for releases has no value for us.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for noting the cases which are failing. It's important that this PR aligns with the other decisions taken by cabal developers so I'll ask in matrix if anyone has the time to investigate these failures.

I suspect these failures are not to do with building with ghc-9.10.2 but that the same variable specifies the version of ghc which cabal-install is tested with. (So building with 9.6.7 or 9.10.2, you would get the same test failures if you tested against 9.10.2).

Comment on lines +27 to +34
# TODO: we want to avoid building here... we should just
# be using the previously built 'cabal-tests' binary
cabal run ${ADD_CABAL_ARGS} cabal-testsuite:cabal-tests -- \
--with-cabal "$(pwd)/out/cabal" \
--intree-cabal-lib "$(pwd)" \
--test-tmp "$(pwd)/testdb" \
--skip-setup-tests \
-j "$(nproc)"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spend some time trying to figure out if we can trivially make this more portable. But it appears non-trivial. Also see #11048

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking at that, it's not a very easy problem to solve with the current architecture.

@hasufell hasufell force-pushed the release-ci branch 3 times, most recently from 4914cd0 to 20415dd Compare July 21, 2025 04:29
@hasufell hasufell force-pushed the release-ci branch 5 times, most recently from 06fedea to 64ffd68 Compare July 21, 2025 06:30
@mpickering
Copy link
Collaborator

mpickering commented Jul 21, 2025

@hasufell Can you please move the commit which uses the new packaging story you propose to separate PR, then I can review the release CI script which is outlined in this comment

@mpickering
Copy link
Collaborator

I have responded to the comments you left Julian, that was a helpful place to start.

@hasufell hasufell force-pushed the release-ci branch 2 times, most recently from 60e619e to 9c275e2 Compare July 21, 2025 08:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rewrite release CI
6 participants