Proposal: Migrate Agones to the Linux Foundation #4304

@markmandel

Note: When I write “Linux Foundation” or “LF”, I mean either the Linux Foundation itself, or any of its sub-foundations such as CNCF, O3DE, etc. It’s just easier to type “Linux Foundation”. Also see the open question below on “Which part of the Linux Foundation is the right fit for this project?”.

Background

As I’m sure many of you are aware, I started the Agones project back in 2017 while at Google, was its lead maintainer, and helped it grow into what, I think I can fairly say, is a staple of the multiplayer games industry for folks who want to orchestrate their own game servers as part of their multiplayer backend systems.

Back in January 2025, I left Google, but I have continued to work on Agones in my spare time, both for enjoyment and because I want to see the project continue to succeed.

Unfortunately, since I left, it has not been smooth sailing for this project, for example:

  • Only other community members and I are doing feature work and feature review on Agones (shout outs in particular to @miai10, @lacroixthomas and @keith-miller); see the PRs merged since 2025-01-15.
  • Critical bugs, such as the Unreal SDK websocket issue, were fixed not by Google but by the community (particular call out to @swermin for tackling this!), and even minor things, like the “PR Update script”, have been broken since March due to expired credentials that haven’t been updated (issue).
  • Work on in-place upgrades has seemingly stopped, even though CI for in-place upgrades was so flaky that I had to disable it (PR) so that other work could be completed.
  • I’ve been the only person looking to expand and diversify our approver pool, because getting review and approval from Google staff requires constant follow-ups, and can take days, if not over a week, on each and every review.

That being said, Google has continued to tackle and maintain:

  • The Agones project and test infrastructure (which is good, because nobody else has access ☺️)
  • Running releases, and maintaining the build and release infrastructure
  • Updating Kubernetes versions.
  • Updating Go versions

The following are assumptions that I’ve made about Google Cloud's involvement in the project, based solely on observations made since I left Google. If anyone from Google wants to confirm or correct any of these, I would be delighted. Also, if anyone else from the community feels similarly or wants to expand further, please do so in the comments below.

  • Google Cloud wants to maintain a degree of involvement in the project, as they have multiple public references to high profile customers using Agones, as well as a commitment to support Platinum Support subscribers. It would therefore be to their detriment to drop the project completely.
  • There are no longer any Full Time Employees (FTEs) at Google who work on Agones full time, either from a software engineering or a community perspective.
  • The only person who may be full time is a third party contractor (xWF), who has little oversight or support.
  • Google has no immediate plans to either expand or contract this resourcing.

The repercussions of which are:

  • When the only reviewers other than myself were part of the project, I couldn’t get reviews done in a timely manner. It would take days, if not weeks, severely curtailing any feature work I was looking to accomplish.
  • When GKE Autopilot was consistently failing (or really during any kind of infrastructure maintenance issue), it took several days to communicate this to the community and several more days to resolve, as only people from Google had access to and visibility into the project.
  • xWF employees have full access to the GitHub repository and project, and can be arbitrary about how they interact with Agones, which has resulted in some less than ideal scenarios, including: not communicating with the community when making decisions on issues, modifying releases (which broke releases), making releases without checking whether there is in-flight feature or bug work, etc.

As a note - I don't mean any of the above to reflect negatively on the skill, ability or desire of those from Google Cloud and its xWF folks that we interact with on this project. I'm strongly willing to bet they are performing a herculean task to do even as much work as they do. I expect it is a Google Cloud leadership, support and prioritisation issue, more than anything else, that is causing these problems.

Goals

The ultimate goal is to ensure the project can continue to grow, be successful and gain adoption, but to do that I want to find a set of goals that works for both Google’s and the community’s priorities, incentives and bandwidth.

  • Grow the set of approvers and maintainers across multiple studios and game backend providers, to enable redundancy between committee members and reduce single points of failure as much as possible.
  • Increase speed of review of contributions.
  • Improve community engagement in Slack, community meetings and other social media spaces.
  • Have project infrastructure that can be shared across the Technical Steering Committee (TSC) proposed below, to provide redundancy and support for the Google staff (FTE or xWF) that are maintaining it.
  • Create a process between Google staff (FTE or xWF) and the non-Google community for coordinating releases with upcoming feature work and bug fixes.
  • Create a stronger communication and support pipeline between the community and Google staff, particularly with the xWF members.

Proposal

The proposal is the following:

  • Move Agones into the Linux Foundation, and go through its onboarding process (see open questions below for what that will look like).
  • Create a new, more formal organisational structure (I’ll call this the “Technical Steering Committee”, or TSC, for now) that has membership from both Google and various studios and external contributors, with defined roles and responsibilities.
    • This also provides more incentive for long term contribution, as it can be a resume building tool (e.g. “Technical Steering Committee member on Agones” sounds good 😀)
  • Move Agones out of the googleforgames organisation into its own “agones” organisation (looks like https://github.com/agones is taken - but we can work that out later) - which the TSC will have admin rights to (and can also participate in maintaining).
  • Google to maintain the project infrastructure investment - ensuring that Google Cloud infrastructure is the primary place that Agones gets tested and developed on.
  • Google also maintains KTLO efforts (K8s version upgrades, releases, Go versions, dependencies, etc.) so they can maintain their commitment to their customers, and maintain a level of knowledge within the project for when customer and support requests come in. The process definitions would also be updated so that the community can engage on when releases go out (e.g. waiting on specific bug fixes or feature work), and can advise on process improvement, automation, etc.
  • Google Cloud project infrastructure credentials are shared with the technical steering committee, such that the TSC can also be involved with bug fixing, maintenance, performance testing etc as needed.
  • Agones feature work continues to be worked on by the non-Google community.

Why a Foundation?

  • Lower the barrier to community growth & collaboration, especially across companies that otherwise would not have contributed due to the Google CLA or overall ownership.
  • Remove the assumption that “Google will just do this work”, since the project would no longer live under a Google-owned GitHub organisation.

Why the Linux Foundation

  • The Linux Foundation is a known neutral space for open source projects to live - including a CLA that many companies have already signed.
  • Google has donated multiple projects to the Linux Foundation previously, so this is a known quantity and well understood path for Google.
  • The Linux Foundation has a known ability to market the open source projects under its purview, and Agones aligns with and utilises many of its other projects (Kubernetes, gRPC, etc).
  • Structured governance & processes - the Linux Foundation has many templates and examples for governance and organisation that we can lean on, such as leadership selection, community guidelines, and more.

Questions

The following are questions that I either don't have answers for right now, or that require investigation I felt wasn't worth doing until we had buy-in from Google.

  • Does the Linux Foundation want us?
    • Past conversations with the LF have indicated that it does.
  • Which part of the Linux Foundation is the right fit for this project? CNCF? O3DE? ASWF?
    • I expect this would be an open discussion with LF to determine the best place.
  • Do we need to make any changes to Agones or its processes to be eligible to join the LF or any of its sub-foundations?
    • For example, are there organisational structures that need to be in place, etc.? (I see this as a positive, as we could probably use more higher-level organisation to reach the next level of project maturity.)

Alternatives

We leave things as they are

If I had to predict what will happen in this scenario, it is that maintainers, including myself, will start to leave the project out of frustration and burnout, since lack of control and lack of support are leading causes of burnout.

To be honest, I'm already personally feeling this, after multiple emergencies outside of my control, frustration over getting reviews for work done, and a general feeling of lack of support from Google. In this world, the project likely enters a slow death, as it has little to no leadership or community support.

Fork the project into a Foundation (Linux or otherwise)

This is definitely a viable option if Google is not amenable to migrating this project to the Linux Foundation. Tradeoffs to consider include:

  • We would have to rename the project and create a new logo, since Google owns the trademark for “Agones”.
  • We would have to rebuild all the CI infrastructure (most likely on GitHub Actions), since we would no longer have access to Google Cloud resources.
  • We would have to rebuild the site with the new name and logo, and use different package hosting (GitHub again - container registry), Helm chart hosting (GitHub Pages, or in theory container registry), etc.
  • We would have to consider dropping GKE Autopilot support, since we would no longer be able to test it as part of our e2e testing.

Basically, this is a tonne of work. I’d only consider this as a last resort, but it is an option.

Final Thoughts

To Google - my question is: If other maintainers and I drop from the project due to ongoing frustration over the current situation, how will you meet customer obligations for the support agreements you have in place with Platinum Support subscribers and/or maintain positive relationships with existing customers using this project?

To users and contributors past and present, please share your thoughts on the above – have you felt similar pains? Does the proposed solution sound like an appropriate fix to the challenges we've been having? Please share any thoughts, improvements, feedback or really anything on this ticket.

Ultimately, I believe the above is a solid win-win for everyone involved: Google gets to benefit from community feature work and engagement, and the community continues to benefit from the ongoing infrastructure and KTLO engagement from Google, but with better cross-communication and redundancy across both.

That all being said - I would definitely love feedback if anyone identifies anything to the contrary, or if any of this particularly resonates – or anything in between.

Metadata

Labels

area/meta: Organisational matters, e.g. governance, release cycles, etc.
