The RoleBasedGroup API

RoleBasedGroup: An API for orchestrating distributed workload services with multi-role collaboration and automated service discovery. It aims to address common deployment patterns of AI/ML inference workloads, especially Prefill/Decode disaggregated serving (with roles such as prefill, decode, and scheduler), where the LLM is sharded and runs across multiple devices on multiple nodes.

📖 Overview

Background

Traditional Kubernetes StatefulSets struggle with multi-role coordination in distributed stateful service scenarios. RoleBasedGroup addresses:

Startup order dependencies between roles
Complex cross-role service discovery
Fragmented configuration management

🧩 Key Features

✨ Multi-role Template Spec - Model distributed stateful workloads as unified K8s workload groups.
🔗 Role-based Startup Control - Establish role dependencies and a startup sequence for the roles in a RoleBasedGroup.
🔍 Auto Service Discovery - Inject topology details via configs and env vars.
⚡ Elastic Scaling - Enable group/role-level scaling operations.
🔄 Atomic Rollout - Upgrade roles one at a time, each as a single unit (all pods in the same role are updated simultaneously).
🌐 Topology-aware Placement - Guarantee co-location of group/role pods within the same topology domain.
🛑 Atomic Failure Recovery - Trigger full role recreation if any pod/container fails within the same group/role.
🔧 Customizable Workload - Support for multiple workload types (e.g. StatefulSet, Deployment, etc.) for the role.
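
To make the Auto Service Discovery feature more concrete, here is a minimal sketch of the kind of topology information that could be injected as env vars. It is not the controller's implementation: the `RBG_*` variable names and the `<group>-<role>` naming convention are assumptions made for illustration. The only Kubernetes fact it relies on is that StatefulSet pods behind a headless Service get stable DNS names of the form `<pod>.<service>.<namespace>.svc.<cluster-domain>`.

```python
def pod_dns_names(group, role, replicas, namespace, cluster_domain="cluster.local"):
    """Return the stable per-pod DNS names for a StatefulSet-backed role.

    Assumes the StatefulSet and its headless Service are both named
    "<group>-<role>"; this convention is invented for the sketch.
    """
    workload = f"{group}-{role}"
    return [
        f"{workload}-{i}.{workload}.{namespace}.svc.{cluster_domain}"
        for i in range(replicas)
    ]


def discovery_env(group, roles, namespace):
    """Build env vars that describe every role's size and endpoints.

    `roles` maps a role name to its replica count.
    The RBG_* variable names are hypothetical.
    """
    env = {"RBG_GROUP_NAME": group}
    for role, replicas in roles.items():
        key = role.upper().replace("-", "_")
        env[f"RBG_{key}_SIZE"] = str(replicas)
        env[f"RBG_{key}_ADDRESSES"] = ",".join(
            pod_dns_names(group, role, replicas, namespace)
        )
    return env


if __name__ == "__main__":
    # Example: a group named "llm" with two prefill pods and one decode pod.
    for name, value in discovery_env("llm", {"prefill": 2, "decode": 1}, "default").items():
        print(f"{name}={value}")
```

With variables like these, a pod in one role could locate its peers in another role without hard-coded addresses. See the API reference for what the controller actually injects.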

🏗 Conceptual Diagram

🚀 Quick Start

Install Controller

helm install rbgs deploy/helm/rbgs -n rbgs-system --create-namespace

Minimal Example

kubectl apply -f examples/base/rbg.yaml

📚 API Documentation

Key Fields

Field	Type	Description
`startupPolicy`	string	Startup strategy (Ordered/Parallel)
`dependencies`	[]string	Role dependencies list
`workload`	Object	Underlying workload type (default: StatefulSet)

Full API spec: API_REFERENCE.md
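
As an illustration of how a `dependencies` list can translate into a startup sequence, the sketch below groups roles into waves with a topological sort. It is a conceptual model, not the controller's code: the function name and the example dependency graph are hypothetical, and the exact semantics of `startupPolicy` and `dependencies` are defined in the API reference. It uses `graphlib` from the Python standard library (3.9+).

```python
from graphlib import CycleError, TopologicalSorter


def startup_waves(dependencies):
    """Group roles into waves; each wave depends only on earlier waves.

    `dependencies` maps a role name to the list of roles it must wait for.
    Raises ValueError on an unknown role or a dependency cycle.
    """
    # Reject dependencies on roles that are not declared in the group.
    for role, deps in dependencies.items():
        for dep in deps:
            if dep not in dependencies:
                raise ValueError(f"role {role!r} depends on unknown role {dep!r}")

    sorter = TopologicalSorter(dependencies)
    try:
        sorter.prepare()  # raises CycleError if the graph has a cycle
    except CycleError as err:
        raise ValueError(f"dependency cycle: {err.args[1]}") from err

    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for deterministic output
        waves.append(ready)
        sorter.done(*ready)  # mark the wave as started so dependents unblock
    return waves


if __name__ == "__main__":
    # Hypothetical group: prefill and decode both wait for the scheduler.
    print(startup_waves({
        "scheduler": [],
        "prefill": ["scheduler"],
        "decode": ["scheduler"],
    }))
```

Roles in the same wave have no ordering constraint between them, so they could be started in parallel, while an ordered strategy would simply walk the waves one role at a time.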

🤝 Contributing

We welcome contributions through issues and PRs! See CONTRIBUTING.md.

Community, discussion, contribution, and support

Learn how to engage with the Kubernetes community on the community page.

You can reach the maintainers of this project at:

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.

