Description
What you would like to be added?
We’d like to propose an “options”-style API for the Kubeflow Trainer SDK that lets users customize a TrainJob without exposing the full Kubernetes API surface or requiring them to write YAML files.
The idea is to add an optional options parameter to train(), inspired by Go's functional options pattern, where each option is a small callable that mutates the TrainJob before it is submitted.
This gives data scientists a simple way to handle common tasks such as adding labels and annotations, enabling Kueue, and attaching per-Pod template overrides:
```python
from kubeflow.trainer import TrainerClient, CustomTrainer
from kubeflow.trainer.options import (
    WithLabels,
    WithAnnotations,
    WithKueue,
    WithPodSpecOverrides,
)


def MyCustomOption(service_account: str):
    def apply(job: dict):
        spec = job.setdefault("spec", {})
        spec.setdefault("podSpecOverrides", []).append({"serviceAccountName": service_account})
    return apply


job_id = TrainerClient().train(
    runtime=runtime,
    trainer=CustomTrainer(
        func=train,
        packages_to_install=["transformers", "boto3"],
    ),
    options=[
        WithLabels({"project": "training", "owner": "platform-team"}),
        WithAnnotations({"version": "v1"}),
        WithKueue(
            queue="ml-queue",
            topology_labels={"kueue.x-k8s.io/podset-name": "node"},
            managed_by="kueue.x-k8s.io/multikueue",
        ),
        WithPodSpecOverrides([
            {
                "targetJobs": ["node"],
                "labels": {"workload": "training"},
                "containers": [
                    {"name": "node", "volumeMounts": [{"name": "data", "mountPath": "/workspace/data"}]}
                ],
                "volumes": [
                    {"name": "data", "persistentVolumeClaim": {"claimName": "datasets-pvc"}}
                ],
            }
        ]),
        MyCustomOption("trainer-sa"),
    ],
)
```
Why is this needed?
This addresses current gaps in the SDK: there is no way to set labels or annotations, use Kueue, or mount volumes and volumeMounts from Python.
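As a rough illustration of how label and annotation options could close that gap, the sketch below writes two option factories as closures over a plain dict manifest. The names `with_labels` and `with_annotations`, the `Option` alias, and the dict-based job representation are assumptions made for this example; the proposed WithLabels and WithAnnotations may well be implemented differently.

```python
from typing import Callable

# Assumed for illustration: an option is any callable that mutates a job manifest in place.
Option = Callable[[dict], None]


def with_labels(labels: dict[str, str]) -> Option:
    """Return an option that merges `labels` into metadata.labels."""
    def apply(job: dict) -> None:
        metadata = job.setdefault("metadata", {})
        # update() keeps labels that are already present and adds the new ones.
        metadata.setdefault("labels", {}).update(labels)
    return apply


def with_annotations(annotations: dict[str, str]) -> Option:
    """Return an option that merges `annotations` into metadata.annotations."""
    def apply(job: dict) -> None:
        metadata = job.setdefault("metadata", {})
        metadata.setdefault("annotations", {}).update(annotations)
    return apply


# Hypothetical manifest that already carries one label.
job = {"kind": "TrainJob", "metadata": {"labels": {"team": "ml"}}}
for option in (with_labels({"project": "training"}), with_annotations({"version": "v1"})):
    option(job)
```

Merging rather than replacing means options compose: a later option adds to what an earlier one set, and only overwrites a key when both set the same one.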
The API would stay the same by default; if a user doesn’t pass options, behavior is unchanged.
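To make the backward-compatibility point concrete, here is a minimal sketch of how a train()-like code path might apply options. `build_train_job`, `apply_options`, and `with_queue` are hypothetical stand-ins, not the SDK's actual internals; the only real identifier is Kueue's `kueue.x-k8s.io/queue-name` label.

```python
import copy
from typing import Callable, Optional, Sequence

# Assumed for illustration: an option mutates a job manifest in place.
Option = Callable[[dict], None]


def build_train_job(name: str) -> dict:
    """Hypothetical stand-in for the manifest the SDK builds today."""
    return {"kind": "TrainJob", "metadata": {"name": name}, "spec": {}}


def apply_options(job: dict, options: Optional[Sequence[Option]] = None) -> dict:
    """Return a copy of `job` with each option applied in order."""
    result = copy.deepcopy(job)  # leave the caller's manifest untouched
    for option in options or ():  # None or an empty list means no changes
        option(result)
    return result


def with_queue(queue: str) -> Option:
    """Hypothetical option that sets the Kueue queue-name label."""
    def apply(job: dict) -> None:
        labels = job.setdefault("metadata", {}).setdefault("labels", {})
        labels["kueue.x-k8s.io/queue-name"] = queue
    return apply


base = build_train_job("demo")
unchanged = apply_options(base)  # no options passed
queued = apply_options(base, [with_queue("ml-queue")])
```

With no options the result equals the base manifest, which is the "behavior is unchanged" guarantee. Options are applied in list order, so a later option can override an earlier one.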
Love this feature?
Give it a 👍 We prioritize the features with most 👍