Description
What you would like to be added?
Enhance the validation of the name of TrainJob.
Why is this needed?
This is a follow-up to #2734. As discussed in the comment, #2734 restricts the name to comply with RFC 1035, covering both the allowed characters and the length. This issue is raised to discuss whether stricter restrictions should be imposed on the TrainJob name and other related fields.
For Kubeflow Trainer, JobSet's validation webhook checks whether the following fields comply with RFC 1035:
- Name of the JobSet (same as the name of the TrainJob)
- Name of each ReplicatedJob (data-initializer, node, etc.)
- Generated Pod name (<jobset-name>-<replicatedJobName>-<jobIndex>-<podIndex>.<subdomain>)
JobSet cannot be created with an invalid name. In that case the status of the TrainJob remains empty (I think this is the desired behavior after #2621), and a TrainJobResourcesCreationFailed event is recorded.
It would be great if we could clarify to what extent we need to impose restrictions, since it overlaps somewhat with the validation of JobSet.
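Example (the ReplicatedJob name Exporter contains an uppercase letter, so it is not a valid DNS-1035 label):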
apiVersion: trainer.kubeflow.org/v1alpha1
kind: ClusterTrainingRuntime
metadata:
  name: test
spec:
  template:
    spec:
      replicatedJobs:
        - name: Exporter
          template:
            spec:
              template:
                spec:
                  containers:
                    - name: exporter
                      image: alpine:latest
                      command:
                        - sh
                        - -c
                      args:
                        - |
                          ls /
---
apiVersion: trainer.kubeflow.org/v1alpha1
kind: TrainJob
metadata:
  name: test
spec:
  runtimeRef:
    name: test
{"level":"error","ts":"2025-07-15T15:26:58.017501223Z","caller":"controller/controller.go:341","msg":"Reconciler error","controller":"trainjob_controller","namespace":"default","name":"test","reconcileID":"48a235de-e80e-411c-b780-ca880d9318a0","error":"admission webhook \"validate-jobset-x-k8s-io-v1alpha2-jobset.x-k8s.io\" denied the request: a DNS-1035 label must consist of lower case alphanumeric characters or '-', start with an alphabetic character, and end with an alphanumeric character (e.g. 'my-name', or 'abc-123', regex used for validation is '[a-z]([-a-z0-9]*[a-z0-9])?')\na DNS-1035 label must consist of lower case alphanumeric characters or '-', start with an alphabetic character, and end with an alphanumeric character (e.g. 'my-name', or 'abc-123', regex used for validation is '[a-z]([-a-z0-9]*[a-z0-9])?')\nJobSet.jobset.x-k8s.io \"test\" not found","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.2/pkg/internal/controller/controller.go:341\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.2/pkg/internal/controller/controller.go:288\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.20.2/pkg/internal/controller/controller.go:249"}
/cc @andreyvelich @tenzen-y @Electronic-Waste
Love this feature?
Give it a 👍 We prioritize the features with most 👍