Merged
2 changes: 1 addition & 1 deletion Makefile
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# Image URL to use all building/pushing image targets
IMG ?= kubeflow/training-operator:latest
IMG ?= ghcr.io/kubeflow/training/training-operator:latest
Member

Suggested change
IMG ?= ghcr.io/kubeflow/training/training-operator:latest
IMG ?= ghcr.io/kubeflow/trainer/training-operator:latest

Isn't `training` a typo?

Member

I think it's an error that also happened in #2491.

Maybe we should also change the image name in:

image: |
docker.io/kubeflow/${{ inputs.component-name }}
ghcr.io/kubeflow/training/${{ inputs.component-name }}

image: |
docker.io/kubeflow/${{ inputs.component-name }}
ghcr.io/kubeflow/training/${{ inputs.component-name }}

WDYT? @saileshd1402 @andreyvelich

Member

I think so too

Member

@andreyvelich We may also need to delete the `training/*` images from the GitHub Packages of trainer.

Member

Fixed the error in: #2546

# CRD generation options
CRD_OPTIONS ?= "crd:generateEmbeddedObjectMeta=true,maxDescLen=400"
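
The review thread above is about the registry path that Docker Hub references are rewritten to (`training` in this PR, `trainer` in the suggested change). As a rough illustration of that rewrite, here is a minimal sketch, not the tooling used in this PR. The function names `split_image_ref` and `to_ghcr` are hypothetical, the default prefix follows the reviewer's suggestion, digests (`@sha256:...`) are not handled, and an untagged reference comes back with an explicit `:latest`.

```python
def split_image_ref(ref: str) -> tuple[str, str, str]:
    """Split an image reference into (registry, repository, tag)."""
    name, tag = ref, "latest"
    # A ':' that appears after the last '/' separates the tag.
    if ref.rfind(":") > ref.rfind("/"):
        name, _, tag = ref.rpartition(":")
    # Docker treats the first path component as a registry host only if it
    # contains '.' or ':' or equals 'localhost'; otherwise Docker Hub is implied.
    first, _, rest = name.partition("/")
    if rest and ("." in first or ":" in first or first == "localhost"):
        return first, rest, tag
    return "docker.io", name, tag


def to_ghcr(ref: str, prefix: str = "ghcr.io/kubeflow/trainer") -> str:
    """Rewrite a Docker Hub `kubeflow/<component>` reference to `<prefix>/<component>`."""
    registry, repository, tag = split_image_ref(ref)
    if registry != "docker.io" or not repository.startswith("kubeflow/"):
        return ref  # not a Docker Hub kubeflow image; leave it untouched
    component = repository.split("/", 1)[1]
    return f"{prefix}/{component}:{tag}"


print(to_ghcr("kubeflow/training-operator:latest"))
print(to_ghcr("docker.io/kubeflow/xgboost-dist-iris:latest", prefix="ghcr.io/kubeflow/training"))
```

Keeping the prefix as a parameter means a path correction like the one discussed above only has to be made in one place.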

2 changes: 1 addition & 1 deletion examples/jax/cpu-demo/demo.yaml
@@ -12,7 +12,7 @@ spec:
spec:
containers:
- name: jax
image: docker.io/kubeflow/jaxjob-simple:latest
image: ghcr.io/kubeflow/training/jaxjob-simple:latest
command:
- "python3"
- "train.py"
@@ -12,5 +12,5 @@ spec:
spec:
containers:
- name: jax
image: docker.io/kubeflow/jaxjob-dist-spmd-mnist:latest
image: ghcr.io/kubeflow/training/jaxjob-dist-spmd-mnist:latest
imagePullPolicy: Always
4 changes: 2 additions & 2 deletions examples/pytorch/deepspeed-demo/pytorch_deepspeed_demo.yaml
@@ -11,7 +11,7 @@ spec:
spec:
containers:
- name: pytorch
image: kubeflow/pytorch-deepspeed-demo:latest
image: ghcr.io/kubeflow/training/pytorch-deepspeed-demo:latest
command:
- torchrun
- /train_bert_ds.py
@@ -27,7 +27,7 @@ spec:
spec:
containers:
- name: pytorch
image: kubeflow/pytorch-deepspeed-demo:latest
image: ghcr.io/kubeflow/training/pytorch-deepspeed-demo:latest
command:
- torchrun
- /train_bert_ds.py
2 changes: 1 addition & 1 deletion examples/pytorch/elastic/echo/echo.yaml
@@ -15,7 +15,7 @@ spec:
spec:
containers:
- name: pytorch
image: kubeflow/pytorch-elastic-example-echo:latest
image: ghcr.io/kubeflow/training/pytorch-elastic-example-echo:latest
imagePullPolicy: IfNotPresent
env:
- name: LOGLEVEL
2 changes: 1 addition & 1 deletion examples/pytorch/elastic/imagenet/imagenet.yaml
@@ -23,7 +23,7 @@ spec:
spec:
containers:
- name: pytorch
image: kubeflow/pytorch-elastic-example-imagenet:latest
image: ghcr.io/kubeflow/training/pytorch-elastic-example-imagenet:latest
imagePullPolicy: IfNotPresent
resources:
requests:
@@ -121,7 +121,7 @@
"\n",
"container = V1Container(\n",
" name=container_name,\n",
" image=\"kubeflow/pytorch-dist-mnist:latest\",\n",
" image=\"ghcr.io/kubeflow/training/pytorch-dist-mnist:latest\",\n",
" args=[\"--backend\", \"gloo\"],\n",
")\n",
"\n",
4 changes: 2 additions & 2 deletions examples/pytorch/mnist/v1/pytorch_job_mnist_gloo.yaml
@@ -11,7 +11,7 @@ spec:
spec:
containers:
- name: pytorch
image: kubeflow/pytorch-dist-mnist:latest
image: ghcr.io/kubeflow/training/pytorch-dist-mnist:latest
args: ["--backend", "gloo"]
# Comment out the below resources to use the CPU.
resources:
@@ -24,7 +24,7 @@ spec:
spec:
containers:
- name: pytorch
image: kubeflow/pytorch-dist-mnist:latest
image: ghcr.io/kubeflow/training/pytorch-dist-mnist:latest
args: ["--backend", "gloo"]
# Comment out the below resources to use the CPU.
resources:
4 changes: 2 additions & 2 deletions examples/pytorch/mnist/v1/pytorch_job_mnist_mpi.yaml
@@ -11,7 +11,7 @@ spec:
spec:
containers:
- name: pytorch
image: kubeflow/pytorch-dist-mnist:latest
image: ghcr.io/kubeflow/training/pytorch-dist-mnist:latest
args: ["--backend", "mpi"]
# Comment out the below resources to use the CPU.
resources:
@@ -24,7 +24,7 @@ spec:
spec:
containers:
- name: pytorch
image: kubeflow/pytorch-dist-mnist:latest
image: ghcr.io/kubeflow/training/pytorch-dist-mnist:latest
args: ["--backend", "mpi"]
# Comment out the below resources to use the CPU.
resources:
4 changes: 2 additions & 2 deletions examples/pytorch/mnist/v1/pytorch_job_mnist_nccl.yaml
@@ -11,7 +11,7 @@ spec:
spec:
containers:
- name: pytorch
image: kubeflow/pytorch-dist-mnist:latest
image: ghcr.io/kubeflow/training/pytorch-dist-mnist:latest
args: ["--backend", "nccl"]
resources:
limits:
@@ -23,7 +23,7 @@ spec:
spec:
containers:
- name: pytorch
image: kubeflow/pytorch-dist-mnist:latest
image: ghcr.io/kubeflow/training/pytorch-dist-mnist:latest
args: ["--backend", "nccl"]
resources:
limits:
4 changes: 2 additions & 2 deletions examples/pytorch/smoke-dist/pytorch_job_sendrecv.yaml
@@ -11,12 +11,12 @@ spec:
spec:
containers:
- name: pytorch
image: kubeflow/pytorch-dist-sendrecv-test:latest
image: ghcr.io/kubeflow/training/pytorch-dist-sendrecv-test:latest
Worker:
replicas: 3
restartPolicy: OnFailure
template:
spec:
containers:
- name: pytorch
image: kubeflow/pytorch-dist-sendrecv-test:latest
image: ghcr.io/kubeflow/training/pytorch-dist-sendrecv-test:latest
6 changes: 3 additions & 3 deletions examples/tensorflow/dist-mnist/tf_job_mnist.yaml
@@ -11,7 +11,7 @@ spec:
spec:
containers:
- name: tensorflow
image: kubeflow/tf-dist-mnist-test:latest
image: ghcr.io/kubeflow/training/tf-dist-mnist-test:latest

PS:
replicas: 1
@@ -20,7 +20,7 @@ spec:
spec:
containers:
- name: tensorflow
image: kubeflow/tf-dist-mnist-test:latest
image: ghcr.io/kubeflow/training/tf-dist-mnist-test:latest

Worker:
replicas: 2
@@ -29,4 +29,4 @@ spec:
spec:
containers:
- name: tensorflow
image: kubeflow/tf-dist-mnist-test:latest
image: ghcr.io/kubeflow/training/tf-dist-mnist-test:latest
@@ -13,7 +13,7 @@ spec:
spec:
containers:
- name: tensorflow
image: kubeflow/tf-multi-worker-strategy:latest
image: ghcr.io/kubeflow/training/tf-multi-worker-strategy:latest
volumeMounts:
- mountPath: /train
name: training
2 changes: 1 addition & 1 deletion examples/tensorflow/mnist_with_summaries/tf_job_mnist.yaml
@@ -14,7 +14,7 @@ spec:
spec:
containers:
- name: tensorflow
image: kubeflow/tf-mnist-with-summaries:latest
image: ghcr.io/kubeflow/training/tf-mnist-with-summaries:latest
command:
- "python"
- "/var/tf_mnist/mnist_with_summaries.py"
2 changes: 1 addition & 1 deletion examples/tensorflow/simple.yaml
@@ -12,7 +12,7 @@ spec:
spec:
containers:
- name: tensorflow
image: kubeflow/tf-mnist-with-summaries:latest
image: ghcr.io/kubeflow/training/tf-mnist-with-summaries:latest
command:
- "python"
- "/var/tf_mnist/mnist_with_summaries.py"
@@ -11,7 +11,7 @@ spec:
spec:
containers:
- name: xgboost
image: kubeflow/lightgbm-dist-py-test:1.0
image: ghcr.io/kubeflow/training/lightgbm-dist-py-test:latest
ports:
- containerPort: 9991
name: xgboostjob-port
@@ -45,7 +45,7 @@ spec:
spec:
containers:
- name: xgboost
image: kubeflow/lightgbm-dist-py-test:1.0
image: ghcr.io/kubeflow/training/lightgbm-dist-py-test:latest
ports:
- containerPort: 9991
name: xgboostjob-port
4 changes: 2 additions & 2 deletions examples/xgboost/smoke-dist/xgboostjob_v1_rabit_test.yaml
@@ -11,7 +11,7 @@ spec:
spec:
containers:
- name: xgboost
image: docker.io/kubeflow/xgboost-dist-rabit-test:latest
image: ghcr.io/kubeflow/training/xgboost-dist-rabit-test:latest
ports:
- containerPort: 9991
name: xgboostjob-port
@@ -23,7 +23,7 @@ spec:
spec:
containers:
- name: xgboost
image: docker.io/kubeflow/xgboost-dist-rabit-test:latest
image: ghcr.io/kubeflow/training/xgboost-dist-rabit-test:latest
ports:
- containerPort: 9991
name: xgboostjob-port
@@ -13,7 +13,7 @@ spec:
spec:
containers:
- name: xgboost
image: docker.io/kubeflow/xgboost-dist-rabit-test:latest
image: ghcr.io/kubeflow/training/xgboost-dist-rabit-test:latest
ports:
- containerPort: 9991
name: xgboostjob-port
@@ -27,7 +27,7 @@ spec:
spec:
containers:
- name: xgboost
image: docker.io/kubeflow/xgboost-dist-rabit-test:latest
image: ghcr.io/kubeflow/training/xgboost-dist-rabit-test:latest
ports:
- containerPort: 9991
name: xgboostjob-port
4 changes: 2 additions & 2 deletions examples/xgboost/xgboost-dist/xgboostjob_v1_iris_predict.yaml
@@ -11,7 +11,7 @@ spec:
spec:
containers:
- name: xgboost
image: docker.io/kubeflow/xgboost-dist-iris:latest
image: ghcr.io/kubeflow/training/xgboost-dist-iris:latest
ports:
- containerPort: 9991
name: xgboostjob-port
@@ -28,7 +28,7 @@ spec:
spec:
containers:
- name: xgboost
image: docker.io/kubeflow/xgboost-dist-iris:latest
image: ghcr.io/kubeflow/training/xgboost-dist-iris:latest
ports:
- containerPort: 9991
name: xgboostjob-port
@@ -15,7 +15,7 @@ spec:
claimName: xgboostlocal
containers:
- name: xgboost
image: docker.io/kubeflow/xgboost-dist-iris:latest
image: ghcr.io/kubeflow/training/xgboost-dist-iris:latest
volumeMounts:
- name: task-pv-storage
mountPath: /tmp/xgboost_model
@@ -38,7 +38,7 @@ spec:
claimName: xgboostlocal
containers:
- name: xgboost
image: docker.io/kubeflow/xgboost-dist-iris:latest
image: ghcr.io/kubeflow/training/xgboost-dist-iris:latest
volumeMounts:
- name: task-pv-storage
mountPath: /tmp/xgboost_model
4 changes: 2 additions & 2 deletions examples/xgboost/xgboost-dist/xgboostjob_v1_iris_train.yaml
@@ -11,7 +11,7 @@ spec:
spec:
containers:
- name: xgboost
image: docker.io/kubeflow/xgboost-dist-iris:latest
image: ghcr.io/kubeflow/training/xgboost-dist-iris:latest
ports:
- containerPort: 9991
name: xgboostjob-port
@@ -30,7 +30,7 @@ spec:
spec:
containers:
- name: xgboost
image: docker.io/kubeflow/xgboost-dist-iris:latest
image: ghcr.io/kubeflow/training/xgboost-dist-iris:latest
ports:
- containerPort: 9991
name: xgboostjob-port
@@ -15,7 +15,7 @@ spec:
claimName: xgboostlocal
containers:
- name: xgboost
image: docker.io/kubeflow/xgboost-dist-iris:latest
image: ghcr.io/kubeflow/training/xgboost-dist-iris:latest
volumeMounts:
- name: task-pv-storage
mountPath: /tmp/xgboost_model
@@ -41,7 +41,7 @@ spec:
claimName: xgboostlocal
containers:
- name: xgboost
image: docker.io/kubeflow/xgboost-dist-iris:latest
image: ghcr.io/kubeflow/training/xgboost-dist-iris:latest
volumeMounts:
- name: task-pv-storage
mountPath: /tmp/xgboost_model
4 changes: 2 additions & 2 deletions examples/xgboost/xgboostjob.yaml
@@ -11,7 +11,7 @@ spec:
spec:
containers:
- name: xgboost
image: docker.io/kubeflow/xgboost-dist-iris:latest
image: ghcr.io/kubeflow/training/xgboost-dist-iris:latest
ports:
- containerPort: 9991
name: xgboostjob-port
@@ -30,7 +30,7 @@ spec:
spec:
containers:
- name: xgboost
image: docker.io/kubeflow/xgboost-dist-iris:latest
image: ghcr.io/kubeflow/training/xgboost-dist-iris:latest
ports:
- containerPort: 9991
name: xgboostjob-port
4 changes: 2 additions & 2 deletions manifests/overlays/kubeflow/kustomization.yaml
@@ -5,8 +5,8 @@ resources:
- ../../base
- kubeflow-training-roles.yaml
images:
- name: kubeflow/training-operator
newTag: v1-5170a36
- name: ghcr.io/kubeflow/training/training-operator
newTag: v1-f654b1e
# TODO (tenzen-y): Once we support cert-manager, we need to remove this secret generation.
# REF: https://github.com/kubeflow/training-operator/issues/2049
secretGenerator:
4 changes: 2 additions & 2 deletions manifests/overlays/standalone/kustomization.yaml
@@ -5,8 +5,8 @@ resources:
- ../../base
- namespace.yaml
images:
- name: kubeflow/training-operator
newTag: v1-5170a36
- name: ghcr.io/kubeflow/training/training-operator
newTag: v1-f654b1e
Contributor Author

I will create a follow-up PR to fix this tag.

secretGenerator:
- name: training-operator-webhook-cert
options:
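
The two overlay hunks above change both the `name` and the `newTag` of the `images:` entry. As a rough mental model of how such an entry is applied, here is a minimal sketch that only mimics the matching rule (the entry's `name` is compared with the image name without its tag). It is not kustomize's implementation: the function `apply_image_override` is hypothetical, and digests are not handled.

```python
def apply_image_override(image: str, override: dict) -> str:
    """Apply one kustomize-style `images:` entry to a container image string."""
    name, tag = image, ""
    # A ':' that appears after the last '/' separates the tag.
    if image.rfind(":") > image.rfind("/"):
        name, _, tag = image.rpartition(":")
    if name != override["name"]:
        return image  # the entry does not match this image; leave it unchanged
    new_name = override.get("newName", name)
    new_tag = override.get("newTag", tag)
    return f"{new_name}:{new_tag}" if new_tag else new_name


# Entry as written in the overlays in this PR.
override = {"name": "ghcr.io/kubeflow/training/training-operator", "newTag": "v1-f654b1e"}

print(apply_image_override("ghcr.io/kubeflow/training/training-operator:latest", override))
print(apply_image_override("kubeflow/training-operator:latest", override))
```

Under this model the second call returns its input unchanged, because an image still written with the old Docker Hub name no longer matches the renamed entry.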
2 changes: 1 addition & 1 deletion pkg/config/config.go
@@ -32,5 +32,5 @@ const (
// PyTorchInitContainerMaxTriesDefault is the default number of tries for the pytorch init container.
PyTorchInitContainerMaxTriesDefault = 100
// MPIKubectlDeliveryImageDefault is the default image for launcher pod in MPIJob init container.
MPIKubectlDeliveryImageDefault = "kubeflow/kubectl-delivery:latest"
MPIKubectlDeliveryImageDefault = "ghcr.io/kubeflow/training/kubectl-delivery:latest"
Contributor Author

@saileshd1402 Mar 18, 2025

Since this change is in the `pkg` folder, should we update the image tag in the overlay again in another PR to reflect this change?

Member

We need to update the tag in the overlay after we make a release.
Here is the process: https://github.com/kubeflow/trainer/tree/release-1.9/docs/release#release-branches-and-tags

)
6 changes: 3 additions & 3 deletions sdk/python/kubeflow/training/constants/constants.py
@@ -84,7 +84,7 @@

# TODO (andreyvelich): We should add image tag for Storage Initializer and Trainer.
STORAGE_INITIALIZER_IMAGE = os.getenv(
"STORAGE_INITIALIZER_IMAGE", "docker.io/kubeflow/storage-initializer"
"STORAGE_INITIALIZER_IMAGE", "ghcr.io/kubeflow/training/storage-initializer"
)

STORAGE_INITIALIZER_VOLUME_MOUNT = models.V1VolumeMount(
@@ -93,7 +93,7 @@
)

TRAINER_TRANSFORMER_IMAGE = os.getenv(
"TRAINER_TRANSFORMER_IMAGE", "docker.io/kubeflow/trainer-huggingface"
"TRAINER_TRANSFORMER_IMAGE", "ghcr.io/kubeflow/training/trainer-huggingface"
)

# TFJob constants.
@@ -153,7 +153,7 @@
JAXJOB_PLURAL = "jaxjobs"
JAXJOB_CONTAINER = "jax"
JAXJOB_REPLICA_TYPES = REPLICA_TYPE_WORKER.lower()
JAXJOB_BASE_IMAGE = "docker.io/kubeflow/jaxjob-dist-spmd-mnist:latest"
JAXJOB_BASE_IMAGE = "ghcr.io/kubeflow/training/jaxjob-dist-spmd-mnist:latest"

# Dictionary to get plural, model, and container for each Job kind.
JOB_PARAMETERS = {
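
The SDK constants in the last diff read their defaults through `os.getenv`, so the new `ghcr.io` paths are only fallbacks. The following minimal sketch shows that override pattern in isolation. The helper `resolve_image` and the mirror registry `registry.example.com` are hypothetical and used only for illustration; the default string is copied from the diff.

```python
import os

# Default copied from the diff above.
DEFAULT_STORAGE_INITIALIZER_IMAGE = "ghcr.io/kubeflow/training/storage-initializer"


def resolve_image(env_var: str, default: str) -> str:
    """Return the image named by the environment variable, else the default."""
    return os.getenv(env_var, default)


# Hypothetical mirror registry, set before the value is resolved.
os.environ["STORAGE_INITIALIZER_IMAGE"] = "registry.example.com/mirror/storage-initializer"
image = resolve_image("STORAGE_INITIALIZER_IMAGE", DEFAULT_STORAGE_INITIALIZER_IMAGE)
print(image)
```

Because the diff shows these constants assigned at module level, the variable would presumably need to be set before the SDK module is imported for an override to take effect.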