Skip to content

Conversation

saileshd1402
Copy link
Contributor

@saileshd1402 saileshd1402 commented Mar 18, 2025

What this PR does / why we need it:
Update Manifest images to GHCR instead of Dockerhub. Required to decrease rate limiting issues

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):

Address this comment to update manifest images: #2491 (comment)

cc: @andreyvelich @Electronic-Waste @mahdikhashan @tenzen-y

Checklist:

  • Docs included if any changes are user facing

Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
Copy link
Member

@andreyvelich andreyvelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might also need to update other places:
e.g. https://github.com/kubeflow/trainer/blob/release-1.9/pkg/config/config.go#L35

@kubeflow/wg-training-leads @Electronic-Waste What other places where we use our images ?

@saileshd1402
Copy link
Contributor Author

We might also need to update other places: e.g. https://github.com/kubeflow/trainer/blob/release-1.9/pkg/config/config.go#L35

Got it, I'll search and update

Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@google-oss-prow google-oss-prow bot added size/M and removed size/XS labels Mar 18, 2025
@saileshd1402
Copy link
Contributor Author

I've updated the images, PTAL

PyTorchInitContainerMaxTriesDefault = 100
// MPIKubectlDeliveryImageDefault is the default image for launcher pod in MPIJob init container.
MPIKubectlDeliveryImageDefault = "kubeflow/kubectl-delivery:latest"
MPIKubectlDeliveryImageDefault = "ghcr.io/kubeflow/training/kubectl-delivery:latest"
Copy link
Contributor Author

@saileshd1402 saileshd1402 Mar 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this change is in pkg folder, should we update image tag in overlay again in another PR to reflect this change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to update tag in overlay after we make a release.
Here is the process: https://github.com/kubeflow/trainer/tree/release-1.9/docs/release#release-branches-and-tags

Makefile Outdated
@@ -1,5 +1,5 @@
# Image URL to use all building/pushing image targets
IMG ?= kubeflow/training-operator:latest
IMG ?= ghcr.io/kubeflow/training/training-operator:latest
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
IMG ?= ghcr.io/kubeflow/training/training-operator:latest
IMG ?= ghcr.io/kubeflow/trainer/training-operator:latest

Isn't training misstype?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's an error that also happened in #2491.

Maybe we should also change the image name in:

image: |
docker.io/kubeflow/${{ inputs.component-name }}
ghcr.io/kubeflow/training/${{ inputs.component-name }}

image: |
docker.io/kubeflow/${{ inputs.component-name }}
ghcr.io/kubeflow/training/${{ inputs.component-name }}

WDYT? @saileshd1402 @andreyvelich

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so too

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andreyvelich We may also need to delete the training/* image in the github package of trainer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the error in: #2546

@coveralls
Copy link

coveralls commented Mar 19, 2025

Coverage Status

coverage: 100.0%. remained the same
when pulling f6294f0 on saileshd1402:update-manifest-images-to-ghcr
into 5840e81 on kubeflow:release-1.9.

Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
Copy link
Member

@Electronic-Waste Electronic-Waste left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this @saileshd1402. LGTM!

/lgtm

@mahdikhashan
Copy link
Member

/lgtm

thank you for your time

@saileshd1402
Copy link
Contributor Author

saileshd1402 commented Mar 19, 2025

@andreyvelich @Electronic-Waste do you think we can keep these changes or move to a different prefix? The other one discussed: /kubeflow/training-v1 also sounds good to me. I'm not in favour of /kubeflow/training-operator

@andreyvelich
Copy link
Member

andreyvelich commented Mar 19, 2025

@saileshd1402 Let's keep the training-v1 prefix to be more precise.
Even that we have v1- in the tag of the images, I think having the prefix would be better for users.

@Electronic-Waste
Copy link
Member

@saileshd1402 Let's keep the training-v1 prefix to be more precise.
Even that we have v1- in the tag of the images, I think having the prefix would be better for users.

+1

@tenzen-y
Copy link
Member

@saileshd1402 Let's keep the training-v1 prefix to be more precise. Even that we have v1- in the tag of the images, I think having the prefix would be better for users.

SGTM

@andreyvelich
Copy link
Member

@saileshd1402 Please could you update this PR with the /training-v1/ prefix, so I can cut the release ?

Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
@google-oss-prow google-oss-prow bot removed the lgtm label Mar 20, 2025
- name: kubeflow/training-operator
newTag: v1-5170a36
- name: ghcr.io/kubeflow/training-v1/training-operator
newTag: v1-f654b1e
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will create a followup PR to fix this tag

@andreyvelich
Copy link
Member

Thank you for the update @saileshd1402!
/lgtm
/assign @Electronic-Waste @tenzen-y @astefanutti @kubeflow/wg-training-leads @mahdikhashan

@Electronic-Waste
Copy link
Member

Thanks for this @saileshd1402!

/lgtm

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you update

kustomize edit set image kubeflow/training-operator=${REGISTRY}:${VERSION}
as well?

Signed-off-by: sailesh duddupudi <saileshradar@gmail.com>
@google-oss-prow
Copy link

New changes are detected. LGTM label has been removed.

@google-oss-prow google-oss-prow bot removed the lgtm label Mar 21, 2025
@google-oss-prow
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Electronic-Waste
Once this PR has been reviewed and has the lgtm label, please ask for approval from andreyvelich. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@saileshd1402
Copy link
Contributor Author

Could you update

kustomize edit set image kubeflow/training-operator=${REGISTRY}:${VERSION}

as well?

Didn't notice that, thank you! updated

@mahdikhashan
Copy link
Member

i wonder katib images should be called from ghcr?

examples/pytorch/simple.yaml:              image: docker.io/kubeflowkatib/pytorch-mnist:v1beta1-45c5727
examples/pytorch/simple.yaml:              image: docker.io/kubeflowkatib/pytorch-mnist:v1beta1-45c5727
pkg/webhooks/pytorch/pytorchjob_webhook_test.go:                                                Image:           "docker.io/kubeflowkatib/pytorch-mnist:v1beta1-45c5727",
pkg/webhooks/pytorch/pytorchjob_webhook_test.go:                                                Image:           "docker.io/kubeflowkatib/pytorch-mnist:v1beta1-45c5727",

ref: kubeflow/katib#2529

@andreyvelich
Copy link
Member

Do we want to merge this PR without e2e pass or update the example images in the followup PR?

@saileshd1402
Copy link
Contributor Author

In the interest of time I would want to merge this. But either way works, please let me know

@andreyvelich andreyvelich merged commit 31da2d6 into kubeflow:release-1.9 Mar 24, 2025
34 of 53 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants