Conversation

@jholt96 (Contributor) commented Sep 6, 2025

Pull Request Template for Kubeflow Manifests

✏️ Summary of Changes

This change fixes artifact retrieval in Kubeflow Pipelines when the default Istio Sidecar resource is enabled. The initial default Sidecar configuration blocks the ml-pipeline-ui pod from proxying users' artifact requests to the ml-pipeline-ui-artifact pods in profile-created namespaces. This is because, in Kubeflow 1.9+, the AuthorizationPolicy created in each profile namespace restricts requests based on the Istio principal and the service account of the ml-pipeline-ui pod. Without this change, users get "RBAC: Access Denied".
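For context, the per-namespace policy described above looks roughly like the following sketch; the resource name and profile namespace here are illustrative, not the exact manifest generated by Kubeflow:

```yaml
# Illustrative sketch of the AuthorizationPolicy that Kubeflow 1.9+ creates
# in profile namespaces: only requests carrying the mTLS principal of the
# ml-pipeline-ui service account are allowed. If the default Sidecar
# configuration strips that identity from proxied artifact requests, the
# rule matches nothing and users see "RBAC: Access Denied".
apiVersion: security.istio.io/v1
kind: AuthorizationPolicy
metadata:
  name: ml-pipeline-ui-artifact   # name is illustrative
  namespace: my-profile           # placeholder profile namespace
spec:
  selector:
    matchLabels:
      app: ml-pipeline-ui-artifact
  rules:
    - from:
        - source:
            principals:
              - cluster.local/ns/kubeflow/sa/ml-pipeline-ui
```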

📦 Dependencies

None

🐛 Related Issues

Link any issues that are resolved or affected by this PR.

✅ Contributor Checklist

  • I have tested these changes with kustomize. See Installation Prerequisites.
  • All commits are signed-off to satisfy the DCO check.
  • I have considered adding my company to the adopters page to support Kubeflow and help the community, since I expect help from the community for my issue (see 1. and 2.).

You can join the CNCF Slack and access our meetings at the Kubeflow Community website. Our channel on the CNCF Slack is here #kubeflow-platform.

@jholt96 force-pushed the master branch 4 times, most recently from f91b089 to 581e81c, on September 6, 2025 at 20:54
@droctothorpe

Thanks for this contribution, @jholt96!

/approve
/lgtm

CC @juliusvonkohout

@juliusvonkohout (Member) commented Sep 9, 2025

@jholt96 ml-pipeline-ui-artifact has been removed, and you now have proper zero-overhead namespaces. It is already merged in kubeflow/pipelines and will soon be the default in kubeflow/manifests as well. https://medium.com/@hpotpose26/kubeflow-pipelines-embraces-seaweedfs-9a7e022d5571. See

`- name: Install Pipelines with SeaweedFS`

What we actually need is to extend our tests to also fetch artifacts in

`python3 tests/pipeline_v2_test.py run_pipeline "${TOKEN}" "${KF_PROFILE}"`

Can you adjust the Python file to also fetch artifacts from the run? CC @akagami-harsh for helping out in a separate PR against the master branch of kubeflow/manifests, and ping me. Just get main.logs or so and print it in the terminal. @droctothorpe, we need a KFP release from the KFP master branch with SeaweedFS; then we synchronize and it is done. It is a large change, so some small friction is expected.
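One possible shape for that test extension, as a hedged sketch: a helper that walks a KFP v2 run-detail response and collects output-artifact URIs for the test to fetch and print. The response schema assumed here (`run_details.task_details[].outputs`) is an assumption and must be checked against the real API payload:

```python
# Sketch only: collect output-artifact URIs from a KFP v2 run-detail
# response (as fetched by the existing pipeline_v2_test.py helpers), so
# the test can download and print them (e.g. main.logs). The nested keys
# used below are hypothetical, not verified against the KFP REST API.
def collect_artifact_uris(run_detail: dict) -> list:
    uris = []
    for task in run_detail.get("run_details", {}).get("task_details", []):
        for artifact_list in task.get("outputs", {}).values():
            for artifact in artifact_list.get("artifacts", []):
                uri = artifact.get("uri")
                if uri:
                    uris.append(uri)
    return uris
```

The existing run_pipeline flow could call this on the run-detail JSON and then fetch each URI through whatever S3 client the test environment provides.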

@droctothorpe

@juliusvonkohout in the seaweed implementation, how does the FE access artifacts in external object stores that only end user namespaces have access to?

@jholt96 (Contributor, Author) commented Sep 9, 2025

@juliusvonkohout Happy to help with the test, and the zero-overhead namespaces sound awesome.

  1. Does that mean the ml-pipeline-ui pods no longer initiate any egress requests to pods in user namespaces? If they still do, then this change is necessary to ensure AuthorizationPolicies work properly.

  2. For those that have not upgraded, and even those on the latest release, I think this change is still worth committing to this repo, since the issue will exist until they migrate to SeaweedFS. I can move it to an overlay directory if that makes more sense to you.

IMO it is at least worth documenting, since this issue is hard to diagnose if people are unaware of the Sidecar object implementation and what it means for Istio mTLS.

@juliusvonkohout (Member)

> @juliusvonkohout Happy to help with the test, and the zero-overhead namespaces sound awesome.
>
> 1. Does that mean the ml-pipeline-ui pods no longer initiate any egress requests to pods in user namespaces? If they still do, then this change is necessary to ensure AuthorizationPolicies work properly.

No, they do not need access anymore. With 500 users/namespaces on your Kubeflow cluster you can save terabytes of memory for Istio sidecars, and 1000 (500 × 2) pods, if you use the master branch with SeaweedFS.

> 2. For those that have not upgraded, and even those on the latest release, I think this change is still worth committing to this repo, since the issue will exist until they migrate to SeaweedFS. I can move it to an overlay directory if that makes more sense to you.
>
> IMO it is at least worth documenting, since this issue is hard to diagnose if people are unaware of the Sidecar object implementation and what it means for Istio mTLS.

We can make SeaweedFS the default here, as I did with one commit here.

@juliusvonkohout (Member) commented Sep 9, 2025

> @juliusvonkohout in the seaweed implementation, how does the FE access artifacts in external object stores that only end user namespaces have access to?

I do not know what you mean by FE, but I will try to answer the general question.

The S3 storage in the Kubeflow namespace should handle it, even if it just acts as a gateway to AWS/GCP; you can configure it as a gateway. If there is storage that is only accessible from the user's namespace but is somehow considered Kubeflow storage, that breaks the architecture. The artifact proxy has many CVEs and is very bad architecture in general; relying on it would be massive technical debt and a graduation blocker. We should have one cluster-administrator-defined S3 storage for Kubeflow where artifacts are stored. That can be SeaweedFS directly, or SeaweedFS in gateway mode to GCP/AWS/Azure/MinIO etc., but it is one place that pipeline-run caching etc. also relies on, and where administrators set lifecycle policies for users. You can still pull input data for your KFP pipelines from anywhere else, but the derivatives computed in KFP should either be stored and managed within Kubeflow's S3 endpoint under the administrator's policies, or be considered out of scope for Kubeflow.

Per-namespace separate storage systems cannot be properly or reliably managed and accessed by Kubeflow; that would be end-user territory. Namespaces and external buckets can be deleted, secrets and ConfigMaps in user namespaces can be modified, etc. The artifact proxy, with its many CVEs, allows a lot of exploits, costs resources, bloats the service mesh, and can simply be stopped by the user. A network connection to an external, user-provided bucket somewhere on the internet is also not guaranteed. That is nothing we can rely on, enforce policies for, support caching with, or guarantee security for. For such external storage we can only show the URL to the bucket/artifact in the UI; everything else would be best effort and unreliable.

@jholt96 (Contributor, Author) commented Sep 9, 2025

> @juliusvonkohout Happy to help with the test, and the zero-overhead namespaces sound awesome.
>
> 1. Does that mean the ml-pipeline-ui pods no longer initiate any egress requests to pods in user namespaces? If they still do, then this change is necessary to ensure AuthorizationPolicies work properly.
>
> No, they do not need access anymore. With 500 users/namespaces on your Kubeflow cluster you can save terabytes of memory for Istio sidecars, and 1000 (500 × 2) pods, if you use the master branch with SeaweedFS.
>
> 2. For those that have not upgraded, and even those on the latest release, I think this change is still worth committing to this repo, since the issue will exist until they migrate to SeaweedFS. I can move it to an overlay directory if that makes more sense to you.
>
> IMO it is at least worth documenting, since this issue is hard to diagnose if people are unaware of the Sidecar object implementation and what it means for Istio mTLS.
>
> We can make SeaweedFS the default here, as I did with one commit here.

That is a great change, and I am excited for the CVE remediation alone lol. I still think the Sidecar configuration needs to be documented in the Istio README, as it essentially hides the fact that it breaks mTLS for any use case not covered by the egress rules. We had to modify ours not only for ml-pipeline-ui-artifact but also for some other workloads where the Sidecar configuration was breaking mTLS rules in our AuthorizationPolicies. I added a commit example for the documentation.

I can create a new PR that updates the test and then adds that documentation, if that sounds good to you.
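To illustrate the failure mode being described (a hedged sketch, not the actual manifest in this repo): a restrictive mesh-wide default `Sidecar` limits each workload's egress to its own namespace plus istio-system, so cross-namespace calls such as ml-pipeline-ui to ml-pipeline-ui-artifact fall outside the mesh's mTLS-identified routing, and destination AuthorizationPolicies that match on principals deny them:

```yaml
# Illustrative mesh-wide default Sidecar. Any egress destination not
# listed under `hosts` bypasses the mesh's mTLS-identified routing,
# which silently breaks principal-based AuthorizationPolicies downstream.
apiVersion: networking.istio.io/v1
kind: Sidecar
metadata:
  name: default
  namespace: istio-system
spec:
  egress:
    - hosts:
        - "./*"            # own namespace only
        - "istio-system/*" # control plane and gateways
```

The fix is then either a broader egress list or a per-workload Sidecar covering the namespaces the workload legitimately calls.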

@juliusvonkohout (Member)

> @juliusvonkohout Happy to help with the test, and the zero-overhead namespaces sound awesome.
>
> 1. Does that mean the ml-pipeline-ui pods no longer initiate any egress requests to pods in user namespaces? If they still do, then this change is necessary to ensure AuthorizationPolicies work properly.
>
> No, they do not need access anymore. With 500 users/namespaces on your Kubeflow cluster you can save terabytes of memory for Istio sidecars, and 1000 (500 × 2) pods, if you use the master branch with SeaweedFS.
>
> 2. For those that have not upgraded, and even those on the latest release, I think this change is still worth committing to this repo, since the issue will exist until they migrate to SeaweedFS. I can move it to an overlay directory if that makes more sense to you.
>
> IMO it is at least worth documenting, since this issue is hard to diagnose if people are unaware of the Sidecar object implementation and what it means for Istio mTLS.
>
> We can make SeaweedFS the default here, as I did with one commit here.
>
> That is a great change, and I am excited for the CVE remediation alone lol. I still think the Sidecar configuration needs to be documented in the Istio README, as it essentially hides the fact that it breaks mTLS for any use case not covered by the egress rules. We had to modify ours not only for ml-pipeline-ui-artifact but also for some other workloads where the Sidecar configuration was breaking mTLS rules in our AuthorizationPolicies. I added a commit example for the documentation.
>
> I can create a new PR that updates the test and then adds that documentation, if that sounds good to you.

Yes, please. You can get help from @akagami-harsh on Slack for enabling SeaweedFS by default right now, as in this PR. I think we have to move one NetworkPolicy. @akagami-harsh, you can also fork this one here and create a separate PR to use SeaweedFS by default right now.

@juliusvonkohout (Member)

/ok-to-test

@droctothorpe commented Sep 9, 2025

@juliusvonkohout, our end users have specific buckets tied to their namespaces that only they have access to. I think this is a fairly common model. Isolating end users to namespace-specific S3 buckets improves multi-tenant isolation and security and ensures that sensitive end-user data is not stored in a shared place. It also grants end users the ability to access and life-cycle manage their buckets directly. This is precisely why the artifact proxy was implemented. If the seaweed implementation breaks that functionality, I suspect we have a significant regression on our hands, and it's likely that other users will be impacted as well.

I understand the reluctance to use the FE (frontend) server for the proxy. I share it. What if we made the artifact proxy deployment in end-user namespaces conditional, so that users like you who don't need it can disable it? Medium-term, we can work on implementing a stripped-down proxy in Golang that can also be disabled.

Did the seaweedfs PR strip the artifact proxy handlers out of the node server?

CC @zazulam

@juliusvonkohout (Member) commented Sep 9, 2025

> @juliusvonkohout, our end users have specific buckets tied to their namespaces that only they have access to. I think this is a fairly common model. Isolating end users to namespace-specific S3 buckets improves multi-tenant isolation and security and ensures that sensitive end-user data is not stored in a shared place. It also grants end users the ability to access and life-cycle manage their buckets directly. This is precisely why the artifact proxy was implemented. If the seaweed implementation breaks that functionality, I suspect we have a significant regression on our hands, and it's likely that other users will be impacted as well.
>
> I understand the reluctance to use the FE (frontend) server for the proxy. I share it. What if we made the artifact proxy deployment in end-user namespaces conditional, so that users like you who don't need it can disable it? Medium-term, we can work on implementing a stripped-down proxy in Golang that can also be disabled.
>
> Did the seaweedfs PR strip the artifact proxy handlers out of the node server?
>
> CC @zazulam

SeaweedFS should also work with the artifact proxy, but we should definitely make the proxy opt-in instead of opt-out, for security by default. You can re-enable the artifact proxy (which is not covered by tests) in your cluster right now by adjusting sync.py and the ml-pipeline-ui deployment. By default, the old deployments should also continue to exist. "Isolating end users to namespace-specific S3 buckets improves multi-tenant isolation and security and ensures that sensitive end-user data is not stored in a shared place" is solved by SeaweedFS now, but yes, the lifecycle of artifacts created by KFP is then governed by the administrator. For me that makes more sense. It is fine if users have their own buckets, but then they should push long-term artifacts explicitly to per-namespace buckets, not have them unreliably half-managed by Kubeflow. Also, as a side note, all of this should only affect the UI, not actual pipeline runs. If this is no longer true, please point me to a piece of code where the artifact proxy is used for more than UI functionality. So as of now we are only talking about displaying artifacts from user-controlled buckets outside of Kubeflow differently in the UI.

@droctothorpe

You're correct in that it only impacts the UI.

> Seaweedfs should also work with the artifact proxy

Can you elaborate on this pattern? What would it look like, practically speaking? Or do you just mean it would work alongside the artifact proxy, meaning some UI endpoints route through SeaweedFS while others route through the artifact proxy?

@juliusvonkohout (Member) commented Sep 10, 2025

> You're correct in that it only impacts the UI.
>
> Seaweedfs should also work with the artifact proxy
>
> Can you elaborate on this pattern? What would it look like, practically speaking? Or do you just mean it would work alongside the artifact proxy, meaning some UI endpoints route through SeaweedFS while others route through the artifact proxy?

It is just the same as before with MinIO and the artifact proxy, so an ugly mess. The request goes from the user to the UI, to the user's namespace, to the artifact proxy, and then, with the secrets in the user's namespace, to the bucket: either back to the Kubeflow namespace to the S3 storage (MinIO/SeaweedFS), or to a bucket outside the Kubeflow space. And then back to the UI in the Kubeflow namespace. So that violates the architecture, on top of the CVEs you do not want in an enterprise environment. Without the proxy, it is just UI to S3.

Just displaying the artifact URL in the UI (status quo) when it is an external, user-controlled bucket sounds like the better and safer option to me. Also, external buckets are not pure Kubernetes/Kubeflow that we can properly test in our CI/CD; that is customization on the consuming company's end.

@juliusvonkohout (Member)

Please rebase after #3240 is merged

@droctothorpe

I don't think it makes sense to use the artifact proxy alongside MinIO. The proxy is useful specifically when you're not relying on in-cluster storage. I think we can decouple MinIO support from artifact proxy support, and I think we can implement a stripped-down proxy that doesn't rely on the frontend image.

@droctothorpe

I don't understand how S3 requests are succeeding at all without an artifact proxy given this handler, which attempts to route requests through the artifact proxy. Have you had a chance to manually validate S3 (or GCS or ABS for that matter) access with the artifact proxy disabled?

…y and add documentation on the sidecar object configuration

Signed-off-by: Josh Holt <jholt96@live.com>
Signed-off-by: Josh Holt <jholt96@live.com>
@droctothorpe

Moving discussion to here: kubeflow/pipelines#12239

@juliusvonkohout (Member)

> I don't think it makes sense to use the artifact proxy alongside MinIO. The proxy is useful specifically when you're not relying on in-cluster storage. I think we can decouple MinIO support from artifact proxy support, and I think we can implement a stripped-down proxy that doesn't rely on the frontend image.

I could imagine companies using both, and heterogeneous use cases per department.

Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
@juliusvonkohout (Member)

/ok-to-test

Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
@google-oss-prow google-oss-prow bot added size/M and removed size/S labels Sep 12, 2025
juliusvonkohout and others added 2 commits September 12, 2025 11:56
Signed-off-by: Julius von Kohout <45896133+juliusvonkohout@users.noreply.github.com>
Signed-off-by: Josh Holt <jholt96@live.com>
@juliusvonkohout (Member)

@jholt96 @akagami-harsh can you fix the tests? Did you rebase to the latest master?

@jholt96 (Contributor, Author) commented Sep 12, 2025

> @jholt96 @akagami-harsh can you fix the tests? Did you rebase to the latest master?

It has been rebased to the latest master.

@jholt96 (Contributor, Author) commented Sep 12, 2025

Ah, there was an issue with camelCase; it should be good now, @juliusvonkohout.

@jholt96 (Contributor, Author) commented Sep 12, 2025

The apiVersion change in the AuthorizationPolicy for KServe was also broken. Not sure why the Dex job is failing, but the KServe one has been fixed.

@juliusvonkohout (Member)

/lgtm
/approve

@google-oss-prow (bot)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: droctothorpe, juliusvonkohout

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@juliusvonkohout (Member)

Thank you for the PR.

@google-oss-prow google-oss-prow bot merged commit 2368901 into kubeflow:master Sep 12, 2025
30 of 31 checks passed