fix-ml-pipeline-ui-mtls #3236
Conversation
Thanks for this contribution, @jholt96! /approve
@jholt96 ml-pipeline-ui-artifact has been removed, and you now have proper zero-overhead namespaces. It is already merged in Kubeflow/pipelines and will soon be the default in Kubeflow/manifests as well. See https://medium.com/@hpotpose26/kubeflow-pipelines-embraces-seaweedfs-9a7e022d5571.
@juliusvonkohout in the SeaweedFS implementation, how does the FE access artifacts in external object stores that only end-user namespaces have access to?
@juliusvonkohout Happy to help with the test, and the zero-overhead namespaces sound awesome.
IMO it is worth at least documenting this, since the issue is hard to diagnose if people are unaware of the Sidecar object configuration and what it means for Istio mTLS.
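As a minimal sketch of the kind of restrictive default Sidecar that triggers this behavior (the name, namespace, and host entries here are illustrative, not the exact manifest shipped in this repo):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: default            # illustrative; applies to all workloads in the namespace
  namespace: kubeflow
spec:
  egress:
  - hosts:
    - "./*"                 # services in the same namespace
    - "istio-system/*"      # Istio control plane and ingress gateway
  # Traffic to destinations not listed above leaves the sidecar as passthrough,
  # i.e. without the mesh mTLS identity, so destination AuthorizationPolicies
  # that match on a principal reject it with "RBAC: Access Denied".
```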
No, they do not need access anymore. With 500 users/namespaces on your Kubeflow cluster you can save terabytes of memory for Istio sidecars and 1,000 (500 x 2) pods if you use the master branch with SeaweedFS.
We can make SeaweedFS the default here, as I did with one commit here.
I do not know what you mean by FE, but I will try to answer the general question. The S3 storage in the Kubeflow namespace should handle it, even if it just acts as a gateway to AWS/GCP; you can configure it in gateway mode. If there is storage that is only accessible from the user's namespace but is somehow considered Frankenstein Kubeflow storage, that breaks the architecture. The artifact proxy has many CVEs and is very bad architecture in general; relying on it would be massive technical debt and a graduation blocker. We should have one cluster-administrator-defined S3 storage for Kubeflow where artifacts are stored. That can be SeaweedFS directly or in gateway mode to GCP/AWS/Azure/MinIO etc., but it is the one place that pipeline run caching etc. also relies on, and where administrators set lifecycle policies etc. for users. You can still pull input data for your KFP pipelines from anywhere else, but the derivatives computed in KFP should either be stored and managed within Kubeflow's S3 endpoint under the administrator's policies or considered out of scope for Kubeflow.
That is a great change and I am excited for the CVE remediation alone, lol. I still think that the Sidecar configuration needs to be documented in the Istio README, as it essentially hides the fact that it breaks mTLS for any use case not covered by the egress rules. We had to modify ours not only for ml-pipeline-ui-artifact but also for some other workloads where the Sidecar configuration was breaking the mTLS rules in our authorization policies. I added a commit example for the documentation. I can create a new PR that updates the test and then adds that documentation, if that sounds good to you.
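For illustration, the kind of egress addition we needed looks roughly like the following (the resource name, selector labels, and host entries are assumptions, not the exact manifest in this repo):

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Sidecar
metadata:
  name: ml-pipeline-ui          # illustrative name
  namespace: kubeflow
spec:
  workloadSelector:
    labels:
      app: ml-pipeline-ui       # assumed label; check the actual Deployment
  egress:
  - hosts:
    - "./*"                     # the kubeflow namespace itself
    - "istio-system/*"          # control plane / ingress
    - "*/*"                     # or list each profile namespace explicitly so the UI
                                # can reach ml-pipeline-ui-artifact over mesh mTLS
```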
Yes, please. You can get help from @akagami-harsh on Slack for enabling SeaweedFS by default right now, as in this PR. I think we have to move one NetworkPolicy. @akagami-harsh, you can also fork this one here and create a separate PR to use SeaweedFS by default right now.
/ok-to-test |
@juliusvonkohout, our end users have specific buckets tied to their namespaces that only they have access to. I think this is a fairly common model. Isolating end users to namespace-specific S3 buckets improves multi-tenant isolation and security and ensures that sensitive end-user data is not stored in a shared place. It also grants end users the ability to access and life-cycle manage their buckets directly. This is precisely why the artifact proxy was implemented. If the SeaweedFS implementation breaks that functionality, I suspect we have a significant regression on our hands, and it's likely that other users will be impacted as well. I understand the reluctance to use the FE (frontend) server for the proxy. I share it. What if we made the artifact proxy deployment in end-user namespaces conditional, so that users like you who don't need it can disable it? Medium-term, we can work on implementing a stripped-down proxy in Golang that can also be disabled. Did the SeaweedFS PR strip the artifact proxy handlers out of the Node server? CC @zazulam
SeaweedFS should also work with the artifact proxy, but we should definitely make it opt-in instead of opt-out to have security by default. You can re-enable the artifact proxy (which is not covered by tests) in your cluster right now by adjusting sync.py and the ml-pipeline-ui deployment; by default the old deployments should also continue to be there. "Isolating end users to namespace-specific S3 buckets improves multi-tenant isolation and security and ensures that sensitive end-user data is not stored in a shared place" is solved by SeaweedFS now, but yes, the lifecycle of artifacts created by KFP is then governed by the administrator. For me that makes more sense. It is fine if users have their own buckets, but then they should push artifacts they need long-term explicitly to per-namespace buckets and not have them unreliably half-managed by Kubeflow. Also, as a side note, all of this should only affect the UI, not actual pipeline runs. If this is no longer true, please point me to a piece of code where the artifact proxy is used for more than UI purposes. So as of now we are only talking about displaying artifacts from user-controlled buckets outside of Kubeflow differently in the UI.
You're correct in that it only impacts the UI.
Can you elaborate on this pattern? What would this look like, practically speaking? Or do you just mean it would work alongside the artifact proxy, meaning some UI endpoints would go through SeaweedFS while others go through the artifact proxy?
It is just the same as before with MinIO and the artifact proxy, so an ugly mess. It goes from the user to the UI, then to the artifact proxy in the user's namespace, and from there, using the secrets in the user's namespace, to the bucket: either back to the kubeflow namespace to the S3 storage (MinIO/SeaweedFS) or to a bucket outside of the Kubeflow space. And then back to the UI in the kubeflow namespace. So you get that architecture violation plus the CVEs you do not want to have in an enterprise environment. Without the proxy it is just UI to S3.
Please rebase after #3240 is merged |
I don't think it makes sense to use the artifact proxy alongside MinIO. The proxy is useful specifically when you're not relying on in-cluster storage. I think we can decouple MinIO support from artifact proxy support. And I think we can implement a stripped down proxy that doesn't rely on the frontend image. |
I don't understand how S3 requests are succeeding at all without an artifact proxy given this handler, which attempts to route requests through the artifact proxy. Have you had a chance to manually validate S3 (or GCS or ABS for that matter) access with the artifact proxy disabled? |
Moving discussion to here: kubeflow/pipelines#12239
I could imagine companies using both, and heterogeneous use cases per department.
/ok-to-test |
@jholt96 @akagami-harsh can you fix the tests? Did you rebase to the latest master? |
It has been rebased to the latest master.
Ah, there was a camelCase issue; it should be good now, @juliusvonkohout.
The apiVersion change in the AuthorizationPolicy for KServe was also broken. Not sure why the Dex job is failing, but the KServe one has been fixed.
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: droctothorpe, juliusvonkohout. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Thank you for the PR. |
Pull Request Template for Kubeflow Manifests
✏️ Summary of Changes
This change fixes artifact retrieval in Kubeflow Pipelines when the default Sidecar is enabled. The initial default Sidecar configuration blocks the ml-pipeline-ui pod from proxying users' artifact requests to the ml-pipeline-ui-artifact pods in profile-created namespaces. This is because, in Kubeflow 1.9+, the AuthorizationPolicy created in each namespace limits requests based on the principal, i.e. the service account of the ml-pipeline-ui pod. Without this change, users get "RBAC: Access Denied".
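For reference, the per-namespace policy that produces this error has roughly the following shape (names and labels here are illustrative; the actual resource is created per profile namespace):

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: ml-pipeline-ui-artifact      # illustrative name
  namespace: my-profile              # a profile-created namespace
spec:
  selector:
    matchLabels:
      app: ml-pipeline-ui-artifact   # assumed label on the artifact pods
  rules:
  - from:
    - source:
        principals:
        # Only requests carrying the mTLS identity of the ml-pipeline-ui
        # service account are allowed; traffic arriving without that
        # identity is rejected with "RBAC: Access Denied".
        - cluster.local/ns/kubeflow/sa/ml-pipeline-ui
```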
📦 Dependencies
None
🐛 Related Issues
✅ Contributor Checklist