
Support simple dumping of Argo manifests #2338


Open

milesgranger opened this issue Mar 6, 2025 · 2 comments

Comments

@milesgranger

milesgranger commented Mar 6, 2025

Support a --dump-manifests flag or subcommand that only constructs the argo-workflows manifest(s).

We use ArgoCD extensively for basically all our resources, so I'm not keen on using the existing argo-workflows create command to deploy. I'd rather dump the manifests into a kustomization app that ArgoCD then evaluates for the actual deployment.
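
As a rough sketch of the workflow I have in mind (the --dump-manifests flag is the proposal here, not an existing Metaflow option, and the file names are made up):

    # hypothetical invocation: write the generated Argo Workflows manifest(s) to a file
    python test_metaflow.py --datastore=s3 argo-workflows create --dump-manifests > apps/workflows/parameterflow.yaml

The dumped file would then just be listed in the kustomization app that ArgoCD evaluates:

    # apps/workflows/kustomization.yaml (hypothetical layout)
    apiVersion: kustomize.config.k8s.io/v1beta1
    kind: Kustomization
    namespace: argo
    resources:
      - parameterflow.yaml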

I know about --only-json, but it seems clear it still tries to access and modify things on the cluster side:

❯ METAFLOW_DATASTORE_SYSROOT_S3=s3://s3.my-on-prem-cluster.intra.example.com/argo/workflows python test_metaflow.py --datastore=s3 argo-workflows create --only-json --namespace argo
Metaflow 2.15.4 executing ParameterFlow for user:milesg
Validating your flow...
    The graph looks good!
Running pylint...
    Pylint not found, so extra checks are disabled.
Deploying parameterflow to Argo Workflows...
It seems this is the first time you are deploying parameterflow to Argo Workflows.

A new production token generated.

The namespace of this production flow is
    production:parameterflow-0-gtqo
To analyze results of this production flow add this line in your notebooks:
    namespace("production:parameterflow-0-gtqo")
If you want to authorize other people to deploy new versions of this flow to Argo Workflows, they need to call
    argo-workflows create --authorize parameterflow-0-gtqo
when deploying this flow to Argo Workflows for the first time.
See "Organizing Results" at https://docs.metaflow.org/ for more information about production tokens.

    S3 access denied:
    s3://s3.my-on-prem-cluster.intra.example.com/argo/workflows/ParameterFlow/data/ff/fffd58f887cd06e42fe5b841acba70b0781344dd

I could go down the road of providing proper access, but the point is I don't want it doing any of this. Metaflow will eventually exist on our cluster through a Helm installation (happy to provide a PR for a Helm chart pre-configured for on-prem installs later if this works out), and I only want the Argo Workflows manifest(s), which I will then feed to ArgoCD like all our other resources/services.
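
For illustration, the dumped manifests would be consumed the same way as our other apps, e.g. via an ArgoCD Application along these lines (the repo URL and paths are made up):

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: metaflow-workflows
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://git.example.com/platform/deployments.git
        targetRevision: main
        path: apps/workflows
      destination:
        server: https://kubernetes.default.svc
        namespace: argo
      syncPolicy:
        automated:
          prune: true  # flows/schedules removed from the repo get cleaned up by ArgoCD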

I'm happy to dedicate some time to implementing this, with guidance from the community, if it doesn't already exist and I just missed it.

@upczsh

upczsh commented Mar 24, 2025

I need this feature!!

@milesgranger
Author

milesgranger commented Mar 31, 2025

Re-opening this after having spent some time using Metaflow. I'm now quite convinced that there ought to be some way to tie the state of my code base to what should be on the cluster.

Right now, if one adds and then removes a schedule or trigger_on_finish, or even deletes a flow, the resources from a prior deployment become orphaned on the cluster. That makes sense, of course, since there is nothing in Metaflow tying the code state and the cluster resources together.

The only way I can see to do this, given Metaflow's design, is to keep some representation of the code state and allow something like ArgoCD to reconcile the required updates.

I see that maybe even --only-json is going away? (#2339). It turns out I'm fine with --only-json uploading code to S3, since that is how Metaflow is designed to work with Argo Workflows, but I'm not fine with losing the ability to represent the state changes expressed by the code, which --only-json does provide.

To be clear, I think it's great that one can run ad-hoc / development flows in a development cluster that gets cleaned up regularly, but in production we need a more declarative approach.

milesgranger reopened this Mar 31, 2025