Skip to content

Conversation

sohankunkerkar
Copy link
Member

Add Resources field to KueueConfiguration to expose upstream Kueue's resource transformation feature. This allows users to configure resource exclusions and transformations (Replace/Retain strategies) via the operator CR.

Copy link

openshift-ci bot commented Aug 14, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: sohankunkerkar

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 14, 2025
Copy link
Contributor

@kannon92 kannon92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/hold

The main reason why we went through API review was to not put a dependency on the kueue API in our API.

It would be worth going through an api-review for this new API.

cc @JoelSpeed @everettraven

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 14, 2025
// Resource transformations allow converting PodSpec resources into Workload resource requests.
// This field is optional.
// +optional
Resources *configapi.Resources `json:"resources,omitempty"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We recommend not to import types from other APIs. You cannot control the features added to the API and implicitly support anything that the upstream project adds without any form of internal review or testing.

I would suggest bringing the types into the project here directly from the upstream and then we can review them to make sure they conform to our API standards, and that we are happy supporting them in the long term.

Bear in mind that the upstream types are v1beta1 and so could also be subject to change over time, where our APIs are v1 and cannot change in the same ways

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this makes sense to me. I have created an upstream issue here: kubernetes-sigs/kueue#6582 to understand the future of that feature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So generally we are interested in getting Kueue APIs graduated to V1. We still should not include these APIs in our own API. I think regardless of V1 promotion, we should still bring the types into this project and define them in accordance to Openshift API standards.

Kueue has a long roadmap planned for v1 API graduation (starting with sometime this year or next, they will promote the APIs to v1beta2). And then eventually they will promote to V1 but the dates for that have not yet been determined.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

openshift/api#2222 (comment)

We did have some discussion about this API in the early stages of our API review.

Copy link
Member Author

@sohankunkerkar sohankunkerkar Aug 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, should I propose the change in the openshift enhancement first?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the process I went through was to propose the full API in openshift/api.

The linter helps get most of the API standards and then we do the review there.

And then we copy those api changes back into this repo.

This is a bit painful and I'd prefer to maybe just have the API review done on the operator repo if possible.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I avoided having the API in the enhacement as it was being discussed in the API review anyway.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably worth setting up the API linting for this repo as well

Add Resources field to KueueConfiguration to expose upstream Kueue's
resource transformation feature. This allows users to configure resource
exclusions and transformations (Replace/Retain strategies) via the
operator CR.

Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
@sohankunkerkar sohankunkerkar force-pushed the configure-resource-upstream branch from 2d81c9c to e2d8858 Compare August 14, 2025 20:22
// or other resources that should not be managed by Kueue.
// Each prefix should be a resource name prefix (e.g., "example.com/", "ephemeral-storage").
// +kubebuilder:validation:MaxItems=100
// +kubebuilder:validation:XValidation:rule="self.all(prefix, size(prefix) <= 253 && size(prefix) >= 1)",message="each excludedResourcePrefix must be between 1 and 253 characters"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t think this CEL is correct. What is prefix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In CEL, the second argument to .all() is a predicate, and the first argument is just a loop variable name for each element in the list. Here, prefix is simply the loop variable name for each string in ExcludedResourcePrefixes. If the name is causing confusion, I can change it to something more generic to make it clear it’s just a local variable.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are likely better to create a type alias for the string and set the min/max length deliberately on the type alias, then make this list a list of the type alias rather than plain string

@kannon92
Copy link
Contributor

Can you add this to main actually?

@sohankunkerkar sohankunkerkar changed the base branch from release-1.1 to main August 15, 2025 12:08
Copy link

openshift-ci bot commented Aug 15, 2025

@sohankunkerkar: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/test-e2e-4-17 e2d8858 link false /test test-e2e-4-17
ci/prow/lint e2d8858 link true /test lint

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@sohankunkerkar
Copy link
Member Author

@JoelSpeed could you PTAL?

// +kubebuilder:validation:XValidation:rule="self.all(prefix, size(prefix) <= 253 && size(prefix) >= 1)",message="each excludedResourcePrefix must be between 1 and 253 characters"
// +listType=set
ExcludedResourcePrefixes []string `json:"excludedResourcePrefixes"`
// transformations defines a list of resource transformations that can be applied
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be unique?

Strategy *ResourceTransformationStrategy `json:"strategy,omitempty"`
// outputs defines the resources that the input resource should be transformed into.
// Each output specifies a resource name and quantity that should be generated per unit of input.
// +kubebuilder:validation:MaxItems=20
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure 20 is a valid and future proof for future MIG?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#b200-mig-profiles

I'm not entirely sure what values someone should put for this based on A100, H100 or B200.

// or other resources that should not be managed by Kueue.
// Each prefix should be a resource name prefix (e.g., "example.com/", "ephemeral-storage").
// +kubebuilder:validation:MaxItems=100
// +kubebuilder:validation:XValidation:rule="self.all(prefix, size(prefix) <= 253 && size(prefix) >= 1)",message="each excludedResourcePrefix must be between 1 and 253 characters"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are likely better to create a type alias for the string and set the min/max length deliberately on the type alias, then make this list a list of the type alias rather than plain string

// +kubebuilder:validation:MaxItems=100
// +kubebuilder:validation:XValidation:rule="self.all(prefix, size(prefix) <= 253 && size(prefix) >= 1)",message="each excludedResourcePrefix must be between 1 and 253 characters"
// +listType=set
ExcludedResourcePrefixes []string `json:"excludedResourcePrefixes"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this optional or required?

// +kubebuilder:validation:MaxItems=100
// +listType=map
// +listMapKey=input
Transformations []ResourceTransformation `json:"transformations"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional or required?

// Resource transformations allow converting PodSpec resources into Workload resource requests.
// This field is optional.
// +optional
Resources Resources `json:"resources"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API guidance has recently changed and we are generally avoiding the "discoverability" concept now. This would mean we should add omitempty to all fields. Not sure if we want to think about/talk through making that change now that this API has shipped

// +kubebuilder:validation:Enum=Retain;Replace
// +kubebuilder:default=Retain
// +optional
Strategy *ResourceTransformationStrategy `json:"strategy,omitempty"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a pointer? "" isn't a valid value right?

// Must be a valid Kubernetes resource quantity (e.g., "1", "100m", "1Gi").
// +kubebuilder:validation:Pattern=`^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$`
// +required
Quantity string `json:"quantity"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What version of OCP will this change run on? CEL has the option to validate resource quantity style fields for you, but only really working from 4.20 onwards

// +kubebuilder:validation:MaxLength=253
// +kubebuilder:validation:MinLength=1
// +required
Resource string `json:"resource"`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from length, what other validation can we add here (e.g. valid characters as a regex)

@kannon92
Copy link
Contributor

@JoelSpeed
Thank you for your review so far.

We met yesterday about this API and we realize that we may want to avoid exposing this API atm.

The configuration of this is complicated and we realize that the only reason why Kueue admins would need to configure this is for batch jobs. For serving workloads, a pure DAS webhook would actually serve most of our targeted users for DAS.

So I think we may table this API while we wait for more clear use cases on batch + DAS integration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants