-
Notifications
You must be signed in to change notification settings - Fork 19
pkg/configmap: add upstream resource transformations support #528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
pkg/configmap: add upstream resource transformations support #528
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: sohankunkerkar The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/hold
The main reason why we went through API review was to not put a dependency on the kueue API in our API.
It would be worth going through an api-review for this new API.
pkg/apis/kueueoperator/v1/types.go
Outdated
// Resource transformations allow converting PodSpec resources into Workload resource requests. | ||
// This field is optional. | ||
// +optional | ||
Resources *configapi.Resources `json:"resources,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We recommend not to import types from other APIs. You cannot control the features added to the API and implicitly support anything that the upstream project adds without any form of internal review or testing.
I would suggest bringing the types into the project here directly from the upstream and then we can review them to make sure they conform to our API standards, and that we are happy supporting them in the long term.
Bear in mind that the upstream types are v1beta1 and so could also be subject to change over time, where our APIs are v1 and cannot change in the same ways
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this makes sense to me. I have created an upstream issue here: kubernetes-sigs/kueue#6582 to understand the future of that feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So generally we are interested in getting Kueue APIs graduated to V1. We still should not include these APIs in our own API. I think regardless of V1 promotion, we should still bring the types into this project and define them in accordance to Openshift API standards.
Kueue has a long roadmap planned for v1 API graduation (starting with sometime this year or next, they will promote the APIs to v1beta2). And then eventually they will promote to V1 but the dates for that have not yet been determined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We did have some discussion about this API in the early stages of our API review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, should I propose the change in the openshift enhancement first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the process I went through was to propose the full API in openshift/api.
The linter helps get most of the API standards and then we do the review there.
And then we copy those api changes back into this repo.
This is a bit painful and I'd prefer to maybe just have the API review done on the operator repo if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I avoided having the API in the enhacement as it was being discussed in the API review anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably worth setting up the API linting for this repo as well
Add Resources field to KueueConfiguration to expose upstream Kueue's resource transformation feature. This allows users to configure resource exclusions and transformations (Replace/Retain strategies) via the operator CR. Signed-off-by: Sohan Kunkerkar <sohank2602@gmail.com>
2d81c9c
to
e2d8858
Compare
// or other resources that should not be managed by Kueue. | ||
// Each prefix should be a resource name prefix (e.g., "example.com/", "ephemeral-storage"). | ||
// +kubebuilder:validation:MaxItems=100 | ||
// +kubebuilder:validation:XValidation:rule="self.all(prefix, size(prefix) <= 253 && size(prefix) >= 1)",message="each excludedResourcePrefix must be between 1 and 253 characters" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t think this CEL is correct. What is prefix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In CEL, the second argument to .all()
is a predicate, and the first argument is just a loop variable name for each element in the list. Here, prefix
is simply the loop variable name for each string in ExcludedResourcePrefixes
. If the name is causing confusion, I can change it to something more generic to make it clear it’s just a local variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are likely better to create a type alias for the string and set the min/max length deliberately on the type alias, then make this list a list of the type alias rather than plain string
Can you add this to main actually? |
@sohankunkerkar: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
@JoelSpeed could you PTAL? |
// +kubebuilder:validation:XValidation:rule="self.all(prefix, size(prefix) <= 253 && size(prefix) >= 1)",message="each excludedResourcePrefix must be between 1 and 253 characters" | ||
// +listType=set | ||
ExcludedResourcePrefixes []string `json:"excludedResourcePrefixes"` | ||
// transformations defines a list of resource transformations that can be applied |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this be unique?
Strategy *ResourceTransformationStrategy `json:"strategy,omitempty"` | ||
// outputs defines the resources that the input resource should be transformed into. | ||
// Each output specifies a resource name and quantity that should be generated per unit of input. | ||
// +kubebuilder:validation:MaxItems=20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure 20 is a valid and future proof for future MIG?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://docs.nvidia.com/datacenter/tesla/mig-user-guide/#b200-mig-profiles
I'm not entirely sure what values someone should put for this based on A100, H100 or B200.
// or other resources that should not be managed by Kueue. | ||
// Each prefix should be a resource name prefix (e.g., "example.com/", "ephemeral-storage"). | ||
// +kubebuilder:validation:MaxItems=100 | ||
// +kubebuilder:validation:XValidation:rule="self.all(prefix, size(prefix) <= 253 && size(prefix) >= 1)",message="each excludedResourcePrefix must be between 1 and 253 characters" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are likely better to create a type alias for the string and set the min/max length deliberately on the type alias, then make this list a list of the type alias rather than plain string
// +kubebuilder:validation:MaxItems=100 | ||
// +kubebuilder:validation:XValidation:rule="self.all(prefix, size(prefix) <= 253 && size(prefix) >= 1)",message="each excludedResourcePrefix must be between 1 and 253 characters" | ||
// +listType=set | ||
ExcludedResourcePrefixes []string `json:"excludedResourcePrefixes"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this optional or required?
// +kubebuilder:validation:MaxItems=100 | ||
// +listType=map | ||
// +listMapKey=input | ||
Transformations []ResourceTransformation `json:"transformations"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optional or required?
// Resource transformations allow converting PodSpec resources into Workload resource requests. | ||
// This field is optional. | ||
// +optional | ||
Resources Resources `json:"resources"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
API guidance has recently changed and we are generally avoiding the "discoverability" concept now. This would mean we should add omitempty to all fields. Not sure if we want to think about/talk through making that change now that this API has shipped
// +kubebuilder:validation:Enum=Retain;Replace | ||
// +kubebuilder:default=Retain | ||
// +optional | ||
Strategy *ResourceTransformationStrategy `json:"strategy,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why a pointer? ""
isn't a valid value right?
// Must be a valid Kubernetes resource quantity (e.g., "1", "100m", "1Gi"). | ||
// +kubebuilder:validation:Pattern=`^([+-]?[0-9.]+)([eEinumkKMGTP]*[-+]?[0-9]*)$` | ||
// +required | ||
Quantity string `json:"quantity"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What version of OCP will this change run on? CEL has the option to validate resource quantity style fields for you, but only really working from 4.20 onwards
// +kubebuilder:validation:MaxLength=253 | ||
// +kubebuilder:validation:MinLength=1 | ||
// +required | ||
Resource string `json:"resource"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from length, what other validation can we add here (e.g. valid characters as a regex)
@JoelSpeed We met yesterday about this API and we realize that we may want to avoid exposing this API atm. The configuration of this is complicated and we realize that the only reason why Kueue admins would need to configure this is for batch jobs. For serving workloads, a pure DAS webhook would actually serve most of our targeted users for DAS. So I think we may table this API while we wait for more clear use cases on batch + DAS integration. |
Add Resources field to KueueConfiguration to expose upstream Kueue's resource transformation feature. This allows users to configure resource exclusions and transformations (Replace/Retain strategies) via the operator CR.