Skip to content

Helm Chart Templates For Katib #2553

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

kunal-511
Copy link

What this PR does / why we need it:
Helm Templates For Katib

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #

Checklist:

  • Docs included if any changes are user facing

Signed-off-by: kunal-511 <yoyokvunal@gmail.com>
Copy link

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign johnugeorge for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kunal-511
Copy link
Author

cc: @juliusvonkohout

Signed-off-by: kunal-511 <yoyokvunal@gmail.com>
@kunal-511 kunal-511 marked this pull request as ready for review June 19, 2025 15:46
Copy link

@kunal-511: Cannot trigger testing until a trusted user reviews the PR and leaves an /ok-to-test message.

In response to this:

/retest

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@juliusvonkohout
Copy link
Member

/ok-to-test

Signed-off-by: kunal-511 <yoyokvunal@gmail.com>
@kunal-511
Copy link
Author

/retest

@varodrig
Copy link

varodrig commented Jul 1, 2025

@juliusvonkohout should this PR be linked to kubeflow/community#832 ?

cc @kunal-511

@varodrig
Copy link

varodrig commented Jul 1, 2025

@kunal-511 let us know once this PR is ready for review. It seems there's a hold + two checks failing.

Additionally, can you add some description and comments on the PR, we have 58 files to review and it'll be great to have more comments on it to help us with the approval.

Another thing I think it will be great if we can involve the Katib working group on this review.
thoughts @juliusvonkohout
cc @andreyvelich

{{- end }}
ports:
- name: postgres
containerPort: 5432
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my opinion is that ports should be configurable unless it's hard-coded somewhere in katib

- "/bin/bash"
- "-c"
- "mysqladmin ping -u root -p${MYSQL_ROOT_PASSWORD}"
periodSeconds: 15
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this configurations should be customizable it could depend on the cluster that it's running and resources available.

initialDelaySeconds - periodSeconds - failureThreshold

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Electronic-Waste to add thoughts

args:
- --katib-config=/katib-config.yaml
ports:
- containerPort: 8443
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in my opinion ports should be added in the values files unless it's hardcoded somewhere in katib.

kunal-511 added 2 commits July 1, 2025 18:02
Signed-off-by: kunal-511 <yoyokvunal@gmail.com>
Signed-off-by: kunal-511 <yoyokvunal@gmail.com>
@juliusvonkohout
Copy link
Member

/ok-to-test

kunal-511 added 2 commits July 6, 2025 10:06
Signed-off-by: kunal-511 <yoyokvunal@gmail.com>
Signed-off-by: kunal-511 <yoyokvunal@gmail.com>
@kunal-511
Copy link
Author

kunal-511 commented Jul 7, 2025

The tree structure of the Kubeflow Katib Helm chart directory with explanations for each file and folder.

katib(helm charts)/
├── Chart.yaml                                 # Helm chart metadata and version information
├── README.md                                  # Chart documentation and usage instructions
├── values.yaml                                # Default configuration values for the chart
├── ci/                                        # CI/CD configuration files for testing
│   ├── values-cert-manager.yaml               # Test values for cert-manager integration
│   ├── values-enterprise.yaml                 # Test values for enterprise deployment
│   ├── values-external-db.yaml                # Test values for external database configuration
│   ├── values-kubeflow.yaml                   # Test values for Kubeflow integration
│   ├── values-leader-election.yaml            # Test values for leader election setup
│   ├── values-openshift.yaml                  # Test values for OpenShift platform
│   ├── values-postgres.yaml                   # Test values for PostgreSQL database
│   ├── values-production.yaml                 # Test values for production deployment
│   └── values-standalone.yaml                 # Test values for standalone deployment
├── crds/                                      # Custom Resource Definitions for Katib
│   ├── experiment.yaml                        # CRD for hyperparameter tuning experiments
│   ├── suggestion.yaml                        # CRD for algorithm suggestions
│   └── trial.yaml                             # CRD for individual experiment trials
├── templates/                                 # Helm template files for Kubernetes resources
│   ├── _helpers.tpl                           # Template helper functions and common definitions
│   ├── autoscaling/                           # Horizontal Pod Autoscaler configurations
│   │   └── hpa.yaml                           # HPA template for scaling components
│   ├── config/                                # Configuration files and ConfigMaps
│   │   └── configmap.yaml                     # Main configuration settings for Katib
│   ├── controller/                            # Katib controller deployment resources
│   │   ├── deployment.yaml                    # Controller deployment specification
│   │   ├── leader-election-rbac.yaml          # RBAC for leader election functionality
│   │   ├── rbac.yaml                          # Role-based access control for controller
│   │   ├── service.yaml                       # Service for controller communication
│   │   ├── serviceaccount.yaml                # ServiceAccount for controller pod
│   │   └── trial-templates-configmap.yaml     # Trial template configurations
│   ├── database/                              # Database deployment resources
│   │   ├── mysql-deployment.yaml              # MySQL database deployment
│   │   ├── mysql-pvc.yaml                     # MySQL persistent volume claim
│   │   ├── mysql-service.yaml                 # MySQL service for database access
│   │   ├── postgres-deployment.yaml           # PostgreSQL database deployment
│   │   ├── postgres-pvc.yaml                  # PostgreSQL persistent volume claim
│   │   ├── postgres-service.yaml              # PostgreSQL service for database access
│   │   └── secret.yaml                        # Database credentials and secrets
│   ├── db-manager/                            # Database manager component
│   │   ├── deployment.yaml                    # DB manager deployment specification
│   │   └── service.yaml                       # Service for DB manager communication
│   ├── db/                                    # Additional database configurations
│   ├── istio/                                 # Istio service mesh integration
│   │   ├── authorization-policy.yaml          # Istio authorization policies
│   │   └── virtual-service.yaml               # Istio virtual service configuration
│   ├── monitoring/                            # Monitoring and observability resources
│   │   └── servicemonitor.yaml                # Prometheus ServiceMonitor for metrics
│   ├── namespace/                             # Namespace creation resources
│   │   └── namespace.yaml                     # Namespace definition for Katib
│   ├── openshift/                             # OpenShift-specific configurations
│   ├── rbac/                                  # Additional RBAC configurations
│   │   └── kubeflow-roles.yaml                # Kubeflow-specific role definitions
│   ├── security/                              # Security-related configurations
│   │   ├── network-policy.yaml                # Network policies for pod communication
│   │   └── pod-disruption-budget.yaml         # Pod disruption budget for availability
│   ├── ui/                                    # Katib UI component resources
│   │   ├── deployment.yaml                    # UI deployment specification
│   │   ├── rbac.yaml                          # RBAC for UI component
│   │   ├── service.yaml                       # Service for UI access
│   │   └── serviceaccount.yaml                # ServiceAccount for UI pod
│   └── webhook/                               # Admission webhook configurations
│       ├── certificate.yaml                   # TLS certificates for webhook
│       ├── mutating-webhook.yaml              # Mutating admission webhook
│       └── validating-webhook.yaml            # Validating admission webhook

@juliusvonkohout @varodrig

Copy link

@varodrig varodrig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kunal-511 nice job! I've reviewed the standalone install including namespace and controller folder.
I'll continue with the rest during this week. If you can take a look at this comments it'll be great. thank you
cc @juliusvonkohout

labels:
{{- include "katib.labels" . | nindent 4 }}
katib.kubeflow.org/metrics-collector-injection: enabled
{{- if .Values.podSecurityStandards.enforced }}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this code, pod security Standards on katib namespace.yaml file. is this coming from the implementation from manifests?
Ensure we are validating any changes to the current katib manifests with the katib team.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is Optional security hardening in helm charts which can be easily disable

Copy link

@varodrig varodrig Jul 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kunal-511 great security practice. I'd like the katib team to be aware of this updates
cc @Electronic-Waste to approve.

- https://github.com/kubeflow/katib

annotations:
category: Machine Learning

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

validate this category with the katib team

@@ -0,0 +1,123 @@
{{- if .Values.controller.enabled }}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep the same file name as it's on the katib manifests, controller.yaml

{{- end }}
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "{{ .Values.controller.ports.metrics }}"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find this value on the values-standale.yaml file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Values.yaml is the main file

{{- end }}
containers:
- name: katib-controller
image: {{ include "katib.image" (dict "registry" .Values.global.imageRegistry "repository" .Values.controller.image.repository "tag" (.Values.controller.image.tag | default .Values.global.imageTag) "global" .Values.global) }}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we define a default value on the values file?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already present in the values.yaml file

- containerPort: {{ .Values.controller.ports.healthz }}
name: {{ .Values.controller.service.ports.healthz.name }}
protocol: TCP
readinessProbe:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd keep the probes here. I do not know of a reason of why this would change unless the implementation changes.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Electronic-Waste to provide thoughts

{{- toYaml . | nindent 12 }}
{{- end }}
{{- with .Values.controller.securityContext }}
securityContext:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line duplicated and content is also in helpers. I recommend to keep security best practices on the deployment file.

args:
{{- toYaml .Values.controller.command.args | nindent 12 }}
ports:
- containerPort: {{ .Values.controller.ports.webhook }}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't find the ports on the values standalone file.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Values.yaml is the main file

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I couldn't find it before. I see now. thanks.

spec:
{{- include "katib.serviceType" (dict "type" .Values.controller.service.type "Values" .Values) | nindent 2 }}
ports:
- port: {{ .Values.controller.service.ports.webhook.port }}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is in the values.yaml

@@ -0,0 +1,37 @@
{{- if .Values.controller.leaderElection.enabled }}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems this is a new file. can you validate with the katib team? or let me know if I'm missing something.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Helm chart integrated this as a configurable option rather than requiring a separate installation profile. In the original manifests, you'd need to choose between:
katib-standalone (no leader election)
katib-leader-election (with leader election)
The Helm chart makes this toggleable through controller.leaderElection.enabled, providing more flexibility in a single chart.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the clarification. I'd refer to the katib team on this.
@Electronic-Waste

spec:
{{- include "katib.serviceType" (dict "type" .Values.controller.service.type "Values" .Values) | nindent 2 }}
ports:
- port: {{ .Values.controller.service.ports.webhook.port }}
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is in the values.yaml

@juliusvonkohout
Copy link
Member

@anencore94

@Electronic-Waste

can you help here ?

@juliusvonkohout
Copy link
Member

/ok-to-test

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like a new implementation. thanks!
@Electronic-Waste to review.

{{- end }}
resources:
requests:
storage: {{ .Values.database.postgres.persistence.size | quote }}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

values file shows 10Gi, original file on manifests is 3Gi

Copy link
Author

@kunal-511 kunal-511 Jul 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the ci values override all the main file values. So
In charts/katib/ci/values-postgres.yaml (lines 35-37):

persistence:
  enabled: true
  size: 3Gi  # ← This matches the manifests!

@varodrig
Copy link

/ok-to-test

@varodrig
Copy link

great job @kunal-511 thank so much for working on this. There's a lot of work and thought process on understanding the current katib kustomize files, installation and helm .

I appreciate the amazing work you have done.

We need @Electronic-Waste to do a final review. @Electronic-Waste I left a few comments for you, please review ci in detail and there are good practices implemented from security constraints, PDB, network policies and service monitoring ensure you are ok with this implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants