@0xavi0 0xavi0 commented Oct 1, 2025

This PR adds the ability to control whether a BundleDeployment can or cannot be installed, based on defined Schedules.

A new CRD, Schedule, is introduced. It defines time intervals during which Clusters—selected via label selectors—are allowed to accept new deployments.

An example Schedule would look like this:

apiVersion: fleet.cattle.io/v1alpha1
kind: Schedule
metadata:
  name: schedule1
  namespace: fleet-default
spec:
  schedule: "0 */5 * * * *"
  duration: 1m
  targets:
    clusters:
      - name: local
        clusterSelector:
          matchLabels:
            env: dev

This would allow new deployments to be installed every 5 minutes (at second 0) for a duration of 1 minute, and it would apply to all clusters located in the fleet-default namespace that have the label env=dev.
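
As a rough illustration of what that window means in practice, the sketch below checks whether a given instant falls inside a window. It is not the controller's code: `inWindow`, `at`, and the two constants are hypothetical names, and the check only works for schedules that fire on wall-clock multiples of a fixed interval (as `0 */5 * * * *` does), not for arbitrary cron expressions.

```go
package main

import (
	"fmt"
	"time"
)

// Values taken from the example Schedule: fire every 5 minutes, stay open for 1 minute.
const (
	exampleEvery    = 5 * time.Minute
	exampleDuration = time.Minute
)

// inWindow reports whether t falls inside a deployment window.
// Assumption: windows open on wall-clock multiples of every, which is what
// "0 */5 * * * *" produces. This is not a general cron evaluator.
func inWindow(t time.Time, every, duration time.Duration) bool {
	start := t.Truncate(every) // most recent window start at or before t
	return t.Sub(start) < duration
}

// at parses an RFC 3339 timestamp for the examples below.
func at(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	for _, s := range []string{
		"2025-10-01T10:05:30Z", // inside the 10:05 window
		"2025-10-01T10:06:00Z", // the window has just closed
		"2025-10-01T10:09:59Z", // waiting for the 10:10 window
	} {
		fmt.Println(s, inWindow(at(s), exampleEvery, exampleDuration))
	}
}
```

The window is treated as half-open here (the closing instant is already outside); whether the controller treats the boundary the same way is not stated in this PR.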

When a Schedule is applied, the controller evaluates which Clusters match the defined selectors. For all matching Clusters, the Status.Scheduled field is set to true.

This field indicates that the Cluster is governed by a Schedule, meaning no new BundleDeployments will be applied to it unless the Status.ActiveSchedule field — also newly introduced — is explicitly set to true.

The Status.ActiveSchedule field is set to true at the moment a Schedule begins execution and remains true for the duration defined in the Schedule’s specification.

The mechanism used to prevent an Agent from applying a BundleDeployment while its Cluster is scheduled but no Schedule is active mirrors the existing implementation of the "Pause" feature.
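
The interplay of the two status fields can be summed up as a single predicate. The sketch below is illustrative only: `clusterStatus` and `canDeploy` are hypothetical names, and the real check lives wherever the "Pause"-style gating is implemented.

```go
package main

import "fmt"

// clusterStatus is a hypothetical stand-in for the two new Cluster status fields.
type clusterStatus struct {
	Scheduled      bool // the Cluster is targeted by a Schedule
	ActiveSchedule bool // a Schedule window is currently open
}

// canDeploy reports whether new BundleDeployments may be applied:
// unscheduled Clusters are unaffected, scheduled ones need an open window.
func canDeploy(s clusterStatus) bool {
	return !s.Scheduled || s.ActiveSchedule
}

func main() {
	fmt.Println(canDeploy(clusterStatus{}))                                      // not governed by any Schedule
	fmt.Println(canDeploy(clusterStatus{Scheduled: true}))                       // scheduled, window closed
	fmt.Println(canDeploy(clusterStatus{Scheduled: true, ActiveSchedule: true})) // scheduled, window open
}
```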

The controller actively monitors changes to any Cluster that is part of an existing Schedule. If a Cluster's labels or name are modified, the controller re-evaluates and updates the affected Schedule’s target set accordingly.

Additionally, when a Schedule begins execution, the controller performs a fresh evaluation to detect newly matching Clusters or exclude any that no longer meet the criteria.

The set of Clusters associated with a Schedule is computed when the Schedule is created, when it is modified, and at the start of each execution.
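
For readers unfamiliar with `matchLabels` semantics, the target-set computation boils down to something like the following. This is a minimal sketch using plain maps; `matchLabels` and `selectClusters` are hypothetical helpers, and the controller itself presumably relies on the Kubernetes label-selector types rather than hand-rolled matching.

```go
package main

import (
	"fmt"
	"sort"
)

// matchLabels reports whether labels contains every key/value pair in selector.
// An empty selector matches every cluster in this sketch.
func matchLabels(selector, labels map[string]string) bool {
	for key, want := range selector {
		if got, ok := labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// selectClusters returns the sorted names of the clusters whose labels satisfy selector.
// clusters maps a cluster name to its labels.
func selectClusters(selector map[string]string, clusters map[string]map[string]string) []string {
	var names []string
	for name, labels := range clusters {
		if matchLabels(selector, labels) {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	clusters := map[string]map[string]string{
		"local":   {"env": "dev"},
		"staging": {"env": "dev", "region": "eu"},
		"prod":    {"env": "prod"},
	}
	// Re-running this after a label change yields the updated target set.
	fmt.Println(selectClusters(map[string]string{"env": "dev"}, clusters))
}
```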


📅 Schedule Execution Implementation

The execution logic for the Schedule has been implemented as follows:

  • The next execution time is calculated from the current date and time.
  • The delay until that time is computed, and a trigger is created for it with quartz.NewRunOnceTrigger.
  • When that trigger fires, another quartz.NewRunOnceTrigger is created for the Schedule's duration, and the Started property is set to true.
  • When the second trigger fires, the start time of the following execution is recalculated, and the Started property is set back to false.
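
The alternation between "waiting for the next start" and "window open" described in the list above can be sketched without the scheduler library. The code below only models the state transitions and the delay each run-once trigger would be given. `scheduleState`, `onTrigger`, `nextStart`, and `at` are hypothetical names, and a fixed interval stands in for the cron expression.

```go
package main

import (
	"fmt"
	"time"
)

// Values from the example Schedule; a fixed interval stands in for the cron expression.
const (
	exampleEvery    = 5 * time.Minute
	exampleDuration = time.Minute
)

// scheduleState is a hypothetical model of one Schedule's execution state.
type scheduleState struct {
	Every    time.Duration // assumption: fixed interval instead of a parsed cron spec
	Duration time.Duration // how long each window stays open
	Started  bool          // true while a window is open
}

// nextStart returns the first multiple of every strictly after now.
func nextStart(now time.Time, every time.Duration) time.Time {
	return now.Truncate(every).Add(every)
}

// onTrigger models what happens when a run-once trigger fires. It flips
// Started and returns the delay for the next run-once trigger.
func (s *scheduleState) onTrigger(now time.Time) time.Duration {
	if !s.Started {
		s.Started = true // window opens; next trigger closes it
		return s.Duration
	}
	s.Started = false // window closes; next trigger opens the following one
	return nextStart(now, s.Every).Sub(now)
}

// at parses an RFC 3339 timestamp for the example below.
func at(s string) time.Time {
	t, err := time.Parse(time.RFC3339, s)
	if err != nil {
		panic(err)
	}
	return t
}

func main() {
	s := &scheduleState{Every: exampleEvery, Duration: exampleDuration}
	now := at("2025-10-01T10:03:00Z")
	wait := nextStart(now, s.Every).Sub(now)
	fmt.Println("first trigger in", wait)
	for i := 0; i < 4; i++ {
		now = now.Add(wait) // pretend the trigger fired exactly on time
		wait = s.onTrigger(now)
		fmt.Println(now.Format("15:04:05"), "started:", s.Started, "next trigger in", wait)
	}
}
```

Because each trigger is created only when the previous one fires, at most one pending job exists per Schedule at any time, which is the property the run-once approach is after.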

quartz.NewCronTriggerWithLoc was intentionally not used

The reason for avoiding quartz.NewCronTriggerWithLoc is to prevent jobs from remaining in the scheduler indefinitely and to allow more precise control over job execution — both start time and duration.

The goal is to minimize potential race conditions, especially in edge cases where a job is scheduled to begin very close to the end of a previous execution.

A constraint has also been introduced to ensure that the minimum viable execution duration is at least 1 second, after computing feasibility relative to the execution start time.

That is, there must be at least 1 second between consecutive execution start times.
This further helps reduce race conditions, since starting an execution involves multiple API calls to the Kubernetes API server.

🧪 This feature is currently experimental.

To enable it, set the following environment variable:

EXPERIMENTAL_SCHEDULES=true
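
A feature gate like this is typically read once at startup. The sketch below shows one way to do that. `parseFlag` and `schedulesEnabled` are hypothetical helpers, and the use of `strconv.ParseBool` is an assumption, so the exact set of values Fleet accepts may differ.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// parseFlag interprets a feature-flag value; anything unparsable counts as disabled.
// Assumption: strconv.ParseBool semantics ("true", "1", "t", ... enable the flag).
func parseFlag(value string) bool {
	enabled, err := strconv.ParseBool(value)
	return err == nil && enabled
}

// schedulesEnabled reports whether the experimental Schedules feature is switched on.
func schedulesEnabled() bool {
	return parseFlag(os.Getenv("EXPERIMENTAL_SCHEDULES"))
}

func main() {
	fmt.Println("schedules enabled:", schedulesEnabled())
}
```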

🔧 Pending Work

  • Overlap detection for Schedules has not yet been implemented.
  • Targets have been implemented using a clusters field, anticipating future support for other resources like GitRepos, where polling or Bundle updates could be paused.
  • Currently, only Cron-based Schedules are supported as execution triggers (with second-level granularity).
  • Emit events when a new BundleDeployment is ready but was not deployed because the Cluster is scheduled and not active?

Refers to: #3726

Additional Information

Checklist

  • I have updated the documentation via a pull request in the
    fleet-docs repository.

@0xavi0 0xavi0 requested a review from a team as a code owner October 1, 2025 08:02
@0xavi0 0xavi0 force-pushed the 3726-schedules branch 5 times, most recently from 5c444fe to 474182e October 1, 2025 09:24
Refers to: rancher#3726

Signed-off-by: Xavi Garcia <xavi.garcia@suse.com>
@0xavi0 0xavi0 changed the title from "Scheduled BundleDeployments initial version" to "Scheduled BundleDeployments experimental version" Oct 1, 2025
@0xavi0 0xavi0 self-assigned this Oct 1, 2025
@0xavi0 0xavi0 added this to Fleet Oct 1, 2025
@0xavi0 0xavi0 added this to the v2.13.0 milestone Oct 1, 2025
@0xavi0 0xavi0 moved this to 👀 In review in Fleet Oct 1, 2025

@weyfonk weyfonk left a comment

Thanks for this \o/
Leaving a few comments, happy to discuss!

Signed-off-by: Xavi Garcia <xavi.garcia@suse.com>
@0xavi0 0xavi0 requested a review from weyfonk October 3, 2025 08:18
// changes to cluster labels that occurred since the last reconciliation
// are included. The controller's watchers only trigger reconciles for
// clusters that are already part of a schedule.
clusters, err := matchingClusters(ctx, c.Matcher, c.client, c.Schedule.Namespace)

Let's discuss, as I think I understand the change you made, but not how it relates to my initial comment 😅

Signed-off-by: Xavi Garcia <xavi.garcia@suse.com>

@weyfonk weyfonk left a comment

LGTM :)

@0xavi0 0xavi0 merged commit ce0b480 into rancher:main Oct 6, 2025
33 of 36 checks passed
@github-project-automation github-project-automation bot moved this from 👀 In review to ✅ Done in Fleet Oct 6, 2025