@thiyyakat (Member) commented Sep 22, 2025

What this PR does / why we need it:

This PR introduces a proposal to support the temporary preservation of machines.

The use cases have been discussed here.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

Release note:

Added proposal for the temporary preservation of machines.

@thiyyakat requested a review from a team as a code owner September 22, 2025 09:51
@gardener-robot-ci-2 added the reviewed/ok-to-test label Sep 22, 2025
@gardener-robot added the needs/review label Sep 22, 2025
@thiyyakat force-pushed the proposal/failed-machine-preserve branch from 20d8f8f to 9fbdb30 September 22, 2025 09:52
@gardener-robot added the size/m label Sep 22, 2025

@etiennnr left a comment

Feel free to reach out to me directly for clarification!

@ashwani2k (Contributor) left a comment

Thanks for the proposal. I've left some comments. While reading it, I felt we have not considered all the personas who can consume this feature:

  1. Operators who will manipulate the machine/node objects.
  2. Stakeholders who will manipulate the shoot spec.
  3. Stakeholders who will manipulate the node objects.

Another dimension I don't see mentioned is how this state is conveyed in the shoot status and on the dashboard.

@thiyyakat marked this pull request as draft October 3, 2025 07:48
@thiyyakat changed the title from "Add proposal for temporary preservation of Failed machines for diagnostics" to "[WIP] Add proposal for temporary preservation of Failed machines for diagnostics" Oct 3, 2025
@thiyyakat changed the title from "[WIP] Add proposal for temporary preservation of Failed machines for diagnostics" to "[WIP] Add proposal for temporary preservation of machines" Oct 3, 2025
@thiyyakat force-pushed the proposal/failed-machine-preserve branch from 0b621a2 to 849a99d October 8, 2025 14:29
@thiyyakat marked this pull request as ready for review October 8, 2025 14:31
@thiyyakat changed the title from "[WIP] Add proposal for temporary preservation of machines" to "Add proposal for temporary preservation of machines" Oct 9, 2025

- `machine.CurrentStatus.PreserveExpiryTime` is set by MCM to `currentTime + machinePreserveTimeout`.
- After the timeout, the phase is changed to `Terminating`.
- The number of machines in the `Failed:Preserved` phase counts toward enforcing `autoPreserveFailedMax`.
7. If a failed machine is currently in `Failed:Preserved` and its VM/node is found to be Healthy before the timeout, the machine will be moved to `Running`.
Member

Should this not be moved to Running:Preserved?
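
For reference while reviewing, the preservation rules quoted from the proposal could be sketched roughly as follows. This is only an illustration, not MCM code: the `Machine` struct, the `Phase` constants, and the functions `canAutoPreserve`, `preserve`, and `reconcilePreserved` are hypothetical names, and only `PreserveExpiryTime`, `machinePreserveTimeout`, `autoPreserveFailedMax`, and the phase strings come from the proposal text. The sketch follows the wording as currently written (a healthy preserved machine moves to `Running`); whether it should be `Running:Preserved` instead is the open question raised above.

```go
package main

import "time"

// Phase is a hypothetical stand-in for the machine phase in the proposal.
type Phase string

const (
	PhaseRunning         Phase = "Running"
	PhaseFailed          Phase = "Failed"
	PhaseFailedPreserved Phase = "Failed:Preserved"
	PhaseTerminating     Phase = "Terminating"
)

// Machine is a simplified, hypothetical model of the fields discussed here.
type Machine struct {
	Name               string
	Phase              Phase
	PreserveExpiryTime time.Time
}

// canAutoPreserve reports whether one more failed machine may be preserved
// without exceeding autoPreserveFailedMax. Machines already in
// Failed:Preserved count toward the limit.
func canAutoPreserve(machines []*Machine, autoPreserveFailedMax int) bool {
	preserved := 0
	for _, m := range machines {
		if m.Phase == PhaseFailedPreserved {
			preserved++
		}
	}
	return preserved < autoPreserveFailedMax
}

// preserve moves a machine to Failed:Preserved and sets
// PreserveExpiryTime = currentTime + machinePreserveTimeout.
func preserve(m *Machine, currentTime time.Time, machinePreserveTimeout time.Duration) {
	m.Phase = PhaseFailedPreserved
	m.PreserveExpiryTime = currentTime.Add(machinePreserveTimeout)
}

// reconcilePreserved applies the two transitions quoted from the proposal:
// once the expiry time is reached the machine becomes Terminating; if the
// VM/node is healthy before that, the machine becomes Running.
func reconcilePreserved(m *Machine, currentTime time.Time, healthy bool) {
	if m.Phase != PhaseFailedPreserved {
		return // only preserved failed machines are handled in this sketch
	}
	switch {
	case !currentTime.Before(m.PreserveExpiryTime):
		m.Phase = PhaseTerminating
	case healthy:
		m.Phase = PhaseRunning
	}
}
```

In this sketch a controller would call `canAutoPreserve` before `preserve`, then call `reconcilePreserved` on each reconcile pass. Switching to the reviewer's suggestion would only change the `healthy` branch.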

Labels: needs/changes, needs/review, reviewed/ok-to-test, size/m

9 participants