K8SPSMDB-1211: handle FULL CLUSTER CRASH error during the restore #1926
base: main
pkg/controller/common/common.go (outdated)
@@ -0,0 +1,70 @@
package common
I think packages named common, utils, etc. tend to be vague: they imply shared logic without a clearly defined domain or separation of concerns.
In this file, the main struct is CommonReconciler, but it's not clear what exactly is being reconciled. The struct also mixes responsibilities: it constructs and returns heterogeneous components such as backup.PBM, mongo.Client, a scheme, and a Kubernetes client.
To improve clarity and maintainability, I'd suggest:
- Keeping the scheme and the Kubernetes client in ReconcilePerconaServerMongoDB, and turning the related functions into methods with a ReconcilePerconaServerMongoDB receiver.
- Splitting the PBM-related logic out into a dedicated PBM factory/service.
- Doing the same for the MongoClientProvider.
commit: 2614d85
https://perconadev.atlassian.net/browse/K8SPSMDB-1211
DESCRIPTION
Problem:
During a physical restore, the operator detects a FULL CLUSTER CRASH and attempts to resolve it. The operator log then contains the FULL CLUSTER CRASH message, which should not be logged, because this error occurs 100% of the time during a physical restore.
Solution:
After the physical restore, the operator should perform the same action that the (*ReconcilePerconaServerMongoDB) handleReplicaSetNoPrimary method performs. Once PBM has finished the restore, the operator should recreate the statefulsets, add the percona.com/restore-in-progress annotation to them, and handle the FULL CLUSTER CRASH state. Afterwards, the percona.com/restore-in-progress annotation should be removed from the statefulsets.
CHECKLIST
Jira
- … (Needs Doc) and QA (Needs QA)?
Tests
- … (compare/*-oc.yml)?
Config/Logging/Testability