-
Notifications
You must be signed in to change notification settings - Fork 495
[FLINK-37515] Basic support for Blue/Green deployments #969
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
...tor-api/src/main/java/org/apache/flink/kubernetes/operator/api/bluegreen/DeploymentType.java
Outdated
Show resolved
Hide resolved
...main/java/org/apache/flink/kubernetes/operator/api/status/FlinkBlueGreenDeploymentState.java
Show resolved
Hide resolved
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
I am concerned about the number of TODOs. Things like dealing with errors and timing conditions should be in place in the initial drop of code - unless we can mark this capability as beta or the like. WDYT @gyfora ? |
@davidradl thanks for your review/comments. I've cleaned up all the outdated/invalid TODOs which pointed me 1 missing unit test, added. At this point the few remaining TODOs do not interfere with the main functionality and are not critical/open for discussion. |
👍 |
when this feature gonna be released? |
Hi @hansh0801, trying to release it ASAP but I also want to make sure the logic and contracts are as robust as possible to minimize changes later. Looks pretty stable in my current testing and I've reopened it for review. Thanks! |
cool ! and I'm also waiting for blue green phase 2 :) |
@hansh0801 indeed... Phase 2 will take a little bit longer due to its complexity but already working on it. |
...tor-api/src/main/java/org/apache/flink/kubernetes/operator/api/bluegreen/DeploymentType.java
Outdated
Show resolved
Hide resolved
...tor-api/src/main/java/org/apache/flink/kubernetes/operator/api/bluegreen/TransitionMode.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/apache/flink/kubernetes/operator/api/spec/FlinkDeploymentTemplateSpec.java
Outdated
Show resolved
Hide resolved
...ain/java/org/apache/flink/kubernetes/operator/api/status/FlinkBlueGreenDeploymentStatus.java
Outdated
Show resolved
Hide resolved
...ain/java/org/apache/flink/kubernetes/operator/api/status/FlinkBlueGreenDeploymentStatus.java
Outdated
Show resolved
Hide resolved
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
...main/java/org/apache/flink/kubernetes/operator/api/status/FlinkBlueGreenDeploymentState.java
Show resolved
Hide resolved
...tes-operator-api/src/main/java/org/apache/flink/kubernetes/operator/api/utils/SpecUtils.java
Outdated
Show resolved
Hide resolved
...tes-operator-api/src/main/java/org/apache/flink/kubernetes/operator/api/utils/SpecUtils.java
Show resolved
Hide resolved
Hi @schongloo @davidradl @gyfora , I hope you're doing well. I've been following this PR as blue-green deployments would solve a critical use case for our production needs. I noticed there have been no updates to this PR for a few weeks. Was wondering if there is anything blocking the release of this and if there is something myself or the OSS community could help with (testing, fixing any changes, etc). Thanks for all the work so far! 🙏 |
@maheepm-lyft , @schongloo is currently on holiday but once he is back we can try to merge this as soon as reasonably possible :) In the meantime you can help us by trying out the current version and by providing a code review for the PR, that can definitely speed things up. |
@schongloo can you please re-open this PR against the |
Great thanks! Taking us a bit of time to test this out in our stack, but we're working on it and I'll post any comments here. |
...tor-api/src/main/java/org/apache/flink/kubernetes/operator/api/bluegreen/DeploymentType.java
Outdated
Show resolved
Hide resolved
...ava/org/apache/flink/kubernetes/operator/api/spec/FlinkBlueGreenDeploymentConfigOptions.java
Outdated
Show resolved
Hide resolved
...ava/org/apache/flink/kubernetes/operator/api/spec/FlinkBlueGreenDeploymentConfigOptions.java
Outdated
Show resolved
Hide resolved
...ava/org/apache/flink/kubernetes/operator/api/spec/FlinkBlueGreenDeploymentConfigOptions.java
Outdated
Show resolved
Hide resolved
.durationType() | ||
.defaultValue(Duration.ofSeconds(15)) | ||
.withDescription( | ||
"Configurable delay in milliseconds to use when the operator reschedules a reconciliation."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest including some advice as to what to set this to in different circumstances. Same for deployment-deletion.delay
...main/java/org/apache/flink/kubernetes/operator/api/status/FlinkBlueGreenDeploymentState.java
Show resolved
Hide resolved
* | ||
* <p>Deployment States | ||
* | ||
* <p>1. INITIALIZING_BLUE - First-time deployment setup 2. ACTIVE_BLUE - Blue environment serving |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit maybe use a html list for this list
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
...java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeploymentController.java
Outdated
Show resolved
Hide resolved
...src/main/java/org/apache/flink/kubernetes/operator/controller/FlinkBlueGreenDeployments.java
Show resolved
Hide resolved
@davidradl thank you for all your feedback. The past few weeks I was focused on mostly the high level refactoring, I'll address your comments ASAP. |
...ava/org/apache/flink/kubernetes/operator/api/spec/FlinkBlueGreenDeploymentConfigOptions.java
Show resolved
Hide resolved
bb9d576
to
7ed0e6f
Compare
...src/main/java/org/apache/flink/kubernetes/operator/api/spec/FlinkDeploymentTemplateSpec.java
Outdated
Show resolved
Hide resolved
38cee0e
to
0398adc
Compare
String wrapperKey, | ||
Class<T> valueType) { | ||
String serializedSpec = SpecUtils.writeSpecAsJSON(spec, wrapperKey); | ||
String replacedSerializedSpec = serializedSpec.replace(deploymentName, childDeploymentName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will do replacement of the image tag as well. We would probably need to exclude that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to what vikas said. This will change the name in places that are problematic. In my testing, I am facing problems so far in expected configmap name, savepoint path, and metrics reporters.
...va/org/apache/flink/kubernetes/operator/controller/bluegreen/BlueGreenDeploymentService.java
Show resolved
Hide resolved
88394be
to
68e5ecc
Compare
68e5ecc
to
a57579c
Compare
private UpdateControl<FlinkBlueGreenDeployment> patchFlinkDeployment( | ||
BlueGreenContext context, BlueGreenDeploymentType blueGreenDeploymentTypeToPatch) { | ||
|
||
String childDeploymentName = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @schongloo , sorry to leave a comment after the PR is merged. I have been doing some testing with BlueGreen deployments, and I think this is replacing the name in several places where it should not be. For example, if I am using an initialSavepointPath
with savepoint path in S3, it is replacing the app name in the path with app name + "-blue" or "-green", and the app is unable to find the corresponding savepoint. E.g. app_name
becomes app_name-blue
. If the savepoint path already has -blue in it (e.g. app_name-blue
, then it is updating it to app_name-blue-blue
. I am seeing similar issues related to a few other things like configmap names, although I haven't followed that code path all the way.
EDIT: added a comment elsewhere after looking into the bug a little more.
What is the purpose of the change
This pull request adds basic support for Blue/Green deployments as outlined by FLIP-503 (https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=337677648).
Brief change log
FlinkBlueGreenDeploymentController
controller, with the capability of managing the lifecycle of these deployments.FlinkBlueGreenDeploymentController
will manage this CRD and hide from the user the details of the actual Blue/Green (Active/StandBy) jobs.FlinkDeployment
controller.Verifying this change
Does this pull request potentially affect one of the following parts:
CustomResourceDescriptors
: noDocumentation
docs/content/docs
Overview Diagram