-
Notifications
You must be signed in to change notification settings - Fork 0
Update pauser to handle exception and run unpause as long as possible #33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
92ebb3a
07d8325
08312b2
34a03fc
25c10f6
4d22a34
42f1a30
1d17af9
f087077
d3b54c0
55ba29c
891beb8
c233242
744da27
c413386
600e7b1
9e9132d
909e596
f9dc445
5cca44c
639e132
b93c054
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,6 @@ | ||
package com.scalar.admin.kubernetes; | ||
|
||
import com.google.common.annotations.VisibleForTesting; | ||
import com.google.common.util.concurrent.Uninterruptibles; | ||
import com.scalar.admin.RequestCoordinator; | ||
import io.kubernetes.client.openapi.Configuration; | ||
|
@@ -34,7 +35,7 @@ | |
@NotThreadSafe | ||
public class Pauser { | ||
|
||
private static final int MAX_UNPAUSE_RETRY_COUNT = 3; | ||
@VisibleForTesting static final int MAX_UNPAUSE_RETRY_COUNT = 3; | ||
|
||
private final Logger logger = LoggerFactory.getLogger(Pauser.class); | ||
private final TargetSelector targetSelector; | ||
|
@@ -79,26 +80,50 @@ public PausedDuration pause(int pauseDuration, @Nullable Long maxPauseWaitTime) | |
|
||
TargetSnapshot target; | ||
try { | ||
target = targetSelector.select(); | ||
target = getTarget(); | ||
} catch (Exception e) { | ||
throw new PauserException("Failed to find the target pods to pause.", e); | ||
} | ||
|
||
RequestCoordinator coordinator = getRequestCoordinator(target); | ||
RequestCoordinator coordinator; | ||
try { | ||
coordinator = getRequestCoordinator(target); | ||
} catch (Exception e) { | ||
throw new PauserException("Failed to initialize the coordinator.", e); | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added exception handling for the initialization process of |
||
|
||
coordinator.pause(true, maxPauseWaitTime); | ||
Instant startTime; | ||
Instant endTime; | ||
try { | ||
coordinator.pause(true, maxPauseWaitTime); | ||
|
||
Instant startTime = Instant.now(); | ||
startTime = Instant.now(); | ||
|
||
Uninterruptibles.sleepUninterruptibly(pauseDuration, TimeUnit.MILLISECONDS); | ||
Uninterruptibles.sleepUninterruptibly(pauseDuration, TimeUnit.MILLISECONDS); | ||
|
||
Instant endTime = Instant.now(); | ||
endTime = Instant.now(); | ||
|
||
unpauseWithRetry(coordinator, MAX_UNPAUSE_RETRY_COUNT, target); | ||
unpauseWithRetry(coordinator, MAX_UNPAUSE_RETRY_COUNT, target); | ||
|
||
} catch (Exception e) { | ||
kota2and3kan marked this conversation as resolved.
Show resolved
Hide resolved
|
||
try { | ||
unpauseWithRetry(coordinator, MAX_UNPAUSE_RETRY_COUNT, target); | ||
} catch (PauserException ex) { | ||
throw new PauserException("unpauseWithRetry() method failed twice.", e); | ||
} catch (Exception ex) { | ||
throw new PauserException( | ||
"unpauseWithRetry() method failed twice due to unexpected exception.", e); | ||
} | ||
throw new PauserException( | ||
"The pause operation failed for some reason. However, the unpause operation succeeded" | ||
+ " afterward. Currently, the scalar products are running with the unpause status." | ||
+ " You should retry the pause operation to ensure proper backup.", | ||
e); | ||
} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is the main update of this PR. Previously, there was no exception handling. So, if some error occurred, this library didn't try to run To address such issue, I added the exception handling here. In this implementation, if some errors occur, this library tries to run the |
||
|
||
TargetSnapshot targetAfterPause; | ||
try { | ||
targetAfterPause = targetSelector.select(); | ||
targetAfterPause = getTarget(); | ||
} catch (Exception e) { | ||
throw new PauserException( | ||
"Failed to find the target pods to examine if the targets pods were updated during" | ||
|
@@ -113,8 +138,9 @@ public PausedDuration pause(int pauseDuration, @Nullable Long maxPauseWaitTime) | |
return new PausedDuration(startTime, endTime); | ||
} | ||
|
||
private void unpauseWithRetry( | ||
RequestCoordinator coordinator, int maxRetryCount, TargetSnapshot target) { | ||
@VisibleForTesting | ||
void unpauseWithRetry(RequestCoordinator coordinator, int maxRetryCount, TargetSnapshot target) | ||
throws PauserException { | ||
int retryCounter = 0; | ||
|
||
while (true) { | ||
|
@@ -123,17 +149,33 @@ private void unpauseWithRetry( | |
return; | ||
} catch (Exception e) { | ||
if (++retryCounter >= maxRetryCount) { | ||
logger.warn( | ||
// If someone uses this library directly instead of using our CLI, users should handle the | ||
// exception properly. However, this case is a very critical issue. Therefore, we output | ||
// the error message here despite whether the exception is handled or not on the caller | ||
// side. | ||
logger.error( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The error messages are same? If so, how about sharing it like this? if (++retryCounter >= maxRetryCount) {
String errorMessage = String.format(
"Failed to unpause Scalar product. They are still in paused. You must restart"
+ " related pods by using the `kubectl rollout restart deployment %s` command"
+ " to unpause all pods.",
target.getDeployment().getMetadata().getName());
logger.error(errorMessage, e);
throw new PauserException(errorMessage, e);
} There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also, passing the exception to |
||
"Failed to unpause Scalar product. They are still in paused. You must restart related" | ||
+ " pods by using the `kubectl rollout restart deployment {}`" | ||
+ " command to unpase all pods.", | ||
+ " command to unpause all pods.", | ||
target.getDeployment().getMetadata().getName()); | ||
return; | ||
// In our CLI, we catch this exception and output the message as an error on the CLI side. | ||
throw new PauserException( | ||
String.format( | ||
"Failed to unpause Scalar product. They are still in paused. You must restart" | ||
+ " related pods by using the `kubectl rollout restart deployment %s` command" | ||
+ " to unpause all pods.", | ||
target.getDeployment().getMetadata().getName()), | ||
e); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If I will update the message after I implement the After I implement such a new feature, users can use it to unpause Scalar products instead of using the |
||
} | ||
} | ||
} | ||
} | ||
|
||
@VisibleForTesting | ||
TargetSnapshot getTarget() throws PauserException { | ||
return targetSelector.select(); | ||
} | ||
Comment on lines
+202
to
+205
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As I mentioned in another comment, I separated this |
||
|
||
RequestCoordinator getRequestCoordinator(TargetSnapshot target) { | ||
return new RequestCoordinator( | ||
target.getPods().stream() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I separated this
targetSelector.select()
as one methodgetTarget()
for testing. This targetSelector is initialized in the constructor ofPauser
. So, controlling the behavior oftargetSelector
by using Mockito is a bit complex and difficult. That's why I separate this function.