-
Notifications
You must be signed in to change notification settings - Fork 0
Update pauser to handle exception and run unpause as long as possible #33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update pauser to handle exception and run unpause as long as possible #33
Conversation
RequestCoordinator coordinator; | ||
try { | ||
coordinator = getRequestCoordinator(target); | ||
} catch (Exception e) { | ||
throw new PauserException("Failed to initialize the coordinator.", e); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added exception handling for the initialization process of RequestCoodinator
.
} catch (Exception e) { | ||
unpauseWithRetry(coordinator, MAX_UNPAUSE_RETRY_COUNT, target); | ||
throw e; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the main update of this PR.
Previously, there was no exception handling. So, there is a possibility that the Scalar products pod will keep paused
status forever. In such a case, the Scalar product does not accept any request. It causes the service down.
To avoid such a case as long as possible, we want to run unpause()
if any errors occur in any parts of the process.
target = targetSelector.select(); | ||
target = getTarget(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I separated this targetSelector.select()
as one method getTarget()
for testing. This targetSelector is initialized in the constructor of Pauser
. So, controlling the behavior of targetSelector
by using Mockito is a bit complex and difficult. That's why I separate this function.
try { | ||
coordinator.pause(true, maxPauseWaitTime); | ||
|
||
Instant startTime = Instant.now(); | ||
startTime = Instant.now(); | ||
|
||
Uninterruptibles.sleepUninterruptibly(pauseDuration, TimeUnit.MILLISECONDS); | ||
Uninterruptibles.sleepUninterruptibly(pauseDuration, TimeUnit.MILLISECONDS); | ||
|
||
Instant endTime = Instant.now(); | ||
endTime = Instant.now(); | ||
|
||
unpauseWithRetry(coordinator, MAX_UNPAUSE_RETRY_COUNT, target); | ||
unpauseWithRetry(coordinator, MAX_UNPAUSE_RETRY_COUNT, target); | ||
|
||
} catch (Exception e) { | ||
try { | ||
unpauseWithRetry(coordinator, MAX_UNPAUSE_RETRY_COUNT, target); | ||
} catch (PauserException ex) { | ||
throw new PauserException("unpauseWithRetry() method failed twice.", e); | ||
} catch (Exception ex) { | ||
throw new PauserException( | ||
"unpauseWithRetry() method failed twice due to unexpected exception.", e); | ||
} | ||
throw new PauserException( | ||
"The pause operation failed for some reason. However, the unpause operation succeeded" | ||
+ " afterward. Currently, the scalar products are running with the unpause status." | ||
+ " You should retry the pause operation to ensure proper backup.", | ||
e); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the main update of this PR.
Previously, there was no exception handling. So, if some error occurred, this library didn't try to run unpause()
. This might cause the Scalar products to keep the pause state. In other words, it might cause like system down.
To address such issue, I added the exception handling here. In this implementation, if some errors occur, this library tries to run the unpause()
operation.
// In our CLI, we catch this exception and output the message as an error on the CLI side. | ||
throw new PauserException( | ||
String.format( | ||
"Failed to unpause Scalar product. They are still in paused. You must restart" | ||
+ " related pods by using the `kubectl rollout restart deployment %s` command" | ||
+ " to unpause all pods.", | ||
target.getDeployment().getMetadata().getName()), | ||
e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If unpause()
failed three times, throw the PauserException
. The caller of the pause()
method should handle this exception.
I will update the message after I implement the pauser.unpause()
method. In other words, I will add a new option (or subcommand) that allows users to run the unpause operation solely, in another PR.
After I implement such a new feature, users can use it to unpause Scalar products instead of using the kubectl rollout restart deployment
command.
@VisibleForTesting | ||
TargetSnapshot getTarget() throws PauserException { | ||
return targetSelector.select(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I mentioned in another comment, I separated this targetSelector.select()
as one method getTarget()
for testing.
…en-pause-operation-failed
// exception handling. However, this scenario represents a critical issue. Consequently, | ||
// we output the error message here regardless of whether the calling code handles the | ||
// exception. | ||
logger.error( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The error messages are same? If so, how about sharing it like this?
if (++retryCounter >= maxRetryCount) {
String errorMessage = String.format(
"Failed to unpause Scalar product. They are still in paused. You must restart"
+ " related pods by using the `kubectl rollout restart deployment %s` command"
+ " to unpause all pods.",
target.getDeployment().getMetadata().getName());
logger.error(errorMessage, e);
throw new PauserException(errorMessage, e);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, passing the exception to logger.error()
might be helpful for investigation?
private TargetSelector targetSelector; | ||
private V1Deployment deployment; | ||
private V1ObjectMeta objectMeta; | ||
private final String dummyObjectName = "dummyObjectName"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[minor] This can be a static constant like private static final String DUMMY_OBJECT_NAME
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your suggestion!
Fixed in f087077.
@komamitsu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments. Please take a look when you have time!
throw new PauserException( | ||
"Failed to find the target pods to examine if the targets pods were updated during" | ||
+ " paused.", | ||
e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If pausing or unpausing fails, shouldn't we display a message about it here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. You're right. We need to return the exception with messages here if the pause or unpause fails (especially, the unpause failure is critical).
I will fix it. Thank you for pointing it out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in d3b54c0.
try { | ||
compareTargetSuccessful = compareTargetStatus(targetBeforePause, targetAfterPause); | ||
} catch (Exception e) { | ||
statusCheckFailedException = new StatusCheckFailedException(statusCheckErrorMessage, e); | ||
if (unpauseFailedException == null) { | ||
throw statusCheckFailedException; | ||
} else { | ||
throw new UnpauseFailedException(unpauseErrorMessage, statusCheckFailedException); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like compareTargetStatus()
doesn't throw any exceptions. Do we still need this exception handling?
Maybe we need the exception handling for targetAfterPause
here?
https://github.com/scalar-labs/scalar-admin-for-kubernetes/pull/33/files#diff-529e2e818d31d239079b031928623cab070acd5743e317b3d709c34e6e52b394R114-R121
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like
compareTargetStatus()
doesn't throw any exceptions. Do we still need this exception handling?
For safety, I think it would be better to handle exceptions here. There is a possibility that some unexpected exceptions might occur here, and it could cause the issue keeping unpause state
.
Maybe we need the exception handling for
targetAfterPause
here? https://github.com/scalar-labs/scalar-admin-for-kubernetes/pull/33/files#diff-529e2e818d31d239079b031928623cab070acd5743e317b3d709c34e6e52b394R114-R121
As I mentioned in another comment, you're right. I will add the exception handling there. I overlooked it. Thank you for pointing it out!
However, as mentioned above, I think we can keep this exception handling here. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves error handling in the Pauser class to ensure a fallback mechanism is in place if pausing fails. Key changes include:
- Adding custom exceptions (UnpauseFailedException, StatusCheckFailedException, PauseFailedException) to provide detailed error messages.
- Enhancing the pause method in Pauser.java with nested try-catch blocks and a new unpauseWithRetry mechanism that now throws exceptions if unpause fails.
- Marking several internal methods with @VisibleForTesting to improve unit testability.
Reviewed Changes
Copilot reviewed 5 out of 6 changed files in this pull request and generated no comments.
File | Description |
---|---|
lib/src/main/java/com/scalar/admin/kubernetes/UnpauseFailedException.java | Introduces a new exception for unpause failures. |
lib/src/main/java/com/scalar/admin/kubernetes/StatusCheckFailedException.java | Introduces a new exception for status check failures. |
lib/src/main/java/com/scalar/admin/kubernetes/Pauser.java | Refactors error handling in pause method, replaces warning log with thrown exceptions in unpauseWithRetry, and improves testability. |
lib/src/main/java/com/scalar/admin/kubernetes/PauseFailedException.java | Introduces a new exception for pause failures. |
Files not reviewed (1)
- lib/build.gradle: Language not supported
Comments suppressed due to low confidence (3)
lib/src/main/java/com/scalar/admin/kubernetes/Pauser.java:123
- [nitpick] Consider revising the error message for clarity and grammar, for example: "Failed to find the target pods to verify if they were updated during the pause."
String getTargetAfterPauseErrorMessage = "Failed to find the target pods to examine if the targets pods were updated during paused. ";
lib/src/main/java/com/scalar/admin/kubernetes/Pauser.java:212
- [nitpick] Consider wrapping the thrown exception in unpauseWithRetry with additional context (such as the retry count) to aid in debugging when the unpause operation fails after maximum retries.
throw e;
lib/src/main/java/com/scalar/admin/kubernetes/Pauser.java:137
- Ensure that startTime and endTime are properly initialized before being used in the error message, as they may be null if pauseInternal fails before these values are set.
String unpauseFailedButBackupOkErrorMessage = String.format("Note that the pause operations for taking backup succeeded. You can use a backup that was taken during this pause duration: Start Time = %s, End Time = %s. ", startTime, endTime);
boolean backupOk = (pauseFailedException == null) && compareTargetSuccessful; | ||
|
||
// Create error message if any of the operations failed. | ||
StringBuilder errorMessageBuilder = new StringBuilder(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's cleaner and more consistent to create the error message in another method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. I agree with your suggestion.
I will update it. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 9e9132d.
return new PausedDuration(startTime, endTime); | ||
// If both the pause operation and status check succeeded, you can use the backup that was taken | ||
// during the pause duration. | ||
boolean backupOk = (pauseFailedException == null) && compareTargetSuccessful; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boolean backupOk = (pauseFailedException == null) && compareTargetSuccessful; | |
boolean isPauseOk = (pauseFailedException == null) && compareTargetSuccessful; |
It's minor, but it's common to use isXXX
for methods returning a boolean.
Also, it only pauses, and it doesn't mean the actual backup is successful, so it's probably misleading to call backup
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Indeed, it does not mean Backup succeeded
. I will update the variable name.
Thank you for your suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 891beb8.
RequestCoordinator getRequestCoordinator(TargetSnapshot target) { | ||
return new RequestCoordinator( | ||
target.getPods().stream() | ||
.map(p -> new InetSocketAddress(p.getStatus().getPodIP(), target.getAdminPort())) | ||
.collect(Collectors.toList())); | ||
} | ||
|
||
@VisibleForTesting | ||
void pauseInternal( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better (more modular) to return PauseDuration
as a result instead of updating member variables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. It looks good!
I will update it. Thank you for your suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 744da27.
|
||
// Prepare error messages for each process. | ||
String unpauseErrorMessage = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should probably be defined as member variables since they don't need to be instantiated every execution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes. That's right. They are fixed values.
I will update it. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 9e9132d.
boolean compareTargetSuccessful; | ||
StatusCheckFailedException statusCheckFailedException; | ||
try { | ||
compareTargetSuccessful = compareTargetStatus(targetBeforePause, targetAfterPause); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For checking if two objects are the same, I think it's common to use xxxEquals.
compareXXX usually returns int to express non-equal cases.
compareTargetSuccessful = compareTargetStatus(targetBeforePause, targetAfterPause); | |
isTargetStatusEqual = targetStatusEquals(targetBeforePause, targetAfterPause); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I see. I didn't know that.
I will update it. Thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in c233242.
// If both the pause operation and status check succeeded, you can use the backup that was taken | ||
// during the pause duration. | ||
boolean backupOk = (pauseFailedException == null) && compareTargetSuccessful; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's more readable if we identify the cases here like the following and create error messages for each case.
boolean isAllOK = ...
boolean isPauseOkUnpauseNotOk = ...
boolean isPauseNotOkUnpauseOk = ...
boolean isPauseNotOkUnpauseNotOk = ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Indeed, your suggestion is more readable.
However, in the previous meeting, we (@komamitsu and I) discussed using boolean flags, and the result was Don't use flags (boolean variables) as long as possible to reduce status/variables so that we can make code simpler
. So, I used the exceptions in the condition of the if statement.
Honestly, it's a bit difficult for me to decide which is better because I think both (more readable and simpler) make sense to me...
@komamitsu
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boolean isPauseOkUnpauseNotOk = ...
This may improve the readability. But it'd also introduce many states. So, I think it's a trade-off (or maybe matter of taste?).
The current code uses unpauseFailedException
, pauseFailedException
and compareTargetSuccessful
as more informative alternative to simple boolean flags. If we introduces new flags flags like isPauseNotOkUnpauseOkCompareTargetNotOk
(or state enum like enum PauseResult { SUCCESS, PAUSE_SUCCESS_AND_UNPAUSE_FAILED_AND_COMPARE_TARGET_SUCCESS, ... }
?), we need to manage 8 (2 ** 3) states and it would make the method complex. So, I suggested to separately manage the 3 states as a simple way.
@kota2and3kan I think either is fine with us. I'm not insisting on my suggestion. You can go with any option you feel most confident in and comfortable being responsible for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I got it.
I will consider which is better.
Thank you for your comment!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Co-authored-by: Hiroyuki Yamada <mogwaing@gmail.com>
@feeblefakie @komamitsu |
if (unpauseFailedException == null) { | ||
throw pauserException; | ||
} else { | ||
throw new UnpauseFailedException(UNPAUSE_ERROR_MESSAGE, pauserException); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code seems to drop the cause of unpauseFailedException
. How about just adding pauserException
to the unpauseFailedException
as a suppressed one?
// Run an unpause operation.
UnpauseFailedException unpauseFailedException = null;
try {
unpauseWithRetry(requestCoordinator, MAX_UNPAUSE_RETRY_COUNT);
} catch (Exception e) {
unpauseFailedException = new UnpauseFailedException(UNPAUSE_ERROR_MESSAGE, e);
}
// Get pods and deployment information after pause.
TargetSnapshot targetAfterPause;
try {
targetAfterPause = getTarget();
} catch (Exception e) {
PauserException pauserException =
new PauserException(GET_TARGET_AFTER_PAUSE_ERROR_MESSAGE, pauserException);
if (unpauseFailedException == null) {
throw pauserException;
} else {
unpauseFailedException.addSuppressed(e);
throw unpauseFailedException;
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@komamitsu
Ah, I see. Indeed, it drops the cause of unpauseFailedException
... I overlooked it...
I think this issue should be resolved on the buildException()
side, which we are discussing in another comment. So, I will fix it in the buildException()
process.
Thank you for pointing it out!
} | ||
|
||
return new PausedDuration(startTime, endTime); | ||
// Create an error message if any of the operations failed. | ||
String errorMessage = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[just an idea] Sorry for repeated minor suggestions. But, I just noticed adding additional exceptions to the primary exception might be an option so that all the detailed informations are preserved, although it would drop the benefit of outputting a fully concatenated error message in the first exception.
Like this:
...
Exception exception = buildException(pauseFailedException, unpauseFailedException,
statusCheckException, statusUnmatchedException);
if (exception != null) {
throw exception;
}
return pausedDuration;
}
@Nullable
private Exception buildException(
@Nullable PauseFailedException pauseFailedException,
@Nullable UnpauseFailedException unpauseFailedException,
@Nullable StatusCheckFailedException statusCheckFailedException,
@Nullable StatusUnmatchedException statusUnmatchedException) {
Exception exception = null;
if (unpauseFailedException != null) {
exception = unpauseFailedException;
}
if (pauseFailedException != null) {
if (exception == null) {
exception = pauseFailedException;
} else {
exception.addSuppressed(pauseFailedException);
}
}
if (statusCheckFailedException != null) {
if (exception == null) {
exception = statusCheckFailedException;
} else {
exception.addSuppressed(statusCheckFailedException);
}
}
if (statusUnmatchedException != null) {
if (exception == null) {
exception = statusUnmatchedException;
} else {
exception.addSuppressed(statusUnmatchedException);
}
}
return exception;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to the idea ;)
With this, API clients can decide what to do based on the primary + secondary(suppressed) exceptions, e.g., primary = UnpauseFailedException
, secondary = [StatusUnmatchedException]
.
@komamitsu Just wanted to confirm, you'd like to throw PauserExcpetion
instead of Exception
? 👀 Since all the exception types are subclasses of PauserExcpetion
(e.g., PauseFailedException extends PauserExcpetion
), I think it's ok to throw PauserExcpetion
in this case? 🙄 (Or maybe I'm missing something :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@monkey-mas Good point. Agreed, although StatusUnmatchedException
doesn't exist at the moment 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@komamitsu @monkey-mas
Thank you for your review and suggestions!
It looks good to me!
I'm fixing it now. Please give me a bit of time.
(I didn't know Throwable.addSuppressed()
. I was able to learn one new thing about Java!)
throw new UnpauseFailedException(errorMessage, unpauseFailedException); | ||
} else if (pauseFailedException != null) { // Pause Failed. | ||
// Pause Failed is second priority because pause issue might be caused by configuration error. | ||
throw new PauseFailedException(errorMessage, pauseFailedException); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first PauseFailedException
instance contains a PauseFailedException
instance as a cause. Is it expected? How about making pauseFailedException
just Exception not PauseFailedException
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@komamitsu
Ah, yes. You're right. I think we can throw the pauseFailedException
as is instead of new PauseFailedException()
.
I will fix it. I think we can fix this issue on the buildException()
side that we are discussing in another comment.
Thank you for pointing it out!
@komamitsu @monkey-mas |
@Nested | ||
class buildException { | ||
@Test | ||
void buildException_00000_ReturnNull() throws PauserException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To represent the combinations of arguments of buildException()
, I used binary numbers like 00000
, 00001
, 00010
, 00011
, 00100
, 00101
... In this test, 1
means the exception is not null, and 0
means the exception is null. For example:
00101
means:- UnpauseFailedException is null
- PauseFailedException is null
- GetTargetAfterPauseFailedException is
NOT
null - StatusCheckFailedException is null
- StatusUnmatchedException is
NOT
null
However, I don't know if this is good naming... So, I want to know some ideas about the test method naming, if you have some ideas. 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
binary numbers
00000
Good try :)
Although your strategy guarantees full coverage, the downside of it is the cost of maintainability 😓 Say, we'd add a new PauserException
s - the total number of tests becomes 2^6 (64!!!), which is a bit over control for us 😮
I'd say we could write some property-based tests for this so that we do't have to write that sort of full coverage tests. :-) i.e., instead of writing all the test cases, we can generate potential (random) test cases & assert them.
buildException
spec
IIUC, we are confident enough as long as we have the following tests scenarios in PauserTest#buildException
:
- Returns
null
when all the exceptions arenull
. - Returns
UnpauseFailedException
(+ suppressed exceptions) whereUnpauseFailedException
is notnull
. - Returns
PauseFailedException
(+ suppressed exceptions) where- a)
PauseFailedException
is notnull
. - b)
UnpauseFailedException
isnull
.
- a)
- Returns
GetTargetAfterPauseFailedException
(+ suppressed exceptions) where- a)
GetTargetAfterPauseFailedException
is notnull
- b) both
UnpauseFailedException
andPauseFailedException
arenull
.
- a)
- Returns
StatusCheckFailedException
(+ suppressed exceptions) where- a)
StatusCheckFailedException
is notnull
- b)
UnpauseFailedException
,PauseFailedException
andGetTargetAfterPauseFailedException
arenull
.
- a)
- Returns
StatusUnmatchedException
(+ suppressed exceptions) where- a)
StatusUnmatchedException
is notnull
- b) all the other
PauserException
arenull
.
- a)
PB test strategy
The spec indicates that we just need to write N+1
tests where N
is the number of exceptions, which could be better than 2^N
full-coverage tests due to maintainability :) This means that we don't have to write or maintain a lot of test cases even when we add a new PauserException
in the future!
How setup
(generate) test data?
- Randomly generate
PauserException
s for all the exceptions forbuildException(...)
.
- The main purpose of this is to generate
suppressed
(secondary) exceptions. - This could be done in
@BeaforeEach
- Generate and set a primary
PauserException
in each test case.
- Make sure that nullify unnecessary exception.
- For example, we must assign
null
toUnpauseFailedException
to obtainPauseFailedException
as a primary exception.
- Assert that an expected primary & secondary exceptions are obtained.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JUnit test code (just an idea)
@kota2and3kan @komamitsu
The following is my thoughts... Your feedback is appreciated :)
Config in Gradle
@kota2and3kan
Actually... I used AssertJ to execute assertThat(actual).contains(exccted)
and so on since it's handy for testing 😅 .
Please set up your Gradle as follows:
build.gradle
inlib
module
testImplementation("org.assertj:assertj-core:${assertjVersion}")
build.gradle
in top-level dir
assertjVersion = '3.25.3'
- Import AssertJ in
PauserTest
import static org.assertj.core.api.Assertions.assertThat;
Also, please import and use Random
in PauserTest
I used this to generate random exceptions 😅
private static final Random random = new Random();
JUnit
[Memo]
- I used
@Repeated(X)
to do property-based testing for randomly-generatedsuppressed
exceptions.X
is the number of PB random tests :) - Maybe we should use
assertThat(actual.getSuppressed()).containsExactlyInAnyOrderElementsOf(...)
to disregard the order ofsuppressed
exceptions.
@Nested
class buildException {
private Pauser pauser;
private UnpauseFailedException unpauseFailedException;
private PauseFailedException pauseFailedException;
private GetTargetAfterPauseFailedException getTargetAfterPauseFailedException;
private StatusCheckFailedException statusCheckFailedException;
private StatusUnmatchedException statusUnmatchedException;
@BeforeEach
void setUp() throws PauserException {
String namespace = "dummyNs";
String helmReleaseName = "dummyRelease";
pauser = new Pauser(namespace, helmReleaseName);
// Randomly generate exceptions.
String dummyMessage = "dummyMessage";
unpauseFailedException = random.nextBoolean() ? new UnpauseFailedException(dummyMessage) : null;
pauseFailedException = random.nextBoolean() ? new PauseFailedException(dummyMessage) : null;
getTargetAfterPauseFailedException = random.nextBoolean() ? new GetTargetAfterPauseFailedException(dummyMessage) : null;
statusCheckFailedException = random.nextBoolean() ? new StatusCheckFailedException(dummyMessage) : null;
statusUnmatchedException = random.nextBoolean() ? new StatusUnmatchedException(dummyMessage) : null;
}
@Nested
class whereAllExceptionsAreNull {
@Test
void returnsNull() {
// Arrange
unpauseFailedException = null;
pauseFailedException = null;
getTargetAfterPauseFailedException = null;
statusCheckFailedException = null;
statusUnmatchedException = null;
// Act
Exception actual =
pauser.buildException(
unpauseFailedException,
pauseFailedException,
getTargetAfterPauseFailedException,
statusCheckFailedException,
statusUnmatchedException);
// Assert
assertThat(actual).isNull();
}
}
@Nested
class whereUnpauseFailedExceptionIsSet {
List<PauserException> expectedSecondaryExceptions;
@BeforeEach
void setUp() {
unpauseFailedException = new UnpauseFailedException("Primary exception");
PauserException[] pauserExceptions = new PauserException[]{
pauseFailedException,
getTargetAfterPauseFailedException,
statusCheckFailedException,
statusUnmatchedException
};
expectedSecondaryExceptions = Arrays.stream(pauserExceptions).filter(Objects::nonNull).collect(Collectors.toList());
}
@RepeatedTest(20)
void returnsUnpausedFailedException() {
// Act
Exception actual =
pauser.buildException(
unpauseFailedException,
pauseFailedException,
getTargetAfterPauseFailedException,
statusCheckFailedException,
statusUnmatchedException);
// Assert
assertThat(actual).isInstanceOf(UnpauseFailedException.class);
assertThat(actual.getSuppressed()).containsAll(expectedSecondaryExceptions);
}
}
@Nested
class wherePauseFailedExceptionIsSet {
@Nested
class whenUnpauseFailedExceptionIsNull {
List<PauserException> expectedSecondaryExceptions;
@BeforeEach
void setUp() {
pauseFailedException = new PauseFailedException("Primary exception");
// Arrange a situation where UnpauseFailedException is null.
unpauseFailedException = null;
PauserException[] pauserExceptions = new PauserException[]{
getTargetAfterPauseFailedException,
statusCheckFailedException,
statusUnmatchedException
};
expectedSecondaryExceptions = Arrays.stream(pauserExceptions).filter(Objects::nonNull).collect(Collectors.toList());
}
@RepeatedTest(10)
void returnsPausedFailedException() {
// Act
Exception actual =
pauser.buildException(
unpauseFailedException,
pauseFailedException,
getTargetAfterPauseFailedException,
statusCheckFailedException,
statusUnmatchedException);
// Assert
assertThat(actual).isInstanceOf(PauseFailedException.class);
assertThat(actual.getSuppressed()).containsAll(expectedSecondaryExceptions);
}
}
}
@Nested
class whereGetTargetAfterPauseFailedExceptionIsSet {
@Nested
@DisplayName("where UnpauseFailedException and PauseFailedException are null")
class whenUnpauseFailedExceptionAndPauseFailedExceptionAreNull {
List<PauserException> expectedSecondaryExceptions;
@BeforeEach
void setUp() {
getTargetAfterPauseFailedException = new GetTargetAfterPauseFailedException("Primary exception");
// Arrange a situation where UnpauseFailedException and PauseFailedException are null.
unpauseFailedException = null;
pauseFailedException = null;
PauserException[] pauserExceptions = new PauserException[]{
statusCheckFailedException,
statusUnmatchedException
};
expectedSecondaryExceptions = Arrays.stream(pauserExceptions).filter(Objects::nonNull).collect(Collectors.toList());
}
@RepeatedTest(5)
void returnsGetTargetAfterPauseFailedException() {
// Act
Exception actual =
pauser.buildException(
unpauseFailedException,
pauseFailedException,
getTargetAfterPauseFailedException,
statusCheckFailedException,
statusUnmatchedException);
// Assert
assertThat(actual).isInstanceOf(GetTargetAfterPauseFailedException.class);
assertThat(actual.getSuppressed()).containsAll(expectedSecondaryExceptions);
}
}
}
@Nested
class whereStatusCheckFailedExceptionIsSet {
@Nested
@DisplayName("where UnpauseFailedException, PauseFailedException and GetTargetAfterPauseFailedException are null")
class whenUnpauseFailedExceptionAndPauseFailedExceptionAndGetTargetAfterPauseFailedExceptionAreNull {
List<PauserException> expectedSecondaryExceptions;
@BeforeEach
void setUp() {
statusCheckFailedException = new StatusCheckFailedException("Primary exception");
// Arrange a situation where UnpauseFailedException, PauseFailedException and GetTargetAfterPauseFailedException are null.
unpauseFailedException = null;
pauseFailedException = null;
getTargetAfterPauseFailedException = null;
PauserException[] pauserExceptions = new PauserException[]{statusUnmatchedException};
expectedSecondaryExceptions = Arrays.stream(pauserExceptions).filter(Objects::nonNull).collect(Collectors.toList());
}
@RepeatedTest(5)
void returnsStatusCheckFailedException() {
// Act
Exception actual =
pauser.buildException(
unpauseFailedException,
pauseFailedException,
getTargetAfterPauseFailedException,
statusCheckFailedException,
statusUnmatchedException);
// Assert
assertThat(actual).isInstanceOf(StatusCheckFailedException.class);
assertThat(actual.getSuppressed()).containsAll(expectedSecondaryExceptions);
}
}
}
@Nested
class whereStatusUnmatchedExceptionIsSet {
@Nested
@DisplayName("where all the other exceptions are null")
class whenAllOtherExceptionsAreNull {
@BeforeEach
void setUp() {
statusUnmatchedException = new StatusUnmatchedException("Primary exception");
// Arrange a situation where all the other exceptions are null.
unpauseFailedException = null;
pauseFailedException = null;
getTargetAfterPauseFailedException = null;
statusCheckFailedException = null;
}
@Test
void returns() {
// Act
Exception actual =
pauser.buildException(
unpauseFailedException,
pauseFailedException,
getTargetAfterPauseFailedException,
statusCheckFailedException,
statusUnmatchedException);
// Assert
assertThat(actual).isInstanceOf(StatusUnmatchedException.class);
assertThat(actual.getSuppressed()).isEmpty();
}
}
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@monkey-mas
Thank you for your review! This property-based test information is very helpful to me!
Totally, it looks good to me, but I want to progress this PR with the current unit test as is. This is because we prioritize the release of the Omnistrate service.
Therefore, I will fix the unit test in another PR in the future.
Anyway, thank you for your suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@monkey-mas Basically looks good to me! 👍
I think the number of repeats can be larger like up to 50 since it won't take so much time because of no wait.
BTW, I'm not sure whether the nested class name should start with a lowercase...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
up to 50
Just noticed it's close to the full combination(64) 😁
Maybe full coverage test is also okay using @ParameterizedTest
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback, folks :)
Therefore, I will fix the unit test in another PR in the future.
Yep, understood 👌
I think the number of repeats can be larger like up to 50 since it won't take so much time because of no wait.
Yeah, true. I'm open to use a large number as you suggested :)
For some tests, the number does not have to be that big and I just set a smaller number 😓 For instance, in StatusCheckFailedException
test, statusUnmatchedException
is the only randomly generated variable. (Thus, statistically speaking, 5
was enough 😓)
ref. Expected time to roll all 1 through 6 on a die
Anyway, you're right in the sense that it doesn't hurt even if we use 50
for that sort of unit test either :)
BTW, I'm not sure whether the nested class name should start with a lowercase...
Yeah, I sort of got your point. 😅
I actually don't follow Java naming convention here in unit tests 🙄 (eg a class name starts with a capital letter.)
- I usually create the following nested test class structure
- The nested class name starts with a lower letter to match a method name (which almost always starts with a lower case letter in Java).
- e.g.,
buildException
but notBuildException
to clarify the target method to test.
- and... I also apply the same rule for nested classes... 🤫 (except the top-level unit test name)
class UnitTest {
@Nested
class methodNameToTest {
// More nested classes or tests.
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. The naming is very difficult...
Anyway, thank you for your comments! I will refer to it when I create another PR that fixes unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you!
if (after != null && before.getStatus().equals(after.getStatus())) { | ||
return null; | ||
} else { | ||
return new StatusUnmatchedException(STATUS_UNMATCHED_ERROR_MESSAGE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[minor] How about adding before
and after
to the error message just in case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. It might have a lot of information that makes the error message very long, but I will check what is included in before
and after
, first. And, if the information is not too large, I will add them into the error message.
Thank you for your suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked what is included in TargetStatus
(i.e., before
and after
). As a result, I think we can add their information to the log, but in the current implementation, I think the information might not be readable a little.
So, I think it would be better to implement TargetStatus.toString()
first. However, I think we need to consider something before implementing it.
Therefore, I will keep the current implementation as is in this PR, and I will fix it in another PR in the future.
// Check if pods and deployment information are the same between before pause and after pause. | ||
StatusCheckFailedException statusCheckFailedException = null; | ||
StatusUnmatchedException statusUnmatchedException = null; | ||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about executing this block only when targetAfterPause != null
? The method seems to return StatusUnmatchedException
when the argument after
is null, but in this case comparing before
and after
itself isn't conducted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, yes. That's right. I will fix it. Thank you for the suggestion!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in 5cca44c.
// Return the final result based on each process. | ||
if (pauserException != null) { | ||
// Some operations failed. | ||
throw pauserException; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[minor] Should we have some log messages when pauseException != null
? 👀
Looks like we removed a log in this PR (Sorry if I miss some context here 😓)
logger.warn(
"Failed to unpause Scalar product. They are still in paused. You must restart related"
+ " pods by using the `kubectl rollout restart deployment {}`"
+ " command to unpase all pods.",
target.getDeployment().getMetadata().getName());
return;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need to output such a log message here.
I decided to include all messages in the generated exception.
Now, we can show users clearer errors by using several kinds of exceptions (e.g., UnpauseFailedException
).
Also, I wanted to see What happened
from the log, but now we can see all the messages in the exceptions via the stacktrace.
So, there is no problem if we don't have a log here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM 👍 Let's create follow-up PRs if necessary once this base PR is merged :)
@komamitsu |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you!
// for now. | ||
|
||
// Return the final result based on each process. | ||
if (unpauseFailedException != null) { // Unpause Failed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (unpauseFailedException != null) { // Unpause Failed. | |
if (unpauseFailedException != null) { // Unpause failed. |
if (unpauseFailedException != null) { // Unpause Failed. | ||
// Unpause issue is the most critical because it might cause system down. | ||
throw new UnpauseFailedException(errorMessage, unpauseFailedException); | ||
} else if (pauseFailedException != null) { // Pause Failed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} else if (pauseFailedException != null) { // Pause Failed. | |
} else if (pauseFailedException != null) { // Pause failed. |
// Unpause issue is the most critical because it might cause system down. | ||
throw new UnpauseFailedException(errorMessage, unpauseFailedException); | ||
} else if (pauseFailedException != null) { // Pause Failed. | ||
// Pause Failed is second priority because pause issue might be caused by configuration error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Pause Failed is second priority because pause issue might be caused by configuration error. | |
// Pause failure is the second priority because the pause issue might be caused by a configuration error. |
// Pause Failed is second priority because pause issue might be caused by configuration error. | ||
throw new PauseFailedException(errorMessage, pauseFailedException); | ||
} else if (!isTargetStatusEqual) { // Target status changed. | ||
// Target status changed is third priority because this issue might be caused by temporary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Target status changed is third priority because this issue might be caused by temporary | |
// Target status change is the third priority because this issue might be caused by temporary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I left some minor suggestions on the English comments.
} | ||
|
||
// Status check issue is third priority because this issue might be caused by temporary issue, | ||
// for example, targetAfterPause is null. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What causes targetAfterPause is null
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, sorry. This comment does not have meaningful information.
For example, if we cannot get the pod or deployment information by using the Kubernetes API, the targetAfterPause
will be null
, and enter this conditional branch.
I will fix this comment. Thank you for your question!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I left some suggestions.
Co-authored-by: Hiroyuki Yamada <mogwaing@gmail.com>
@feeblefakie |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thank you!
Description
This PR added error handling to the
Puaser.pause()
method.Previously, there is no error handling, and it might cause
Scalar products keep pause state forever
when some error occurs.In other words, it might cause some issues like the system down.
To address such issue, I added error handling. And now, if some error occurs, the
Pauser
tries to rununpause()
method several times to make Scalar products the unpause state as long as possible.Also, I updated it to throw the
PauseException
. So, the caller (CLI or some other tools that implement this library) can notice the above issue and handle it on their side.Please take a look!
Related issues and/or PRs
N/A
Changes made (By copilot)
This pull request introduces improvements to the
Pauser
class inlib/src/main/java/com/scalar/admin/kubernetes/Pauser.java
to enhance error handling, testability, and maintainability. It also includes a minor addition to the project's dependencies inlib/build.gradle
. The most important changes are grouped below by theme.Enhancements to error handling:
pause
method by adding nestedtry-catch
blocks to ensure that theunpauseWithRetry
method is called even if thepause
operation fails. This ensures a fallback mechanism for unpausing pods and provides detailed error messages for critical failures.unpauseWithRetry
method to throw aPauserException
with a detailed error message when retry attempts fail. This ensures that users are informed of the critical issue and the steps required to resolve it.Testability improvements:
MAX_UNPAUSE_RETRY_COUNT
,unpauseWithRetry
,getTarget
, andgetRequestCoordinator
members with@VisibleForTesting
to make them accessible for unit testing. This improves the testability of thePauser
class. [1] [2] [3]Dependency updates:
mockito-inline
library to thetestImplementation
dependencies inlib/build.gradle
to support mocking of static methods and final classes in tests.Checklist
Additional notes (optional)
N/A