Skip to content

[vpj] Push Job Timeout #1786

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
May 22, 2025
Merged
Changes from 1 commit
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
8ebedc2
Removed old `bootstrapToOnlineOnlineTimeoutInHours` usage since it's …
KaiSernLim May 6, 2025
dddea75
First attempt at adding a timeout to cancel the job. 🌗
KaiSernLim May 6, 2025
82a2c2d
Implement timeout handling for VenicePushJob and add unit test for ti…
KaiSernLim May 6, 2025
cf391b1
Two unit tests for verification. 🚏
KaiSernLim May 9, 2025
387ff49
Fixed the unit test to actually test something meaningful. 🎤
KaiSernLim May 9, 2025
12902db
Fixed NPE from `storeResponse` not being populated. 🦆
KaiSernLim May 9, 2025
3d1f33d
Fixed Spotbugs warning in test. 🥱
KaiSernLim May 9, 2025
7cc99a3
Fixed `controllerClient` NPE. 🐻‍❄️
KaiSernLim May 10, 2025
e02f96c
Refactored `timeoutExecutor` to be a member variable. 🫩
KaiSernLim May 12, 2025
210c28c
Minor review changes by Nisarg. 🪡
KaiSernLim May 20, 2025
eac4a66
Try setting `bootstrapToOnlineTimeoutInHours` directly, instead of re…
KaiSernLim May 21, 2025
5540f7f
Revert "Try setting `bootstrapToOnlineTimeoutInHours` directly, inste…
KaiSernLim May 21, 2025
2f9f9fe
Try changing the condition in the test. 😪
KaiSernLim May 21, 2025
07d8ab9
Only check `KILLED` status in unit test. 🤖
KaiSernLim May 21, 2025
eca6ef3
I can't tell which branches the unit test is going into in CI. 😭
KaiSernLim May 22, 2025
74da931
Is the latch even counting down? 😭
KaiSernLim May 22, 2025
e96fd4a
Is the latch even counting down? 😭
KaiSernLim May 22, 2025
e305f09
Try reordering the unit tests? 😭
KaiSernLim May 22, 2025
6da0bcd
It was the `configure()` taking a long time, wasn't it? 🤪
KaiSernLim May 22, 2025
545f841
Added the finishing touches.
KaiSernLim May 22, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,6 @@
import static org.mockito.ArgumentMatchers.anyLong;
import static org.mockito.ArgumentMatchers.anyString;
import static org.mockito.ArgumentMatchers.eq;
import static org.mockito.Mockito.atLeast;
import static org.mockito.Mockito.doAnswer;
import static org.mockito.Mockito.doCallRealMethod;
import static org.mockito.Mockito.doNothing;
Expand Down Expand Up @@ -292,7 +291,7 @@
Properties props = getVpjRequiredProperties();
props.put(KEY_FIELD_PROP, "id");
props.put(VALUE_FIELD_PROP, "name");
props.put(DATA_WRITER_COMPUTE_JOB_CLASS, dataWriterJobClass.getName());
props.put(DATA_WRITER_COMPUTE_JOB_CLASS, dataWriterJobClass.getCanonicalName());
ControllerClient client = getClient();
JobStatusQueryResponse response = mock(JobStatusQueryResponse.class);
doReturn("UNKNOWN").when(response).getStatus();
Expand All @@ -313,8 +312,8 @@
} catch (VeniceException e) {
Assert.assertTrue(e.getMessage().contains("push job is still in unknown state."));
}
verify(pushJob, atLeast(1)).cancel();
verify(pushJob, atLeast(1)).killDataWriterJob();
verify(pushJob, times(1)).cancel();
verify(pushJob, times(2)).killDataWriterJob();
}
}

Expand Down Expand Up @@ -348,7 +347,7 @@
/*
* 1. Data writer job starts and status is set to RUNNING.
* 2. Timeout thread kills the data writer job and status is set to KILLED.
* The latch is used to stall the validateJob() method until the data writer job is killed.
* The latch is used to stall the runComputeJob() method until the data writer job is killed.
*/
Answer<Void> stallDataWriterJob = invocation -> {
// At this point, the data writer job status is already set to RUNNING.
Expand Down Expand Up @@ -379,8 +378,8 @@
} catch (VeniceException e) {
// Expected, because the data writer job is not configured to run successfully in this unit test environment
}
verify(pushJob, atLeast(1)).cancel();
verify(pushJob, atLeast(1)).killDataWriterJob();
verify(pushJob, times(1)).cancel();
verify(pushJob, times(2)).killDataWriterJob();

Check failure on line 382 in clients/venice-push-job/src/test/java/com/linkedin/venice/hadoop/VenicePushJobTest.java

View workflow job for this annotation

GitHub Actions / Clients / UT & CodeCov (17)

VenicePushJobTest.testDataWriterComputeJobTimeout[1](class com.linkedin.venice.spark.datawriter.jobs.DataWriterSparkJob)

org.mockito.exceptions.verification.TooFewActualInvocations: venicePushJob.killDataWriterJob(); Wanted 2 times: -> at com.linkedin.venice.hadoop.VenicePushJobTest.testDataWriterComputeJobTimeout(VenicePushJobTest.java:382) But was 1 time: -> at com.linkedin.venice.hadoop.VenicePushJobTest.lambda$testDataWriterComputeJobTimeout$5(VenicePushJobTest.java:365)

Check failure on line 382 in clients/venice-push-job/src/test/java/com/linkedin/venice/hadoop/VenicePushJobTest.java

View workflow job for this annotation

GitHub Actions / Clients / UT & CodeCov (8)

VenicePushJobTest.testDataWriterComputeJobTimeout[1](class com.linkedin.venice.spark.datawriter.jobs.DataWriterSparkJob)

org.mockito.exceptions.verification.TooFewActualInvocations: venicePushJob.killDataWriterJob(); Wanted 2 times: -> at com.linkedin.venice.hadoop.VenicePushJobTest.testDataWriterComputeJobTimeout(VenicePushJobTest.java:382) But was 1 time: -> at com.linkedin.venice.hadoop.VenicePushJobTest.lambda$testDataWriterComputeJobTimeout$5(VenicePushJobTest.java:365)

Check failure on line 382 in clients/venice-push-job/src/test/java/com/linkedin/venice/hadoop/VenicePushJobTest.java

View workflow job for this annotation

GitHub Actions / Clients / UT & CodeCov (8)

VenicePushJobTest.testDataWriterComputeJobTimeout[1](class com.linkedin.venice.spark.datawriter.jobs.DataWriterSparkJob)

org.mockito.exceptions.verification.TooFewActualInvocations: venicePushJob.killDataWriterJob(); Wanted 2 times: -> at com.linkedin.venice.hadoop.VenicePushJobTest.testDataWriterComputeJobTimeout(VenicePushJobTest.java:382) But was 1 time: -> at com.linkedin.venice.hadoop.VenicePushJobTest.lambda$testDataWriterComputeJobTimeout$5(VenicePushJobTest.java:365)

Check failure on line 382 in clients/venice-push-job/src/test/java/com/linkedin/venice/hadoop/VenicePushJobTest.java

View workflow job for this annotation

GitHub Actions / Clients / UT & CodeCov (11)

VenicePushJobTest.testDataWriterComputeJobTimeout[1](class com.linkedin.venice.spark.datawriter.jobs.DataWriterSparkJob)

org.mockito.exceptions.verification.TooFewActualInvocations: venicePushJob.killDataWriterJob(); Wanted 2 times: -> at com.linkedin.venice.hadoop.VenicePushJobTest.testDataWriterComputeJobTimeout(VenicePushJobTest.java:382) But was 1 time: -> at com.linkedin.venice.hadoop.VenicePushJobTest.lambda$testDataWriterComputeJobTimeout$5(VenicePushJobTest.java:365)
assertEquals(pushJob.getDataWriterComputeJob().getStatus(), DataWriterComputeJob.Status.KILLED);
}
}
Expand Down
Loading