Skip to content

[AN-557] Include VM initialization time in cost calculation for Batch jobs #7744

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
May 20, 2025

Conversation

salonishah11
Copy link
Contributor

Description

Jira: https://broadworkbench.atlassian.net/browse/AN-557

This PR attempts to fix the underestimation of cost for Batch jobs. On Batch, users incur cost when VM is being initialized. Hence use corresponding events for vmStartTime and vmEndTime for cost calculations.

Events for a job run locally:
Screenshot 2025-05-19 at 9 30 28 AM

vmStartTime now corresponds to SCHEDULED state:
Screenshot 2025-05-19 at 9 32 34 AM

Release Notes Confirmation

CHANGELOG.md

  • I updated CHANGELOG.md in this PR
  • I assert that this change shouldn't be included in CHANGELOG.md because it doesn't impact community users

Terra Release Notes

  • I added a suggested release notes entry in this Jira ticket
  • I assert that this change doesn't need Jira release notes because it doesn't impact Terra users

@salonishah11 salonishah11 requested a review from a team as a code owner May 19, 2025 13:33
val endedRegex = ".*RUNNING to.*".r // can be SUCCEEDED or FAILED
// on Batch, when job transitions to SCHEDULED state it indicates that the VM is being initialized. Users are billed for this
// startup time. Hence, the 'vmStartTime' corresponds to when the job enters the SCHEDULED state.
val startedRegex = ".*QUEUED to SCHEDULED.*".r
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance we might be transitioning to scheduled from a state other than QUEUED?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding from conversation with Batch team was that SCHEDULED only happens after QUEUED.

It seems the only other state that can exist before QUEUED is STATE_UNSPECIFIED but I am not sure if job can go from STATE_UNSPECIFIED -> SCHEDULED 🤔
https://cloud.google.com/batch/docs/reference/rest/v1/projects.locations.jobs#State

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want the start regex to be ".*to SCHEDULED.*" to be safe?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That does seem a little safer, but I'm good either way based on your previous comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update the regex just to be safe. The important part here is reaching the SCHEDULED state which would be achieved by ".*to SCHEDULED.*" as well.

@@ -238,13 +251,44 @@ class BatchRequestExecutorSpec
result.status match {
case RunStatus.Failed(_, events, _) =>
val eventNames = events.map(_.name)
val eventTimes = events.map(_.offsetDateTime.toString)
println(eventNames)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this is a debug line that should be removed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Maybe this is a leftover from previous PR. The test do succeed without the println.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, yes, right, I got confused and though it was being added here 🤦‍♀️

@salonishah11 salonishah11 merged commit 8f96cfa into develop May 20, 2025
42 checks passed
@salonishah11 salonishah11 deleted the sps_fix_batch_cost_bug branch May 20, 2025 14:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants