Added indexed option for k8s jobs #597


Merged
merged 4 commits into iterative:master on Aug 17, 2022

Conversation

@sjawhar (Contributor) commented Jun 1, 2022

Purpose

Parallel K8s jobs have two completion modes: Indexed and NonIndexed (the default). Indexed jobs have some nice abilities; for example, each pod gets a JOB_COMPLETION_INDEX environment variable, so it can assign itself work without needing a separate task queue. There's currently no way to specify the k8s completion mode.

Approach

Added an optional boolean indexed attribute
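
For context, this attribute boils down to switching the Job's CompletionMode. A minimal sketch of the mapping using the k8s.io/api/batch/v1 types, with the same import alias as the diff below (the helper function is illustrative, not the PR's exact code):

package k8s

import kubernetes_batch "k8s.io/api/batch/v1"

// completionMode picks the Job's completion mode from the new attribute.
// In Indexed mode, Kubernetes gives every pod a JOB_COMPLETION_INDEX
// environment variable ranging from 0 to .spec.completions-1.
func completionMode(indexed bool) kubernetes_batch.CompletionMode {
	if indexed {
		return kubernetes_batch.IndexedCompletion
	}
	return kubernetes_batch.NonIndexedCompletion
}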

Example when set to false or omitted:
(screenshot)

Example when set to true:
(screenshot)

TODO

  • Add tests
  • Update docs

@dacbd (Contributor) commented Jun 1, 2022

Can you provide an example main.tf that you tested with, or that can be tested with?

Also thanks for the contribution!

@sjawhar (Contributor, Author) commented Jun 1, 2022

Can you provide an example main.tf that you tested with, or that can be tested with?

First, create a persistent volume with ReadWriteMany. For example:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv-storage
  labels:
    type: local
spec:
  capacity:
    storage: 4Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /tpi-data

Then use this main.tf:

terraform {
  required_providers {
    iterative = {
      source = "github.com/iterative/iterative"
    }
  }
}
provider "iterative" {}

resource "iterative_task" "example" {
  cloud     = "k8s"
  machine   = "s"
  disk_size = 1

  parallelism = 3
  indexed     = false

  storage {
    workdir = "."
    output  = "results"
  }
  script = <<-END
    #!/bin/bash

    mkdir -p results
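    # JOB_COMPLETION_INDEX is only set in Indexed completion mode (empty here when indexed = false)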
    printenv > results/out-$JOB_COMPLETION_INDEX
  END
}

@dacbd requested a review from @0x2b3bfa0 on June 1, 2022
@casperdcl (Contributor)

related #585 /CC @0x2b3bfa0

@casperdcl added the external-request ("You asked, we did") label on Jun 2, 2022
@0x2b3bfa0 (Member) left a comment

Awesome feature, @sjawhar! 😍 It would be nice to enable IndexedCompletion automatically when parallelism is greater than 1 instead of exposing a separate option that has no equivalent on other providers.

@@ -119,6 +121,13 @@ func (j *Job) Create(ctx context.Context) error {
jobCompletions := int32(j.Attributes.Task.Parallelism)
jobParallelism := int32(j.Attributes.Task.Parallelism)

var jobCompletionMode kubernetes_batch.CompletionMode
if j.Attributes.Indexed {
@0x2b3bfa0 (Member)

Suggested change:
- if j.Attributes.Indexed {
+ if j.Attributes.Parallelism > 1 {

@sjawhar (Contributor, Author) commented Jun 3, 2022

Awesome feature, @sjawhar! 😍 It would be nice to enable IndexedCompletion automatically when parallelism is greater than 1 instead of exposing a separate option that has no equivalent on other providers.

Thanks for the quick review! I also didn't like having an option that only applies to k8s, but I don't quite agree with your suggestion. Let me try to explain, though bear with me because I'm coming from a very cloud-heavy background and am relatively new to k8s.

K8s has two ways of running parallel jobs: Indexed and NonIndexed. NonIndexed means the pods are identical: start up N of them and they do their thing. Indexed means that k8s expects a success for at least one pod of each index from 0 to N-1. In other words, if the pod with index 3 fails, k8s will keep retrying index 3 specifically. They're useful for different things: you might use NonIndexed to spawn N workers that accomplish M jobs by reading from a task queue, whereas you might use Indexed to accomplish the same thing without the queue (e.g. by giving the full task list to each pod and letting it assign itself work based on its index; see the sketch below).

tl;dr: both indexed and nonindexed are useful ways of running parallel jobs in k8s, and I don't think the user should be forced to use one or the other.
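
For illustration, a hypothetical indexed worker that assigns itself tasks from a list known in advance, no queue involved (the task list and completions count are made up):

package main

import (
	"fmt"
	"os"
	"strconv"
)

func main() {
	// Every pod receives the same full task list.
	tasks := []string{"a", "b", "c", "d", "e", "f", "g"}
	// Set by Kubernetes only in Indexed completion mode.
	index, err := strconv.Atoi(os.Getenv("JOB_COMPLETION_INDEX"))
	if err != nil {
		panic("JOB_COMPLETION_INDEX not set; is the job Indexed?")
	}
	const completions = 3 // .spec.completions
	// Each pod takes every completions-th task, starting at its index.
	for i := index; i < len(tasks); i += completions {
		fmt.Printf("index %d processing task %q\n", index, tasks[i])
	}
}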

Now here's an extra twist: to accomplish the indexed example I described above (N workers for M jobs, where M > N), we need to be able to set .spec.completions and .spec.parallelism separately. So we actually need to add another k8s-specific option 😅.

Thoughts on all this?

UPDATE: What if we removed indexed and added completions, and set CompletionMode to Indexed if completions is set? That way there's only one k8s-specific option 😺

@dacbd (Contributor) commented Jun 3, 2022

What if we could make index less k8s-specific, like somehow populating an env var like TASK_COMPLETION_INDEX = [1-3], basically reproducing the k8s JOB_COMPLETION_INDEX behavior for the cloud instances? 🤔
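
As a sketch of that idea: when launching N plain cloud instances, the provider could bake a per-instance index into each startup script (TASK_COMPLETION_INDEX and this helper are hypothetical, not an existing TPI feature):

package tpi

import "fmt"

// startupScripts prepends a distinct index to a shared startup script,
// mimicking the Kubernetes JOB_COMPLETION_INDEX behavior on plain
// cloud instances.
func startupScripts(script string, parallelism int) []string {
	scripts := make([]string, parallelism)
	for i := range scripts {
		scripts[i] = fmt.Sprintf("export TASK_COMPLETION_INDEX=%d\n%s", i, script)
	}
	return scripts
}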

@sjawhar temporarily deployed to manual on June 4, 2022
@sjawhar (Contributor, Author) commented Jun 4, 2022

I added the completions functionality as described in my last comment. This just reflects what works for my use case; I'm happy to tear it apart and refactor to make it more generic, as you have both described. Also, please forgive my Go; this might be precisely the second time I've ever used it 😅

@sjawhar temporarily deployed to automatic on June 4, 2022
@0x2b3bfa0 (Member)

For the use cases we¹ have in mind, the following should be enough:

  • CompletionMode equal to NonIndexed if parallelism is 1, else Indexed
  • Completions equal to parallelism
  • Parallelism equal to parallelism
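
In code, that proposal amounts to something like the following sketch (naming follows the diff above; this is not the literal merged implementation):

package k8s

import kubernetes_batch "k8s.io/api/batch/v1"

// jobSettings maps the single TPI parallelism attribute onto the three
// Kubernetes Job spec fields. Single-pod jobs stay NonIndexed, which
// keeps them working on clusters that predate Indexed completion support.
func jobSettings(parallelism int32) (completions, workers int32, mode kubernetes_batch.CompletionMode) {
	mode = kubernetes_batch.NonIndexedCompletion
	if parallelism > 1 {
		mode = kubernetes_batch.IndexedCompletion
	}
	return parallelism, parallelism, mode
}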

The following scenarios are still not supported:

  1. Run $m$ different tasks on $n$ nodes, where $m \neq n$
  2. Run $n$ identical workers taking tasks from an external queue

Out of curiosity, can you share a bit more information about your use case? I guess you may want to restrict the parallelism so that running a massively parallel task doesn't exhaust your cluster's resources. 🤔

Footnotes

  ¹ 🔔 @iterative/cml for sanity

@sjawhar (Contributor, Author) commented Jun 4, 2022

CompletionMode equal to NonIndexed if parallelism is 1, else Indexed

I think you still might not be grokking the purpose of non/indexed. They're both valid modes of parallel operation; it's not about whether parallelism is 1 or not.

Out of curiosity, can you share a bit more of information on your use case? I guess that you may want to restrict the parallelism so running a massively parallel task doesn't exhaust your cluster's resources. 🤔

You got it. If I need to run 100 things, I don't want to spawn 100 pods. This seems to me like a necessary consideration for supporting on-prem/k8s.

As you can see from the diff, this is a very minor change that opens up a lot of functionality in k8s. I'd be grateful if you'd reconsider the pros/cons here. 🙏

@0x2b3bfa0 (Member)

If I need to run 100 things, I don't want to spawn 100 pods

What you propose is similar to jobs.<job_id>.strategy.max-parallel on GitHub Actions and could be useful, although supporting it outside k8s is significantly more complex.

As per your suggestion, the current parallelism argument should be replaced by two different arguments:

  1. Number of times to run a task (i.e. completions in Kubernetes parlance)
  2. Maximum number of concurrent workers (i.e. parallelism in Kubernetes parlance)

By default, if users don't specify 2, it should be equal to 1 (i.e. all runs may execute concurrently).

@0x2b3bfa0 (Member) commented Jun 5, 2022

I think you still might not be grokking the purpose of non/indexed. They're both valid modes of parallel operation; it's not about whether parallelism is 1 or not.

Sorry, I explained myself badly. 🙈

Traditionally, Kubernetes Jobs didn't have an Indexed mode, and were meant to run exactly the same code one or more times. Using an external task queue was the officially recommended way of running different tasks.

This project tries to simplify the user's workflow as much as possible: the Indexed mode eliminates the need for an external task queue in those cases when the total number of tasks is known in advance, but still allows users to differentiate every index.

My insistence on using NonIndexed mode when parallelism is 1 is just to support Kubernetes versions prior to 1.24 when users don't need this feature. Requiring a single completion renders the feature useless, so we can safely disable it; when more completions are required, it will always be enabled.

@omesser (Contributor) commented Jun 9, 2022

@iterative/cml
So, assuming CompletionMode can be inferred implicitly, and going with the latest suggestion:

  1. How do we want to handle something like this (user provides a k8s-specific field when not using k8s)?
resource "iterative_task" "example" {
  cloud     = "aws"
  ...
  parallelism = 3
  completions = 5
 }

Error: "completions not supported for non-k8s runtime"? (I am allergic to referring to k8s as a cloud provider 😓)
We definitely don't have to support task-queue-like logic at this point IMO. We're stepping dangerously into "implementing a scheduler" territory.

  2. Considering the above, I have to point out the very likely fact that more and more k8s-specific switches will probably be requested and added in the future, and maybe some cloud-provider-specific ones as well, some of which won't be natural to abstract away or support across providers neatly.
    My personal opinion is that we need to keep those as true as possible to the original behavior (same name, in some provider/runtime-specific block? maybe nested under "advanced"?) so users can easily infer their behavior from the original docs, without us inventing ad-hoc abstractions for them that aren't natural. Thoughts on this?

@casperdcl (Contributor) commented Jun 10, 2022

introducing a k8s-specific config block makes sense, esp. since it's not-a-cloud:

resource "iterative_task" "example" {
  cloud       = "k8s"
  parallelism = ... # warn iff defined on `cloud = "k8s"`?

  # k8s-specific block
  k8s {
    parallelism = 3 # in the k8s sense, not in the TPI sense
    completions = 5
  }
}

@0x2b3bfa0 (Member)

I'd avoid introducing backend-specific details, especially when the outer parallelism is functionally equivalent to the inner completions and the inner parallelism has no equivalent in other backends.

@0x2b3bfa0 (Member)

@sjawhar, would it be possible to keep the original scope of this pull request (enable indexed mode when Kubernetes completions > 1) and open a separate pull request for the addition of a separate parallelism limit?

@sjawhar (Contributor, Author) commented Jun 13, 2022

@sjawhar, would it be possible to keep the original scope of this pull request (enable indexed mode when Kubernetes completions > 1) and open a separate pull request for the addition of a separate parallelism limit?

I really don't think that's a good idea. Users should not be forced to use indexed mode. I went that route initially just to make it work for my use case as quickly as possible, but this would be both a change from the current behavior and unnecessarily limiting. Are you sure you want to do that?

@0x2b3bfa0 (Member) commented Jun 13, 2022

this would be both a change from the current behavior and unnecessarily limiting

I thought that enabling the indexed mode doesn't have any significant effect beyond exposing a couple of additional environment variables. What kinds of unnecessary limitations do you foresee?

@sjawhar (Contributor, Author) commented Jun 14, 2022

I thought that enabling the indexed mode doesn't have any significant effect beyond exposing a couple of additional environment variables. What kinds of unnecessary limitations do you foresee?

It's not just about environment variables. From the docs:

  • NonIndexed (default): the Job is considered complete when there have been .spec.completions successfully completed Pods. In other words, each Pod completion is homologous to each other.
  • Indexed: the Pods of a Job get an associated completion index from 0 to .spec.completions-1. The Job is considered complete when there is one successfully completed Pod for each index.

These are two very different behaviors. Your suggestion works perfectly well for my use case (Indexed is what I need), but it seems unnecessarily limiting. The spec already has provider-specific flags (e.g. region does not apply to k8s), so we're not breaking new ground by adding another one.

Nonetheless, if you're sure then I'll make the change.

@0x2b3bfa0 (Member)

Your suggestion works perfectly well for my use case (Indexed is what I need), but it seems unnecessarily limiting.

Indexed mode is precisely the kind of behavior this tool is meant to have; see #585. Unless there is any practical use case that requires the NonIndexed mode, it looks like there is no pressing need to support both.

The spec already has provider-specific flags (e.g. region does not apply to k8s), so we're not breaking new ground by adding another one.

Ouch! 😅 Indeed, although there were plans to “rename region to location so it's valid for zones and node selectors” as per #412 (comment)

@omesser (Contributor) commented Jun 19, 2022

@0x2b3bfa0

Indexed mode is precisely the kind of behavior this tool is meant to have; see #585. Unless there is any practical use case that requires the NonIndexed mode, it looks like there is no pressing need to support both.

I'm with @sjawhar on this one. It makes no sense to take away this choice for the perceived "simplicity" of merely saving a field here (it's 1:1 with the k8s field, with explainability and docs built in). If we get the power of NonIndexed for the k8s case only, that's still valuable IMO. No need to abuse terminology or over-generalize: this should be in a k8s-specific block IMO (yes, exposing backend-specific details), and there's no need to try to provide the same functionality/choice across all other clouds/engines; that would be a broken abstraction.
The key to keeping our code powerful and still simple, meaningful, and maintainable, IMO, is this: once something is not easily/directly generalizable, keep it provider-specific and help the user transition to k8s terminology.

"rename region to location so it's valid for zones and node selectors"

I'd advise against 😨 (commented on #412)

@0x2b3bfa0 (Member)

the power of NonIndexed

Citation needed. 😅 If we provide “powerful” options it should be for a reason, methinks.

@0x2b3bfa0 (Member)

this should be in a k8s-specific block IMO (yes, exposing backend-specific details), and there's no need to try to provide the same functionality/choice across all other clouds/engines; that would be a broken abstraction.

We may end up exposing backend-specific details when shoehorning them under a common specification is proven to be difficult or impractical. 👍🏼

Although I wonder if this is the case.

@0x2b3bfa0 (Member) commented Jun 19, 2022

The key to keeping our code powerful and still simple, meaningful, and maintainable, IMO, is this: once something is not easily/directly generalizable, keep it provider-specific and help the user transition to k8s terminology.

When terminology is generalizable and different backends overload the same names with different meanings, I believe it makes sense to go the extra mile and "generalize" on our side a bit more.

I'd argue that this particular functionality should be generalized, although in some other cases (e.g. placement) we end up resorting to specific options.

@0x2b3bfa0 (Member)

taking away this choice for the perceived "simplicity" of merely saving a field here

Taking away some choices isn't implicitly negative; quite the opposite. Unless we find any supported use case that benefits particularly from NonIndexed mode, I'd consider not exposing that choice to users.

@omesser (Contributor) commented Jun 23, 2022

We discussed this offline a bit, and specifically, I find it hard to defend the choice for NonIndexed except for the religious reason of "give me ma k8s options!"
So I would say we can live without it for now.

@sjawhar (Contributor, Author) commented Jun 23, 2022

Ok, I'll make the changes sometime in the next few days.

@redabuspatrol
I came across this PR while looking for this feature with AWS EC2. I think the ability to operate parallel instances with regular cloud providers, with some sort of indexing or any other mechanism to dispatch work to the different instances, could greatly help small teams and individual developers who don't have the resources to manage k8s.

@0x2b3bfa0 (Member)

@redabuspatrol, can you please comment on #585? In the meantime, you may also want to try having a separate task for every chunk of work.

@sjawhar temporarily deployed to manual on August 16, 2022
@sjawhar (Contributor, Author) commented Aug 16, 2022

Sorry for the long delay. I've reverted most of the changes and implemented @0x2b3bfa0's preferred solution.

@sjawhar temporarily deployed to automatic on August 16, 2022
@0x2b3bfa0 merged commit bfca950 into iterative:master on Aug 17, 2022
@0x2b3bfa0 (Member)

Thank you very much, @sjawhar! 🙏🏼

@casperdcl (Contributor)

Thanks so much @sjawhar! Sorry for the delay and (perhaps excessive) hesitancy from our end too.

The reverted changes still look worth pursuing to us, just in separate PRs. We'd probably also be much quicker at understanding & merging said PRs if you could share an example of their use too... I assume related to things like master...sjawhar:terraform-provider-iterative:feature/nfs-volume and iterative/dvc@main...sjawhar:dvc:feature/parallel-repro?

@sjawhar deleted the feature/k8s-completion-mode branch on August 20, 2022
@sjawhar (Contributor, Author) commented Aug 20, 2022

I can open separate issues for each of the following features

From the links above, you'll see I've already implemented all of these in the feature/nfs-volume branch, which I'm using for a project at work. It would be great to get all of them upstream; I think the only sticking point is how to represent cloud-specific options in the Terraform file.
