Added indexed option for k8s jobs #597


Merged
merged 4 commits into iterative:master on Aug 17, 2022

Conversation

@sjawhar (Contributor) commented Jun 1, 2022

Purpose

Parallel K8s jobs have two completion modes: Indexed and NonIndexed (the default). Indexed jobs have some nice abilities; for example, each pod gets a JOB_COMPLETION_INDEX environment variable, so it can assign itself work without needing a separate task queue. There's currently no way to specify the k8s completion mode.

Approach

Added an optional boolean indexed attribute
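
For context, this attribute boils down to switching the Job's CompletionMode. A minimal sketch of the mapping using the k8s.io/api/batch/v1 types, with the same import alias as the diff below (the helper function is illustrative, not the PR's exact code):

package k8s

import kubernetes_batch "k8s.io/api/batch/v1"

// completionMode picks the Job's completion mode from the new attribute.
// In Indexed mode, Kubernetes gives every pod a JOB_COMPLETION_INDEX
// environment variable ranging from 0 to .spec.completions-1.
func completionMode(indexed bool) kubernetes_batch.CompletionMode {
	if indexed {
		return kubernetes_batch.IndexedCompletion
	}
	return kubernetes_batch.NonIndexedCompletion
}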

Example when set to false or omitted:
(screenshot)

Example when set to true:
(screenshot)

TODO

  • Add tests
  • Update docs

@dacbd (Contributor) commented Jun 1, 2022

Can you provide an example main.tf that you tested with, or that can be tested with?

Also thanks for the contribution!

@sjawhar (Contributor, Author) commented Jun 1, 2022

Can you provide an example main.tf that you tested with, or that can be tested with?

First, create a persistent volume with ReadWriteMany. For example:

kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv-storage
  labels:
    type: local
spec:
  capacity:
    storage: 4Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: /tpi-data

Then use this main.tf:

terraform {
  required_providers {
    iterative = {
      source = "github.com/iterative/iterative"
    }
  }
}
provider "iterative" {}

resource "iterative_task" "example" {
  cloud     = "k8s"
  machine   = "s"
  disk_size = 1

  parallelism = 3
  indexed     = false

  storage {
    workdir = "."
    output  = "results"
  }
  script = <<-END
    #!/bin/bash

    mkdir -p results
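    # JOB_COMPLETION_INDEX is only set in Indexed completion mode (empty here when indexed = false)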
    printenv > results/out-$JOB_COMPLETION_INDEX
  END
}

@dacbd requested a review from @0x2b3bfa0 on June 1, 2022
@casperdcl (Contributor)

related #585 /CC @0x2b3bfa0

@casperdcl added the external-request ("You asked, we did") label on Jun 2, 2022
@0x2b3bfa0 (Member) left a comment

Awesome feature, @sjawhar! 😍 It would be nice to enable IndexedCompletion automatically when parallelism is greater than 1 instead of exposing a separate option that has no equivalent on other providers.

@@ -119,6 +121,13 @@ func (j *Job) Create(ctx context.Context) error {
jobCompletions := int32(j.Attributes.Task.Parallelism)
jobParallelism := int32(j.Attributes.Task.Parallelism)

var jobCompletionMode kubernetes_batch.CompletionMode
if j.Attributes.Indexed {
@0x2b3bfa0 (Member)

Suggested change:
- if j.Attributes.Indexed {
+ if j.Attributes.Parallelism > 1 {

@sjawhar (Contributor, Author) commented Jun 3, 2022

Awesome feature, @sjawhar! 😍 It would be nice to enable IndexedCompletion automatically when parallelism is greater than 1 instead of exposing a separate option that has no equivalent on other providers.

Thanks for the quick review! I also didn't like having an option that only applies to k8s, but I don't quite agree with your suggestion. Let me try to explain, though bear with me because I'm coming from a very cloud-heavy background and am relatively new to k8s.

K8s has two ways of running parallel jobs: Indexed and NonIndexed. NonIndexed means the pods are identical: start up N of them and they do their thing. Indexed means that k8s expects a success for at least one pod of each index from 0 to N-1. In other words, if the pod with index 3 fails, k8s will keep retrying index 3 specifically. They're useful for different things: you might use NonIndexed to spawn N workers that accomplish M jobs by reading from a task queue, whereas you might use Indexed to accomplish the same thing without the queue (e.g. by giving the full task list to each pod and letting it assign itself work based on its index; see the sketch below).

tl;dr: both indexed and nonindexed are useful ways of running parallel jobs in k8s, and I don't think the user should be forced to use one or the other.
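
For illustration, a hypothetical indexed worker that assigns itself tasks from a list known in advance, no queue involved (the task list and completions count are made up):

package main

import (
	"fmt"
	"os"
	"strconv"
)

func main() {
	// Every pod receives the same full task list.
	tasks := []string{"a", "b", "c", "d", "e", "f", "g"}
	// Set by Kubernetes only in Indexed completion mode.
	index, err := strconv.Atoi(os.Getenv("JOB_COMPLETION_INDEX"))
	if err != nil {
		panic("JOB_COMPLETION_INDEX not set; is the job Indexed?")
	}
	const completions = 3 // .spec.completions
	// Each pod takes every completions-th task, starting at its index.
	for i := index; i < len(tasks); i += completions {
		fmt.Printf("index %d processing task %q\n", index, tasks[i])
	}
}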

Now here's an extra twist: to accomplish the indexed example I described above (N workers for M jobs, where M > N), we need to be able to set .spec.completions and .spec.parallelism separately. So we actually need to add another k8s-specific option 😅.

Thoughts on all this?

UPDATE: What if we removed indexed and added completions, and set CompletionMode to Indexed if completions is set? That way there's only one k8s-specific option 😺

@dacbd (Contributor) commented Jun 3, 2022

What if we could make index less k8s-specific, like somehow populating an env var like TASK_COMPLETION_INDEX = [1-3], basically reproducing the k8s JOB_COMPLETION_INDEX behavior for the cloud instances? 🤔
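
As a sketch of that idea: when launching N plain cloud instances, the provider could bake a per-instance index into each startup script (TASK_COMPLETION_INDEX and this helper are hypothetical, not an existing TPI feature):

package tpi

import "fmt"

// startupScripts prepends a distinct index to a shared startup script,
// mimicking the Kubernetes JOB_COMPLETION_INDEX behavior on plain
// cloud instances.
func startupScripts(script string, parallelism int) []string {
	scripts := make([]string, parallelism)
	for i := range scripts {
		scripts[i] = fmt.Sprintf("export TASK_COMPLETION_INDEX=%d\n%s", i, script)
	}
	return scripts
}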

@sjawhar temporarily deployed to manual on June 4, 2022
@sjawhar (Contributor, Author) commented Jun 4, 2022

I added the completions functionality as described in my last comment. This just reflects what works for my use case; I'm happy to tear it apart and refactor to make it more generic, as you have both described. Also, please forgive my Go; this might be precisely the second time I've ever used it 😅

@sjawhar temporarily deployed to automatic on June 4, 2022
@0x2b3bfa0 (Member)

For the use cases we¹ have in mind, the following should be enough:

  • CompletionMode equal to NonIndexed if parallelism is 1, else Indexed
  • Completions equal to parallelism
  • Parallelism equal to parallelism
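
In code, that proposal amounts to something like the following sketch (naming follows the diff above; this is not the literal merged implementation):

package k8s

import kubernetes_batch "k8s.io/api/batch/v1"

// jobSettings maps the single TPI parallelism attribute onto the three
// Kubernetes Job spec fields. Single-pod jobs stay NonIndexed, which
// keeps them working on clusters that predate Indexed completion support.
func jobSettings(parallelism int32) (completions, workers int32, mode kubernetes_batch.CompletionMode) {
	mode = kubernetes_batch.NonIndexedCompletion
	if parallelism > 1 {
		mode = kubernetes_batch.IndexedCompletion
	}
	return parallelism, parallelism, mode
}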

The following scenarios are still not supported:

  1. Run $m$ different tasks on $n$ nodes, where $m \neq n$
  2. Run $n$ identical workers taking tasks from an external queue

Out of curiosity, can you share a bit more information about your use case? I guess you may want to restrict the parallelism so that running a massively parallel task doesn't exhaust your cluster's resources. 🤔

Footnotes

  ¹ 🔔 @iterative/cml for sanity

@sjawhar (Contributor, Author) commented Jun 4, 2022

CompletionMode equal to NonIndexed if parallelism is 1, else Indexed

I think you still might not be grokking the purpose of non/indexed. They're both valid modes of parallel operation; it's not about whether parallelism is 1 or not.

Out of curiosity, can you share a bit more of information on your use case? I guess that you may want to restrict the parallelism so running a massively parallel task doesn't exhaust your cluster's resources. 🤔

You got it. If I need to run 100 things, I don't want to spawn 100 pods. This seems to me like a necessary consideration for supporting on-prem/k8s.

As you can see from the diff, this is a very minor change that opens up a lot of functionality in k8s. I'd be grateful if you'd reconsider the pros/cons here. 🙏

@0x2b3bfa0 (Member)

If I need to run 100 things, I don't want to spawn 100 pods

What you propose is similar to jobs.<job_id>.strategy.max-parallel on GitHub Actions and could be useful, although supporting it outside k8s is significantly more complex.

As per your suggestion, the current parallelism argument should be replaced by two different arguments:

  1. Number of times to run a task (i.e. completions in Kubernetes parlance)
  2. Maximum number of concurrent workers (i.e. parallelism in Kubernetes parlance)

By default, if users don't specify 2, it should be equal to 1 (i.e. all runs may execute concurrently).

@0x2b3bfa0 (Member) commented Jun 5, 2022

I think you still might not be grokking the purpose of non/indexed. They're both valid modes of parallel operation; it's not about whether parallelism is 1 or not.

Sorry, I explained myself badly. 🙈

Traditionally, Kubernetes Jobs didn't have an Indexed mode, and were meant to run exactly the same code one or more times. Using an external task queue was the officially recommended way of running different tasks.

This project tries to simplify the user's workflow as much as possible: the Indexed mode eliminates the need for an external task queue in those cases when the total number of tasks is known in advance, but still allows users to differentiate every index.

My insistence on using NonIndexed mode when parallelism is 1 is just to support Kubernetes versions prior to 1.24 when users don't need this feature. Requiring a single completion renders the feature useless, so we can safely disable it; when more completions are required, it will always be enabled.

@omesser (Contributor) commented Jun 9, 2022

@iterative/cml
So, assuming CompletionMode can be inferred implicitly, and going with the latest suggestion:

  1. How do we want to handle something like this (user provides a k8s-specific field when not using k8s)?
resource "iterative_task" "example" {
  cloud     = "aws"
  ...
  parallelism = 3
  completions = 5
 }

Error: "completions not supported for non-k8s runtime"? (I am allergic to referring to k8s as a cloud provider 😓)
We definitely don't have to support task-queue-like logic at this point IMO. We're stepping dangerously into "implementing a scheduler" territory.

  2. Considering the above, I have to point out the very likely fact that more and more k8s-specific switches will probably be requested and added in the future, and maybe some cloud-provider-specific ones as well, some of which won't be natural to abstract away or support across providers neatly.
    My personal opinion is that we need to keep those as true as possible to the original behavior (same name, in some provider/runtime-specific block? maybe nested under "advanced"?) so users can easily infer their behavior from the original docs, without us inventing ad-hoc abstractions for them that aren't natural. Thoughts on this?

@casperdcl (Contributor) commented Jun 10, 2022

introducing a k8s-specific config block makes sense, esp. since it's not-a-cloud:

resource "iterative_task" "example" {
  cloud       = "k8s"
  parallelism = ... # warn iff defined on `cloud = "k8s"`?

  # k8s-specific block
  k8s {
    parallelism = 3 # in the k8s sense, not in the TPI sense
    completions = 5
  }
}

@0x2b3bfa0 (Member)

I'd avoid introducing backend-specific details, especially when the outer parallelism is functionally equivalent to the inner completions and the inner parallelism has no equivalent in other backends.

@0x2b3bfa0 (Member)

@sjawhar, would it be possible to keep the original scope of this pull request (enable indexed mode when Kubernetes completions > 1) and open a separate pull request for the addition of a separate parallelism limit?

@sjawhar (Contributor, Author) commented Jun 13, 2022

@sjawhar, would it be possible to keep the original scope of this pull request (enable indexed mode when Kubernetes completions > 1) and open a separate pull request for the addition of a separate parallelism limit?

I really don't think that's a good idea. Users should not be forced to use indexed mode. I went that route initially just to make it work for my use case as quickly as possible, but this would be both a change from the current behavior and unnecessarily limiting. Are you sure you want to do that?

@0x2b3bfa0 (Member) commented Jun 13, 2022

this would be both a change from the current behavior and unnecessarily limiting

I thought that enabling the indexed mode doesn't have any significant effect beyond exposing a couple of additional environment variables. What kinds of unnecessary limitations do you foresee?

@sjawhar (Contributor, Author) commented Jun 14, 2022

I thought that enabling the indexed mode doesn't have any significant effect beyond exposing a couple of additional environment variables. What kinds of unnecessary limitations do you foresee?

It's not just about environment variables. From the docs:

  • NonIndexed (default): the Job is considered complete when there have been .spec.completions successfully completed Pods. In other words, each Pod completion is homologous to each other.
  • Indexed: the Pods of a Job get an associated completion index from 0 to .spec.completions-1. The Job is considered complete when there is one successfully completed Pod for each index.

These are two very different behaviors. Your suggestion works perfectly well for my use case (Indexed is what I need), but it seems unnecessarily limiting. The spec already has provider-specific flags (e.g. region does not apply to k8s), so we're not breaking new ground by adding another one.

Nonetheless, if you're sure then I'll make the change.

@0x2b3bfa0 (Member)

Your suggestion works perfectly well for my use case (Indexed is what I need), but it seems unnecessarily limiting.

Indexed mode is precisely the kind of behavior this tool is meant to have; see #585. Unless there is any practical use case that requires the NonIndexed mode, it looks like there is no pressing need to support both.

The spec already has provider-specific flags (e.g. region does not apply to k8s), so we're not breaking new ground by adding another one.

Ouch! 😅 Indeed, although there were plans to “rename region to location so it's valid for zones and node selectors” as per #412 (comment)

@omesser (Contributor) commented Jun 19, 2022

@0x2b3bfa0

Indexed mode is precisely the kind of behavior this tool is meant to have; see #585. Unless there is any practical use case that requires the NonIndexed mode, it looks like there is no pressing need to support both.

I'm with @sjawhar on this one. It makes no sense to take away this choice for the perceived "simplicity" of merely saving a field here (it's 1:1 with the k8s field, with explainability and docs built in). If we get the power of NonIndexed for the k8s case only, that's still valuable IMO. No need to abuse terminology or over-generalize: this should be in a k8s-specific block IMO (yes, exposing backend-specific details), and there's no need to try to provide the same functionality/choice across all other clouds/engines; that would be a broken abstraction.
The key to keeping our code powerful and still simple, meaningful, and maintainable, IMO, is this: once something is not easily/directly generalizable, keep it provider-specific and help the user transition to k8s terminology.

"rename region to location so it's valid for zones and node selectors"

I'd advise against 😨 (commented on #412)

@0x2b3bfa0 (Member)

the power of NonIndexed

Citation needed. 😅 If we provide “powerful” options it should be for a reason, methinks.

@0x2b3bfa0 (Member)

this should be in a k8s-specific block IMO (yes, exposing backend-specific details), and there's no need to try to provide the same functionality/choice across all other clouds/engines; that would be a broken abstraction.

We may end up exposing backend-specific details when shoehorning them under a common specification is proven to be difficult or impractical. 👍🏼

Although I wonder if this is the case.

@0x2b3bfa0 (Member) commented Jun 19, 2022

The key to keeping our code powerful and still simple, meaningful, and maintainable, IMO, is this: once something is not easily/directly generalizable, keep it provider-specific and help the user transition to k8s terminology.

When terminology is generalizable and different backends overload the same names with different meanings, I believe it makes sense to go the extra mile and "generalize" on our side a bit more.

I'd argue that this particular functionality should be generalized, although in some other cases (e.g. placement) we end up resorting to specific options.

@0x2b3bfa0 (Member)

taking away this choice for the perceived "simplicity" of merely saving a field here

Taking away some choices isn't implicitly negative; quite the opposite. Unless we find any supported use case that benefits particularly from NonIndexed mode, I'd consider not exposing that choice to users.

@omesser (Contributor) commented Jun 23, 2022

We discussed this offline a bit, and specifically, I find it hard to defend the choice for NonIndexed except for the religious reason of "give me ma k8s options!"
So I would say we can live without it for now.

@sjawhar (Contributor, Author) commented Jun 23, 2022

Ok, I'll make the changes sometime in the next few days.

@redabuspatrol
I came across this PR while looking for this feature with AWS EC2. I think the ability to operate parallel instances with regular cloud providers, with some sort of indexing or any other mechanism to dispatch work to the different instances, could greatly help small teams and individual developers who don't have the resources to manage k8s.

@0x2b3bfa0 (Member)

@redabuspatrol, can you please comment on #585? In the meantime, you may also want to try having a separate task for every chunk of work.

@sjawhar temporarily deployed to manual on August 16, 2022
@sjawhar (Contributor, Author) commented Aug 16, 2022

Sorry for the long delay. I've reverted most of the changes and implemented @0x2b3bfa0's preferred solution.

@sjawhar temporarily deployed to automatic on August 16, 2022
@0x2b3bfa0 merged commit bfca950 into iterative:master on Aug 17, 2022
@0x2b3bfa0 (Member)

Thank you very much, @sjawhar! 🙏🏼

@casperdcl (Contributor)

Thanks so much @sjawhar! Sorry for the delay and (perhaps excessive) hesitancy from our end too.

The reverted changes still look worth pursuing to us, just in separate PRs. We'd probably also be much quicker at understanding & merging said PRs if you could share an example of their use too... I assume related to things like master...sjawhar:terraform-provider-iterative:feature/nfs-volume and iterative/dvc@main...sjawhar:dvc:feature/parallel-repro?

@sjawhar deleted the feature/k8s-completion-mode branch on August 20, 2022
@sjawhar (Contributor, Author) commented Aug 20, 2022

I can open separate issues for each of the following features

From the links above, you'll see I've already implemented all of these in the feature/nfs-volume branch, which I'm using for a project at work. It would be great to get all of them upstream; I think the only sticking point is how to represent cloud-specific options in the Terraform file.
