Skip to content

High computational complexity in update_task_state() #272

@panos-gkonis

Description

@panos-gkonis

Hello,

We've seen severe performance issues with some of our (larger) workflows, that we've tracked down to an expensive call to nx.simple_cycles() whenever a state transition happens: A task finishes, then the workflow engine takes in the order of tens of seconds (CPU bound) to process the next task to schedule.

update_task_state() is called whenever a task ends and the next one(s) need(s) to be decided. This makes a call to _evaluate_route() which then checks self.graph.in_cycle(task_id). The latter is a list comprehension around simple_cycles() from the networkx lib.
The complexity of that function is linear in terms of the number of simple cycles in the graph.

Unfortunately, for graphs with large loops, this number can be quite significant. In the following example (similar to some real-world workflows), it scales exponentially (against the number of tasks):

version: 1.0

input:
  - can_do_task_a
  - can_do_task_b
  - can_do_task_c

tasks:
  main:
    action: core.noop
    next:
      - do: loop_begin
  loop_begin:
    action: core.noop
    next:
      - do: task_a
  task_a:
    action: core.noop
    next:
      - do: task_a_impl
        when: '{{ ctx("can_do_task_a") }}'
      - do: task_b
        when: '{{ not ctx("can_do_task_a") }}'
  task_a_impl:
    action: core.noop # real action goes here
    next:
      - do: task_b
        when: '{{ succeeded() }}'
  task_b:
    action: core.noop
    next:
      - do: task_b_impl
        when: '{{ ctx("can_do_task_b") }}'
      - do: task_c
        when: '{{ not ctx("can_do_task_b") }}'
  task_b_impl:
    action: core.noop # real action goes here
    next:
      - do: task_c
        when: '{{ succeeded() }}'
  task_c:
    action: core.noop
    next:
      - do: task_c_impl
        when: '{{ ctx("can_do_task_c") }}'
      - do: tasks_done
        when: '{{ not ctx("can_do_task_c") }}'
  task_c_impl:
    action: core.noop # real action goes here
    next:
      - do: tasks_done
        when: '{{ succeeded() }}'
  tasks_done:
    action: core.noop # real action goes here
    next:
      - do: loop_begin
        when: '{{ failed() }}'

There is a small optimization that can be made (I will submit PR #273 ), but it's a band-aid solution that only works in some cases.

Is there a way to remove the need for that call entirely?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions