-
Notifications
You must be signed in to change notification settings - Fork 39
Description
Hello,
We've seen severe performance issues with some of our (larger) workflows, that we've tracked down to an expensive call to nx.simple_cycles()
whenever a state transition happens: A task finishes, then the workflow engine takes in the order of tens of seconds (CPU bound) to process the next task to schedule.
update_task_state()
is called whenever a task ends and the next one(s) need(s) to be decided. This makes a call to _evaluate_route()
which then checks self.graph.in_cycle(task_id)
. The latter is a list comprehension around simple_cycles()
from the networkx
lib.
The complexity of that function is linear in terms of the number of simple cycles in the graph.
Unfortunately, for graphs with large loops, this number can be quite significant. In the following example (similar to some real-world workflows), it scales exponentially (against the number of tasks):
version: 1.0
input:
- can_do_task_a
- can_do_task_b
- can_do_task_c
tasks:
main:
action: core.noop
next:
- do: loop_begin
loop_begin:
action: core.noop
next:
- do: task_a
task_a:
action: core.noop
next:
- do: task_a_impl
when: '{{ ctx("can_do_task_a") }}'
- do: task_b
when: '{{ not ctx("can_do_task_a") }}'
task_a_impl:
action: core.noop # real action goes here
next:
- do: task_b
when: '{{ succeeded() }}'
task_b:
action: core.noop
next:
- do: task_b_impl
when: '{{ ctx("can_do_task_b") }}'
- do: task_c
when: '{{ not ctx("can_do_task_b") }}'
task_b_impl:
action: core.noop # real action goes here
next:
- do: task_c
when: '{{ succeeded() }}'
task_c:
action: core.noop
next:
- do: task_c_impl
when: '{{ ctx("can_do_task_c") }}'
- do: tasks_done
when: '{{ not ctx("can_do_task_c") }}'
task_c_impl:
action: core.noop # real action goes here
next:
- do: tasks_done
when: '{{ succeeded() }}'
tasks_done:
action: core.noop # real action goes here
next:
- do: loop_begin
when: '{{ failed() }}'
There is a small optimization that can be made (I will submit PR #273 ), but it's a band-aid solution that only works in some cases.
Is there a way to remove the need for that call entirely?