Still looking at the same region of code, we find that we're spending a lot of runtime in validating that our graph is actually acyclic. More specifically, the `register_future!` function is called from `add_thunk!` to register a `Distributed.Future` that will be filled with the result of the newly-created task once it's done executing, allowing the user to wait on and fetch the task result. This function needs to be somewhat defensive, though, when being called from one task targetting another. If a task tries to register and wait on a future for some other task that is downstream of itself, it will wait forever, because that downstream task won't execute until the task waiting on it completes (thus, a deadlock occurs). Similarly, a task shouldn't be able to wait on itself. To avoid this, `register_future!` checks whether the calling task "dominates" the targetted task; when a task A dominates a task B, that means that the completion of A is necessary before the execution and completion of B can occur. If the calling task dominates the target task, then an error is thrown, preventing accidental deadlock. This check is well-intended, but is also slow; thankfully, when adding tasks with `add_thunk!`, we generally can assume that this new task isn't going to be waited on by a downstream task (it's possible, but a careful developer can trivially avoid it; we shouldn't burden them with unnecessary checks). To alleviate this, I simply added a kwarg to `register_future!` that will by default do the domination check, but can allow it to be manually disabled. For `@spawn`, which implicitly calls `add_thunk!`, we disable the check, because in common usage of that API it's not easy to cause deadlocks[^2]. [This change](https://github.com/JuliaParallel/Dagger.jl/commit/d543c23815de79ae39616045aa1ee285665c014c) gives us the following excellent results:
0 commit comments