RFC: Add task hooks for create/switch/finish events #49083
Draft
+90
−16
Explanation adapted from #39994:
Certain libraries are configured using global or thread-local state instead of passing handles to every function. CUDA, for example, has a `cudaSetDevice` function that binds a device to the current thread for all future API calls. This is at odds with Julia's task-based concurrency, which presents an execution environment that's local to the current task (e.g., in the case of CUDA, using a different device).
This PR adds a hook mechanism that can be used to detect task creation, task switches, and task finish, allowing synchronization of Julia's task-local environment with a library's global or thread-local state.
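To make the mismatch concrete, here is a minimal sketch that simulates a library's thread-local binding with a plain global. `CURRENT_DEVICE`, `set_device!`, and `use_device` are hypothetical stand-ins invented for illustration; nothing here calls CUDA.

```julia
# Stand-in for a library's thread-local "current device" (hypothetical; not the CUDA API).
const CURRENT_DEVICE = Ref(0)
set_device!(dev::Int) = (CURRENT_DEVICE[] = dev)

# Each task binds its device, yields once, then records which device is actually bound.
function use_device(dev::Int, seen::Vector{Tuple{Int,Int}})
    set_device!(dev)
    yield()                               # another task may run here and rebind the device
    push!(seen, (dev, CURRENT_DEVICE[]))  # (device this task wanted, device the library sees)
end

seen = Tuple{Int,Int}[]
t1 = @task use_device(1, seen)
t2 = @task use_device(2, seen)
schedule(t1); schedule(t2)
wait(t1); wait(t2)

for (wanted, bound) in seen
    println("task wanted device ", wanted, ", library state says ", bound)
end
```

Because both tasks share one thread, whichever task binds its device last wins, and the other task resumes with a binding it never asked for.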
Further information
When writing code which interfaces with libraries, OS interfaces, or Julia's runtime itself, certain data and operations are intended to operate thread-locally or globally. For example, `Base.gc_num()` is a global set of counters.
This becomes an issue when integrating such APIs with Julia's task system, because task switches cause a change in which piece of code is executing in the middle of a given task's lifetime. That code might expect that a certain thread-local or global value, which was previously set or measured, is still at its original value; we can't currently make this guarantee.
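The following sketch shows how a measurement based on `Base.gc_num()` can pick up another task's allocations. It relies on the unexported `Base.gc_num` and `Base.GC_Diff` internals, which may change between Julia versions; `SINK`, `measure_across_yield`, and `noisy` are names made up for this example.

```julia
# Keeps the allocation reachable so it cannot be optimized away.
const SINK = Ref{Any}(nothing)

# Measures allocations across a region that contains a yield point.
function measure_across_yield()
    before = Base.gc_num()
    yield()                                    # any runnable task may execute here
    after = Base.gc_num()
    return Base.GC_Diff(after, before).allocd  # bytes allocated process-wide in between
end

noisy = @task (SINK[] = zeros(UInt8, 10^6))    # unrelated task that allocates about 1 MB
schedule(noisy)
bytes = measure_across_yield()
println("bytes attributed to the measuring task: ", bytes)
```

The measuring function allocates almost nothing itself, yet the counter difference includes the megabyte allocated by the unrelated task that ran during the `yield`.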
We've had a variety of workarounds in place for this; in the GPU case, we can prefix every GPU API call with thread-local context setup, and make sure that no yields happen in-between that and the API call itself. This works, but is slow, cumbersome, and error-prone. But in the case of measuring task execution time or allocations, we have no way to guarantee that we're actually measuring the metrics for the task we care about.
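The "prefix every call" workaround can be sketched as follows, again with a simulated binding rather than a real GPU library. `BOUND_DEVICE`, `bind_device!`, `task_device`, `api_call`, and `worker` are hypothetical names; only `task_local_storage`, `yield`, `schedule`, and `fetch` are real Julia functions.

```julia
# Stand-in for the library's thread-local binding (hypothetical; not the CUDA API).
const BOUND_DEVICE = Ref(0)
bind_device!(dev::Int) = (BOUND_DEVICE[] = dev)

# The device this task wants, kept in task-local storage (defaults to 0).
task_device() = get(task_local_storage(), :device, 0)::Int

# Workaround: re-bind before every call, with no yield between the bind and the call.
function api_call(f)
    bind_device!(task_device())   # cost is paid on every call, even when nothing changed
    return f(BOUND_DEVICE[])      # f must not yield, or the binding may go stale
end

function worker(dev::Int)
    task_local_storage(:device, dev)
    yield()                       # other tasks may rebind the device here
    return api_call(identity)     # still sees this task's own device
end

t1 = @task worker(1)
t2 = @task worker(2)
schedule(t1); schedule(t2)
results = (fetch(t1), fetch(t2))
println(results)
```

Each task gets its own device back, but only because every call site goes through `api_call` and nothing yields inside it, which is the cumbersome and error-prone part.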
The solution proposed here is simple and straightforward: allow a restricted set of arbitrary user code to run during task switches. Such code is called a "hook", and each registered hook executes when a task (which contains the hooks) is switched from or to some other task. By doing this, hooks can ensure that changes in the task-to-thread relationship and current running state of a task can be adequately accounted for, when necessary. Additionally, hooks are triggered at task creation time to allow "inheriting" behavior or values from the parent task; at task start for any initialization at first run; and at task finish for any cleanup.
Unlike the implementation in #39994, this PR adds hooks on a per-task basis. This prevents hooks from running when they aren't needed; hook code is only run when a hook is registered to a given task. This makes it easy to add and remove hooks safely, because a task's hooks can and should only be modified by the task itself (or the parent task during task creation, when the task is stopped), so expensive synchronization is not necessary.
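As a rough illustration of the per-task idea, the sketch below emulates hooks in user code. It is not this PR's implementation or API: the real hooks run inside the runtime's task switch, while this emulation wraps `yield` explicitly. `HookEvent`, `task_hooks`, `add_hook!`, `run_hooks`, `hooked_yield`, `DEVICE`, and `hooked_worker` are all hypothetical names.

```julia
# Hypothetical event names; the PR itself passes integer event codes.
@enum HookEvent SWITCH_FROM SWITCH_TO

# Hooks live in the task's own storage, so only the owning task touches them.
task_hooks() = get!(() -> Function[], task_local_storage(), :hooks)::Vector{Function}
add_hook!(f) = push!(task_hooks(), f)
run_hooks(ev::HookEvent) = foreach(h -> h(ev), task_hooks())

# Emulated yield point; a runtime-level mechanism would do this during the switch itself.
function hooked_yield()
    run_hooks(SWITCH_FROM)   # about to leave the current task
    yield()
    run_hooks(SWITCH_TO)     # the current task is running again
end

const DEVICE = Ref(0)        # stand-in for thread-local library state

function hooked_worker(dev::Int)
    # Re-bind this task's device whenever the task is switched back in.
    add_hook!(ev -> ev == SWITCH_TO && (DEVICE[] = dev))
    DEVICE[] = dev
    hooked_yield()           # another task may rebind DEVICE in the meantime
    return DEVICE[]          # restored by this task's SWITCH_TO hook
end

t1 = @task hooked_worker(1)
t2 = @task hooked_worker(2)
schedule(t1); schedule(t2)
results = (fetch(t1), fetch(t2))
println(results)
```

A task that never registers a hook has an empty hook list, so `run_hooks` does no work for it, and no locking is needed because each list is only modified from its own task.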
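The hooks here run in 5 places, corresponding to the events described above: task creation, task start, switching away from a task, switching to a task, and task finish.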
Hooks are passed an event code (currently just the numbers 0 through 4), and if the hook is running during task creation, the child task is also passed as the second argument (otherwise it is `nothing`). Hooks are treated similarly to finalizer callbacks, in that they are not allowed to `yield`.
Overhead
Some basic benchmark results suggest that a task without hooks registered has almost no measurable overhead during task switches: https://gist.github.com/jpsamaroo/7da2124ce306ad6ae3e2588368bbdcda.
Preemption
It was pointed out that task hooks might make future implementation of non-cooperative (preemptive) multitasking harder, because code may be interrupted outside of `yield` calls without calling the hooks. However, preemptive multitasking might itself break lots of code which is relying on code between yield points being non-interruptible, and so some kind of way to disable preemption for regions of code will likely be necessary. Additionally, task hooks could potentially be marked as "preemption-safe" and thus run during preemption; if any task hook isn't marked as preemption-safe, then that task should not be preemptible.
Use cases
`logstate` field with task-local storage
Todo: