Thread-Aware micro_yield(thread_id)
#31
Merged
This adds a `worker_yield_t` with an operator `operator()(thread_index_t thread_idx)` that is only called by worker threads that are waiting. It defaults to `micro_yield` but can be customized by the user.

Right now, `fork_union` offers two mechanisms to control the behaviour of threads that do not have work to do:

- `sleep(microseconds)`, which tells the threads to assume a `chill_k` `mood_t` and to call `std::this_thread::sleep_for(microseconds)` in a loop while waiting for a new job.
- `micro_yield()`, which calls `operator()` of a user-definable `micro_yield_t`.

Since we are writing an application that is frequently used on consumer hardware alongside a DAW, multiple VST plugins and a host of other random applications, relying on `micro_yield()` alone is not an option: it can use power-optimized instructions but still results in 100% CPU usage. `sleep()` is much better for our use case except for the added latency. Since our application is doing real-time audio, a slow wakeup can cause a missed deadline, so the added latency hurts performance.

One possible way to solve that dilemma is to wait for new jobs by doing an increasing number of `micro_yield()` calls before finally calling `std::this_thread::sleep_for(microseconds)`. This form of tiered waiting, augmented by some sort of backoff measure, is described in this article and seems to be more or less common.

With `fork_union` as it is right now, I don't think we can implement that form of waiting for our application.

It would be possible to implement this kind of waiting scheme inside the worker's lambda using `unsafe_for_threads` and tying the wait procedure to the acquiring of a mutex. In pseudo-code it would look like this:

This would work (maybe with some performance issues due to contention on the lock) if we never needed to use the full result of the computation and we could live with potentially partial results until the next call to `process`.

The problem is that we call `fork_union`'s `for_n` API inside a `process` function which is called by the OS's audio interrupt handling, and we need to do some single-threaded processing on the result of the multi-threaded computation, so we need to join at the end of `process`. This means the threads are forced to sit idle between the audio interrupt calls, calling `micro_yield` or `sleep` in a loop.

We could define a `micro_yield_t` to implement the waiting behaviour and use `std::this_thread::get_id` and counters to track which stage of the wait each thread is on.

I think `std::this_thread::get_id` would make it hard to implement a very performant solution because we would need to use some kind of map data structure. Furthermore, `fork_union` uses `micro_yield()` outside of `_worker_loop` to implement various waits, which makes interactions with `std::this_thread::get_id` harder to implement correctly.

This is why I propose to include a `worker_yield_t` with an `operator()(thread_index_t thread_idx)` operator that defaults to calling `micro_yield()` but is guaranteed to only be called in `_worker_loop` between jobs.

Downsides:

- […] `size_t` parameter anytime a wait needs to happen? I don't know what the performance impact is but I'm guessing it must be minimal.
- […] `linux_colocated_pool` since only the global thread seems to sleep?
- […] `worker_yield_t` (`micro_yield_t`) is instantiated in the `_worker_loop` function, the user needs to rely on static members to reset the wait attempt count. Maybe pre-instantiate a `worker_yield_t` in a `basic_pool` and make it accessible, but this makes the API messier.
- […] `void() reset` member to `worker_yield_t` that could be called internally at every epoch change. This is a possibly more constraining API but it would work for our use case.

I have tested these changes for our use case and the results are pretty good, but I'm open to any changes or suggestions if it means an API that enables us to overcome the challenges outlined above gets into main!
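
To make the tiered waiting idea concrete, here is a minimal sketch of what a user-supplied yield policy with a thread index parameter could look like. Everything in it is an assumption for illustration: `tiered_yield_t`, its template parameters, `sleep_duration` and the `bool` return value are hypothetical names that are not part of `fork_union`, `std::size_t` stands in for `thread_index_t`, and `std::this_thread::yield()` stands in for whatever cheap pause instruction a real `micro_yield_t` would use. It also assumes each slot is only ever touched by the worker that owns that index, which is why the counters are plain integers rather than atomics.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>

// Hypothetical tiered-wait policy; an illustration, not fork_union's API.
template <std::size_t max_threads_ = 64, std::size_t spin_limit_ = 1024>
struct tiered_yield_t {
    // One counter per worker, padded to its own cache line to avoid false sharing.
    struct alignas(64) slot_t {
        std::size_t attempts = 0;
    };

    std::array<slot_t, max_threads_> slots{};
    std::chrono::microseconds sleep_duration{50};

    // Called by an idle worker between jobs.
    // Returns true if this call slept, false if it only yielded.
    bool operator()(std::size_t thread_idx) noexcept {
        assert(thread_idx < max_threads_);
        std::size_t &attempts = slots[thread_idx].attempts;
        if (attempts < spin_limit_) {
            ++attempts;
            std::this_thread::yield(); // cheap tier: stay responsive
            return false;
        }
        std::this_thread::sleep_for(sleep_duration); // slow tier: release the core
        return true;
    }

    // Called when a new job (epoch) arrives, so the next idle period starts cheap again.
    void reset(std::size_t thread_idx) noexcept {
        assert(thread_idx < max_threads_);
        slots[thread_idx].attempts = 0;
    }
};
```

Indexing a fixed array by thread index is what removes the need for a map keyed on `std::this_thread::get_id`. The `reset` member corresponds to the last option in the downsides list: without it, a worker that has reached the sleep tier would keep sleeping on every later idle period.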