Description
Currently, `BeaconProcessor` is initialised with `max_workers` set to the number of available CPUs (lighthouse/beacon_node/beacon_processor/src/lib.rs, lines 251 to 254 in f67084a):

```rust
impl Default for BeaconProcessorConfig {
    fn default() -> Self {
        Self {
            max_workers: cmp::max(1, num_cpus::get()),
```
Each task spawned by the processor consumes one worker. However, some tasks (e.g. `Work::ColumnReconstruction`) internally use rayon to parallelise computation. By default, rayon uses its global thread pool, which is also sized to the number of CPUs.

This likely results in CPU oversubscription: both Lighthouse and rayon independently assume full CPU availability, leading to degraded performance under load.
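To make the mismatch concrete, here is a minimal sketch (not Lighthouse code) that assumes the `num_cpus` and `rayon` crates:

```rust
fn main() {
    let cpus = num_cpus::get();
    // The beacon processor sizes its blocking workers to the CPU count...
    println!("beacon processor max_workers: {}", std::cmp::max(1, cpus));
    // ...and rayon's global pool independently defaults to the same count.
    println!("rayon global pool threads:    {}", rayon::current_num_threads());
    // Under load, both sets of threads can be compute-bound at once,
    // leaving roughly 2x `cpus` runnable threads competing for `cpus` cores.
}
```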
Proposed Solution
Consider allowing `BeaconProcessor` to allocate multiple workers to expensive tasks. When spawning such a task:

- Wait until `n` workers are free (e.g. 4),
- Spawn the task with a scoped rayon thread pool limited to those `n` threads,
- Release the workers once the task completes.

This would preserve control over CPU usage while still enabling parallelism within heavy tasks.
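A hedged sketch of what this flow could look like, using a tokio `Semaphore` to track free workers; `spawn_heavy_task` and the semaphore wiring are illustrative, not existing Lighthouse APIs:

```rust
use std::sync::Arc;
use tokio::sync::Semaphore;

// Hypothetical: `workers` would be created once with `max_workers` permits.
async fn spawn_heavy_task(workers: Arc<Semaphore>, n: u32) {
    // 1. Wait until `n` workers are free.
    let permits = workers
        .acquire_many_owned(n)
        .await
        .expect("semaphore is never closed");

    tokio::task::spawn_blocking(move || {
        // 2. Build a scoped rayon pool limited to those `n` threads.
        let pool = rayon::ThreadPoolBuilder::new()
            .num_threads(n as usize)
            .build()
            .expect("failed to build scoped rayon pool");
        pool.install(|| {
            // ... run the parallel computation (e.g. column
            // reconstruction) on at most `n` threads ...
        });
        // 3. Release the workers once the task completes.
        drop(permits);
    });
}
```

Building a pool per task adds some startup cost, but the pool and its threads are torn down when the scope ends, so the total number of compute threads never exceeds `max_workers`.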
Additional Details
Summary of a discussion with @michaelsproul from the experimental PR #7720:
BeaconProcessor
Async task
- Keep computation light
- Any blocking computation goes through the beacon processor
- Not bounded by max workers (`num_cpus`), but probably still needs a queuing system to make sure it doesn't grow unbounded and cause memory issues
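One possible shape for that queuing system, sketched as a fixed-capacity channel; the capacity and rejection policy are assumptions, not the existing implementation:

```rust
use tokio::sync::mpsc;

// Illustrative capacity; the real bound would be tuned per work type.
const ASYNC_QUEUE_CAPACITY: usize = 1024;

fn bounded_async_queue<T>() -> (mpsc::Sender<T>, mpsc::Receiver<T>) {
    // `try_send` on the sender fails with `TrySendError::Full` once
    // `ASYNC_QUEUE_CAPACITY` items are queued, so async work that is not
    // limited by `max_workers` still cannot grow without bound.
    mpsc::channel(ASYNC_QUEUE_CAPACITY)
}
```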
Blocking task
- Gets allocated 1 thread by default
- For heavy tasks that require rayon, allow `WorkType`s to acquire more than 1 worker (`N`); when executing, create a scoped rayon pool, run the parallel tasks within the scope, and release the workers after the task completes (see the sketch under Proposed Solution above)
CPU allocation
- tokio runtime: `num_cpus` threads (default)
- `BeaconProcessor`: runs inside the tokio runtime and maintains blocking workers, with `max_blocking_workers = num_cpus`
- rayon: scoped per task
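Putting the three together, a sketch of the resulting thread budget, assuming a manually built runtime (not necessarily how Lighthouse constructs its own):

```rust
fn build_runtime() -> std::io::Result<tokio::runtime::Runtime> {
    let cpus = num_cpus::get();
    tokio::runtime::Builder::new_multi_thread()
        // tokio: `num_cpus` async worker threads (the default).
        .worker_threads(cpus)
        .enable_all()
        .build()
    // The BeaconProcessor runs inside this runtime and caps its blocking
    // workers at `max_blocking_workers = num_cpus`; each heavy task then
    // builds its own scoped rayon pool instead of using the global one.
}
```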
Memory
- Both sync and async tasks are bounded by per-work-type queues.