Potential CPU oversubscription in BeaconProcessor due to unscoped rayon usage #7719

@jimmygchen

Description

Currently, BeaconProcessor is initialised with max_workers set to the number of available CPUs:

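    impl Default for BeaconProcessorConfig {
        fn default() -> Self {
            Self {
                // One worker per available CPU, never fewer than one.
                max_workers: cmp::max(1, num_cpus::get()),
                // ... (remaining fields elided in the original snippet)
            }
        }
    }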

Each task spawned by the processor consumes one worker. However, some tasks (e.g. Work::ColumnReconstruction) internally use rayon to parallelise computation. By default, rayon uses its global thread pool, which is also sized to the number of CPUs.

This likely results in CPU oversubscription: both Lighthouse and rayon independently assume full CPU availability, leading to degraded performance under load.
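To make the oversubscription concrete, here is a rough back-of-the-envelope sketch. The helper name and its parameters are hypothetical, and it assumes that a worker calling into rayon's global pool blocks until the pool finishes the job, while all other workers keep computing.

```rust
/// Rough upper bound on simultaneously busy threads (hypothetical helper).
/// Assumption: a worker that calls into rayon's global pool blocks until the
/// pool has finished its job, while every other worker keeps computing.
pub fn busy_threads_upper_bound(
    max_workers: usize,
    rayon_threads: usize,
    rayon_callers: usize,
) -> usize {
    // Workers that are waiting on rayon do not burn CPU themselves.
    let callers = rayon_callers.min(max_workers);
    let plain_workers = max_workers - callers;
    // The global pool is only busy if at least one worker has called into it.
    let pool_threads = if callers > 0 { rayon_threads } else { 0 };
    plain_workers + pool_threads
}
```

With 8 CPUs (so max_workers = 8 and 8 rayon threads) and a single reconstruction task in flight, the bound is 7 + 8 = 15 busy threads competing for 8 cores.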

Proposed Solution

Consider allowing BeaconProcessor to allocate multiple workers to expensive tasks. When spawning such a task:

  • Wait until n workers are free (e.g. 4),
  • Spawn the task with a scoped rayon thread pool limited to those n threads,
  • Release the workers once the task completes.

This would preserve control over CPU usage while still enabling parallelism within heavy tasks.
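The three steps above could look roughly like the following sketch. It is not Lighthouse code: WorkerSlots, SlotGuard and run_heavy are hypothetical names, and std::thread::scope stands in for the scoped rayon pool so that the example only needs the standard library. With rayon, step 2 would presumably build a pool via ThreadPoolBuilder::new().num_threads(n).build() and run the task inside pool.install(...).

```rust
use std::sync::{Condvar, Mutex};
use std::thread;

/// Counting pool of worker slots (hypothetical; stands in for the
/// BeaconProcessor's idle-worker accounting).
pub struct WorkerSlots {
    free: Mutex<usize>,
    changed: Condvar,
}

/// RAII guard: hands its slots back to the pool when dropped.
pub struct SlotGuard<'a> {
    slots: &'a WorkerSlots,
    n: usize,
}

impl WorkerSlots {
    pub fn new(total: usize) -> Self {
        Self { free: Mutex::new(total), changed: Condvar::new() }
    }

    /// Step 1: block until `n` slots are free, then take them all at once.
    /// Note: this waits forever if `n` exceeds the total number of slots.
    pub fn acquire(&self, n: usize) -> SlotGuard<'_> {
        let mut free = self.free.lock().unwrap();
        while *free < n {
            free = self.changed.wait(free).unwrap();
        }
        *free -= n;
        SlotGuard { slots: self, n }
    }

    /// Number of slots currently free.
    pub fn available(&self) -> usize {
        *self.free.lock().unwrap()
    }
}

impl Drop for SlotGuard<'_> {
    /// Step 3: release the slots and wake any waiting tasks.
    fn drop(&mut self) {
        *self.slots.free.lock().unwrap() += self.n;
        self.slots.changed.notify_all();
    }
}

/// Step 2: apply `f` to every item using at most `n` scoped threads.
/// Results come back in input order.
pub fn run_heavy<T, R, F>(slots: &WorkerSlots, n: usize, items: &[T], f: F) -> Vec<R>
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    let n = n.max(1);
    let _guard = slots.acquire(n); // held until this function returns
    let chunk_len = ((items.len() + n - 1) / n).max(1);
    let f = &f;
    thread::scope(|s| {
        let handles: Vec<_> = items
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().map(f).collect::<Vec<R>>()))
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}
```

Acquiring all n slots in one step, rather than one at a time, avoids two heavy tasks each holding a partial allocation while waiting for the other. The guard releases the slots even if the task panics.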

Additional Details

Discussion below with @michaelsproul from the experimental PR #7720:

BeaconProcessor

Async task

  • Keep computation light
  • Any blocking computation goes through the beacon processor
  • Not bounded by max_workers (num_cpus), but probably still needs a queuing system so that the number of pending tasks doesn't grow without bound and cause memory issues

Blocking task

  • Gets allocated 1 thread by default
  • For heavy tasks that require rayon, allow WorkTypes to acquire more than 1 worker (N). When executing, create a scoped rayon pool, run the parallel tasks within that scope, and release the workers once it completes
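One way to express "1 worker by default, N for heavy tasks" is a per-work-type lookup, sketched below. WorkKind, its Light variant, and the value 4 are assumptions for illustration; only ColumnReconstruction is taken from the issue text.

```rust
/// Hypothetical, heavily reduced set of work kinds. `Light` is a placeholder
/// for every task that does not use rayon internally.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkKind {
    Light,
    ColumnReconstruction,
}

impl WorkKind {
    /// Number of workers a task of this kind asks for.
    /// The value 4 is illustrative only, not a measured or agreed number.
    pub fn requested_workers(self) -> usize {
        match self {
            WorkKind::ColumnReconstruction => 4,
            WorkKind::Light => 1,
        }
    }

    /// Clamp the request to `1..=max_workers`, so a heavy task can never
    /// wait for more workers than the processor owns.
    pub fn workers_required(self, max_workers: usize) -> usize {
        self.requested_workers().min(max_workers).max(1)
    }
}
```

The clamp matters on small machines: on a 2-core host, a task that asks for 4 workers would otherwise never be scheduled.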

CPU allocation

  • tokio runtime: num_cpus threads (default)
  • BeaconProcessor: runs inside the tokio runtime and maintains blocking workers, with max_blocking_workers = num_cpus
  • rayon: scoped per task

Memory

  • Both sync and async tasks are bounded by queues of each individual work type.
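A minimal sketch of per-work-type bounded queues follows. BoundedQueues is a hypothetical type, work types are keyed by plain strings for brevity, and rejecting new items when a queue is full is an assumption: the discussion does not say which drop policy applies.

```rust
use std::collections::{HashMap, VecDeque};

/// One FIFO queue per work type, each with its own capacity (hypothetical).
pub struct BoundedQueues<T> {
    limits: HashMap<&'static str, usize>,
    queues: HashMap<&'static str, VecDeque<T>>,
}

impl<T> BoundedQueues<T> {
    /// `limits` lists every known work type with its maximum queue length.
    pub fn new(limits: &[(&'static str, usize)]) -> Self {
        Self {
            limits: limits.iter().copied().collect(),
            queues: HashMap::new(),
        }
    }

    /// Enqueue `item`. The item is handed back as `Err` if the queue for
    /// `kind` is full or `kind` is unknown, so memory use stays bounded.
    pub fn push(&mut self, kind: &'static str, item: T) -> Result<(), T> {
        let limit = match self.limits.get(kind) {
            Some(&limit) => limit,
            None => return Err(item),
        };
        let queue = self.queues.entry(kind).or_default();
        if queue.len() >= limit {
            return Err(item);
        }
        queue.push_back(item);
        Ok(())
    }

    /// Dequeue the oldest item of the given kind, if any.
    pub fn pop(&mut self, kind: &str) -> Option<T> {
        self.queues.get_mut(kind)?.pop_front()
    }
}
```

Because each work type has its own limit, a flood of one kind of task cannot starve the queue space of another.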

Labels: hardening, optimization (something to make Lighthouse run more efficiently)
