Skip to content

Configurable core count #2363

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ahartmetz
Copy link
Contributor

Some nodes can run out of memory if all of their cores are used. There may be other reasons to limit the amount of cores used.

@codecov-commenter
Copy link

codecov-commenter commented Mar 14, 2025

Codecov Report

Attention: Patch coverage is 2.94118% with 33 lines in your changes missing coverage. Please review.

Project coverage is 66.60%. Comparing base (5b1e93e) to head (3d47eb4).

Files with missing lines Patch % Lines
src/dist/http.rs 0.00% 20 Missing ⚠️
src/bin/sccache-dist/main.rs 0.00% 11 Missing ⚠️
tests/harness/mod.rs 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2363      +/-   ##
==========================================
- Coverage   71.58%   66.60%   -4.99%     
==========================================
  Files          65       64       -1     
  Lines       36214    34169    -2045     
==========================================
- Hits        25923    22757    -3166     
- Misses      10291    11412    +1121     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@@ -430,6 +431,7 @@ impl DistSystem {
Some(SocketAddr::from(([0, 0, 0, 0], server_addr.port()))),
self.scheduler_url().to_url(),
token,
4,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be great if the test could verify that, indeed, 4 core are being used

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess so, but that would add a lot more work than the actual change - and you are seeing some of the first Rust code that I've written here, so I really don't know the tradeoffs etc to make a good decision about how to pull it off.
I have some ideas about scheduling improvements, though absolutely no guarantees that I'll get to them. That would need a corresponding test harness (maybe with a kind of mock compiler that takes a configurable amount of CPU time and memory and writes some kind of tracing output) with one of the more trivial checks being that the max number of jobs is not exceeded.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean actually verify, not jusk asking the scheduler what it thinks its limit is, which seems a bit pointless because it "obviously works" (right?)

@ahartmetz ahartmetz force-pushed the configurable_core_count branch from 76945e4 to 5934244 Compare June 17, 2025 13:43
@ahartmetz ahartmetz marked this pull request as ready for review June 17, 2025 13:44
@ahartmetz ahartmetz force-pushed the configurable_core_count branch from 5934244 to 49036b7 Compare June 17, 2025 13:45
@sylvestre sylvestre force-pushed the configurable_core_count branch from 49036b7 to bd88a2b Compare June 20, 2025 15:48
@ahartmetz ahartmetz force-pushed the configurable_core_count branch from bd88a2b to 515168d Compare June 24, 2025 15:51
It seems to be a bad deal: increases line count and obscures the
origin of values in a pretty long function.
@ahartmetz ahartmetz force-pushed the configurable_core_count branch from 515168d to 5e15d3f Compare July 7, 2025 19:26
Also move the slight inflation of CPU core count ("overcommit" to
make up for various latencies) to the builder in order to enable
setting an exact maximum number of cores to use which will never be
exceeded. That introduces a small problem in the scheduling
protocol (excess overcommit if the builder is new and the scheduler
is old) that seems pretty acceptable to me and, anyway, does not
occur if both builder and scheduler are of the same version.
As another side effect, it shouldn't occur anymore that the
scheduler reports more running jobs than available slots.
@ahartmetz ahartmetz force-pushed the configurable_core_count branch from 5e15d3f to 3d47eb4 Compare July 7, 2025 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants