Configurable core count #2363
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2363      +/-   ##
==========================================
- Coverage   71.58%   66.60%   -4.99%
==========================================
  Files          65       64       -1
  Lines       36214    34169    -2045
==========================================
- Hits        25923    22757    -3166
- Misses      10291    11412    +1121
@@ -430,6 +431,7 @@ impl DistSystem {
    Some(SocketAddr::from(([0, 0, 0, 0], server_addr.port()))),
    self.scheduler_url().to_url(),
    token,
    4,
It would be great if the test could verify that 4 cores are indeed being used.
I guess so, but that would add a lot more work than the actual change. You are also seeing some of the first Rust code that I've written here, so I really don't know the trade-offs well enough to make a good decision about how to pull it off.
I have some ideas about scheduling improvements, though absolutely no guarantees that I'll get to them. That would need a corresponding test harness (maybe with a kind of mock compiler that takes a configurable amount of CPU time and memory and writes some kind of tracing output) with one of the more trivial checks being that the max number of jobs is not exceeded.
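The "max number of jobs is not exceeded" check mentioned above could look roughly like the following. This is only a sketch using the standard library: `Slots` and `peak_concurrency` are hypothetical names, the slot limiter is a stand-in for the real builder or scheduler, and the mock compiler is just a sleep rather than something that burns a configurable amount of CPU time and memory.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical counting semaphore standing in for the builder's job slots.
struct Slots {
    free: Mutex<usize>,
    cv: Condvar,
}

impl Slots {
    fn new(n: usize) -> Self {
        Slots { free: Mutex::new(n), cv: Condvar::new() }
    }

    /// Block until a slot is free, then take it.
    fn acquire(&self) {
        let mut free = self.free.lock().unwrap();
        while *free == 0 {
            free = self.cv.wait(free).unwrap();
        }
        *free -= 1;
    }

    /// Return a slot and wake one waiting job.
    fn release(&self) {
        *self.free.lock().unwrap() += 1;
        self.cv.notify_one();
    }
}

/// Run `jobs` mock compile jobs against `max_jobs` slots and return the
/// highest number of jobs that were observed running at the same time.
fn peak_concurrency(max_jobs: usize, jobs: usize, job_time: Duration) -> usize {
    let slots = Arc::new(Slots::new(max_jobs));
    let running = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..jobs)
        .map(|_| {
            let (slots, running, peak) =
                (Arc::clone(&slots), Arc::clone(&running), Arc::clone(&peak));
            thread::spawn(move || {
                slots.acquire();
                // Record how many jobs are running, including this one.
                let now = running.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(job_time); // mock compiler: pretend to work
                running.fetch_sub(1, Ordering::SeqCst);
                slots.release();
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}
```

A test would then assert something like `peak_concurrency(4, 16, Duration::from_millis(10)) <= 4`. Measuring the peak from inside the jobs, rather than asking the limiter for its configured value, is what makes this an actual verification.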
I mean actually verify, not just asking the scheduler what it thinks its limit is, which seems a bit pointless because it "obviously works" (right?).
76945e4 to 5934244
5934244 to 49036b7
49036b7 to bd88a2b
bd88a2b to 515168d
It seems to be a bad deal: it increases the line count and obscures the origin of values in a pretty long function.
515168d to 5e15d3f
Also move the slight inflation of the CPU core count ("overcommit", to make up for various latencies) to the builder, so that it becomes possible to set an exact maximum number of cores that will never be exceeded. That introduces a small problem in the scheduling protocol (excess overcommit if the builder is new and the scheduler is old) that seems pretty acceptable to me and, anyway, does not occur if both builder and scheduler are of the same version. As another side effect, the scheduler should no longer report more running jobs than available slots.
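The arithmetic behind this change, including the mixed-version case, can be sketched as follows. All names and the 20% overcommit figure are assumptions made for illustration, not values taken from the patch; the sketch also assumes that an explicitly configured core count is advertised without any inflation.

```rust
/// Assumed overcommit percentage; the real factor is not stated here.
const OVERCOMMIT_PERCENT: usize = 20;

/// Inflate a core count by the overcommit percentage, rounding up.
fn inflate(cores: usize) -> usize {
    cores + (cores * OVERCOMMIT_PERCENT + 99) / 100
}

/// Hypothetical: the slot count a new builder advertises to the scheduler.
/// An explicit limit is passed through exactly; otherwise the detected
/// core count is inflated on the builder side.
fn advertised_slots(detected_cores: usize, configured_max: Option<usize>) -> usize {
    match configured_max {
        Some(limit) => limit,
        None => inflate(detected_cores),
    }
}

/// Hypothetical: an old scheduler still applies its own inflation on top
/// of whatever the builder advertises.
fn old_scheduler_limit(advertised: usize) -> usize {
    inflate(advertised)
}

/// Hypothetical: a new scheduler trusts the advertised value as-is.
fn new_scheduler_limit(advertised: usize) -> usize {
    advertised
}
```

Under these assumptions, an 8-core builder with no explicit limit advertises 10 slots. A new scheduler uses 10, while an old scheduler inflates again to 12, which is the excess overcommit described above. With an explicit limit of 4 and a new scheduler, the limit stays at exactly 4.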
5e15d3f to 3d47eb4
Some nodes can run out of memory if all of their cores are used. There may be other reasons to limit the number of cores used.