Default interfaces and other things #9
Merged
This PR has a few changes, but the important ones are:
Default worker interface
We now try to automatically pick the fastest interface for the worker to listen on. This is done by using a minimal version of NetworkInterfaceControllers.jl (thanks @JBlaschke ❤️) and a Linux-only check for the negotiated link speed. In the future we could add support for passing an interface name/CIDR blocks/interface type to `addprocs()` for explicit selection.

Unfortunately, testing the functionality is a bit tricky. For now there's just a sanity test in the tests, and I've tested it manually on two clusters with both Ethernet and InfiniBand networks.
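To illustrate the kind of Linux-only link speed check described above, here is a minimal sketch. It is not the code from this PR: the helper names `link_speed` and `fastest_interface` and the `sysfs` keyword are hypothetical. It only assumes that Linux exposes the negotiated speed in Mb/s at `/sys/class/net/<name>/speed`.

```julia
# Hypothetical helper, not the PR's implementation: read the negotiated link
# speed (in Mb/s) that Linux exposes under /sys/class/net/<name>/speed.
function link_speed(name::AbstractString; sysfs::AbstractString="/sys/class/net")
    path = joinpath(sysfs, name, "speed")
    isfile(path) || return nothing
    try
        speed = tryparse(Int, strip(read(path, String)))
        # The kernel reports -1 when the speed is unknown (e.g. the link is down).
        return (speed === nothing || speed <= 0) ? nothing : speed
    catch
        # Reading the file can fail outright for some virtual or wireless interfaces.
        return nothing
    end
end

# Return the name with the highest reported speed, or `nothing` if no
# interface reports a usable speed.
function fastest_interface(names; sysfs::AbstractString="/sys/class/net")
    best, best_speed = nothing, 0
    for name in names
        speed = link_speed(name; sysfs=sysfs)
        if speed !== nothing && speed > best_speed
            best, best_speed = name, speed
        end
    end
    return best
end
```

On a Linux node this could be tried with `fastest_interface(readdir("/sys/class/net"))`. Taking the `sysfs` root as a keyword is a design choice for the sketch: it lets the logic be exercised against a temporary directory, which may help with the testing difficulty mentioned above.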
Passing environment variables
`addprocs(::SSHManager)` now passes all `JULIA_*` environment variables by default. I found out we didn't do this already when I added a worker on a node with a different microarchitecture and saw segfaults on the local node from the pkgimages being compiled without all the archs in `JULIA_CPU_TARGET`... For context, these are the environment variables our Julia module sets by default:

All of them are important for a good out-of-the-box experience. On the other hand, admittedly this change is very much biased towards adding workers on a cluster with a shared filesystem. Personally I think it's a good default, but alternatively we could support a `JULIA_DISTRIBUTEDNEXT_PASS_ALL_VARS` env var for admins to set that will enable/disable passing everything by default.
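For readers who want to picture what "passing all `JULIA_*` variables" amounts to, here is a rough sketch. The helper name `julia_env` is hypothetical and this is not the code from the PR. It only filters an environment by prefix and returns pairs in the shape that the existing `env` keyword of `addprocs()` in the Distributed standard library accepts.

```julia
# Hypothetical helper: collect every JULIA_* variable from an environment
# (defaults to the current process's ENV) as a vector of name => value pairs.
julia_env(env=ENV) = [k => v for (k, v) in env if startswith(k, "JULIA_")]

# Assumed usage, relying on the `env` keyword that addprocs() accepts for
# SSH workers in the Distributed standard library:
#   addprocs(["user@host"]; env=julia_env())
```

Before this change, forwarding the variables by hand would presumably have looked like the commented call above, with `user@host` standing in for a real machine specification.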