Commit bb6a48e

Add SSHManager support for invoking Windows workers via cmd.exe (#38353)
`Distributed.addprocs()` now supports four new keyword arguments: `shell`, `ssh`, `env` and `cmdline_cookie`. Specifying `shell=:wincmd` makes it possible to start workers on a Windows machine whose sshd server invokes `cmd.exe` as the shell (as Microsoft's OpenSSH port does by default). Previously, SSHManager only supported ssh connections to a POSIX shell. Specifying `ssh="/usr/bin/ssh"` selects the ssh client that SSHManager will use (useful for debugging, or where a custom version of ssh is required). The new `env` parameter allows passing arbitrary environment variables to workers. Specifying `cmdline_cookie=true` is a workaround for an ssh problem with Windows workers that run older (pre-ConPTY) versions of Windows, Julia, or OpenSSH.
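As a rough illustration of how the four keyword arguments described above might be combined, here is a minimal sketch. The helper name `add_windows_workers`, the default `exename`, and the timeout value are assumptions made for this example and are not part of the commit; the empty `dir` relies on the `:wincmd` branch in this diff skipping the `pushd` step when `dir` is `""`.

```julia
using Distributed

# Hypothetical helper (not part of the commit): starts workers on a Windows
# machine whose sshd hands the command line to cmd.exe.
# `host` and `exename` are placeholders for a real machine and Julia binary.
function add_windows_workers(host::AbstractString;
                             exename::AbstractString = "julia.exe",
                             dir::AbstractString = "")
    return addprocs([host];
        shell = :wincmd,          # remote shell is cmd.exe rather than a POSIX shell
        ssh = "ssh",              # name or path of the local ssh client
        exename = exename,
        dir = dir,                # "" skips the `pushd` step in the :wincmd branch
        env = ["JULIA_WORKER_TIMEOUT" => "120"],  # example value only
        cmdline_cookie = false)   # set to true only if the stdio cookie handshake hangs
end
```

Calling `add_windows_workers("user@winhost")` would then attempt the ssh connection; the host name here is again only a placeholder.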
1 parent 17ea0f8 commit bb6a48e

4 files changed: 87 additions & 16 deletions

NEWS.md

Lines changed: 3 additions & 0 deletions
@@ -228,6 +228,9 @@ Standard library changes

 #### Distributed

+* Now supports invoking Windows workers via ssh (via new keyword argument `shell=:wincmd` in `addprocs`) ([#30614])
+
+* Other new keyword arguments in `addprocs`: `ssh` to specify the ssh client path, `env` to pass environment variables to workers, and `cmdline_cookie` to work around an ssh problem with Windows workers that run older (pre-ConPTY) versions of Windows, Julia or OpenSSH. ([#30614])

 #### UUIDs

stdlib/Distributed/src/Distributed.jl

Lines changed: 2 additions & 2 deletions
@@ -13,8 +13,8 @@ import Base: getindex, wait, put!, take!, fetch, isready, push!, length,
 using Base: Process, Semaphore, JLOptions, buffer_writes, @sync_add,
             VERSION_STRING, binding_module, atexit, julia_exename,
             julia_cmd, AsyncGenerator, acquire, release, invokelatest,
-            shell_escape_posixly, uv_error, something, notnothing, isbuffered,
-            mapany
+            shell_escape_posixly, shell_escape_wincmd, escape_microsoft_c_args,
+            uv_error, something, notnothing, isbuffered, mapany
 using Base.Threads: Event

 using Serialization, Sockets

stdlib/Distributed/src/cluster.jl

Lines changed: 5 additions & 1 deletion
@@ -15,7 +15,7 @@ abstract type ClusterManager end
 Type used by [`ClusterManager`](@ref)s to control workers added to their clusters. Some fields
 are used by all cluster managers to access a host:
   * `io` -- the connection used to access the worker (a subtype of `IO` or `Nothing`)
-  * `host` -- the host address (either an `AbstractString` or `Nothing`)
+  * `host` -- the host address (either a `String` or `Nothing`)
   * `port` -- the port on the host used to connect to the worker (either an `Int` or `Nothing`)

 Some are used by the cluster manager to add workers to an already-initialized host:
@@ -515,6 +515,10 @@ end

 default_addprocs_params() = Dict{Symbol,Any}(
     :topology => :all_to_all,
+    :ssh => "ssh",
+    :shell => :posix,
+    :cmdline_cookie => false,
+    :env => [],
     :dir => pwd(),
     :exename => joinpath(Sys.BINDIR::String, julia_exename()),
     :exeflags => ``,

stdlib/Distributed/src/managers.jl

Lines changed: 77 additions & 13 deletions
@@ -72,11 +72,20 @@ Keyword arguments:

 * `multiplex`: if `true` then SSH multiplexing is used for SSH tunneling. Default is `false`.

+* `ssh`: the name or path of the SSH client executable used to start the workers.
+  Default is `"ssh"`.
+
 * `sshflags`: specifies additional ssh options, e.g. ```sshflags=\`-i /home/foo/bar.pem\````

 * `max_parallel`: specifies the maximum number of workers connected to in parallel at a
   host. Defaults to 10.

+* `shell`: specifies the type of shell to which ssh connects on the workers.
+
+    + `shell=:posix`: a POSIX-compatible Unix/Linux shell (bash, sh, etc.). The default.
+
+    + `shell=:wincmd`: Microsoft Windows `cmd.exe`.
+
 * `dir`: specifies the working directory on the workers. Defaults to the host's current
   directory (as found by `pwd()`)

@@ -105,8 +114,22 @@ Keyword arguments:
   are setup lazily, i.e. they are setup at the first instance of a remote call between
   workers. Default is true.

+* `env`: provide an array of string pairs such as
+  `env=["JULIA_DEPOT_PATH"=>"/depot"]` to request that environment variables
+  are set on the remote machine. By default only the environment variable
+  `JULIA_WORKER_TIMEOUT` is passed automatically from the local to the remote
+  environment.
+
+* `cmdline_cookie`: pass the authentication cookie via the `--worker` commandline
+  option. The (more secure) default behaviour of passing the cookie via ssh stdio
+  may hang with Windows workers that use older (pre-ConPTY) Julia or Windows versions,
+  in which case `cmdline_cookie=true` offers a work-around.
+
+!!! compat "Julia 1.6"
+    The keyword arguments `ssh`, `shell`, `env` and `cmdline_cookie`
+    were added in Julia 1.6.

-Environment variables :
+Environment variables:

 If the master process fails to establish a connection with a newly launched worker within
 60.0 seconds, the worker treats it as a fatal situation and terminates.
@@ -184,11 +207,15 @@ function parse_machine(machine::AbstractString)
 end

 function launch_on_machine(manager::SSHManager, machine::AbstractString, cnt, params::Dict, launched::Array, launch_ntfy::Condition)
+    shell = params[:shell]
+    ssh = params[:ssh]
     dir = params[:dir]
     exename = params[:exename]
     exeflags = params[:exeflags]
     tunnel = params[:tunnel]
     multiplex = params[:multiplex]
+    cmdline_cookie = params[:cmdline_cookie]
+    env = Dict{String,String}(params[:env])

     # machine could be of the format [user@]host[:port] bind_addr[:bind_port]
     # machine format string is split on whitespace
@@ -199,7 +226,11 @@ function launch_on_machine(manager::SSHManager, machine::AbstractString, cnt, pa
     if length(machine_bind) > 1
         exeflags = `--bind-to $(machine_bind[2]) $exeflags`
     end
-    exeflags = `$exeflags --worker`
+    if cmdline_cookie
+        exeflags = `$exeflags --worker=$(cluster_cookie())`
+    else
+        exeflags = `$exeflags --worker`
+    end

     host, portnum = parse_machine(machine_bind[1])
     portopt = portnum === nothing ? `` : `-p $portnum`
@@ -210,7 +241,7 @@ function launch_on_machine(manager::SSHManager, machine::AbstractString, cnt, pa
         # If it's already running, later ssh sessions also use the same ssh multiplexing session even if
         # `multiplex` is not explicitly specified; otherwise the tunneling session launched later won't
         # go to background and hang. This is because of OpenSSH implementation.
-        if success(`ssh $sshflags -O check $host`)
+        if success(`$ssh $sshflags -O check $host`)
             multiplex = true
         elseif multiplex
             # automatically create an SSH multiplexing session at the next SSH connection
@@ -221,33 +252,66 @@ function launch_on_machine(manager::SSHManager, machine::AbstractString, cnt, pa

     # Build up the ssh command

-    # the default worker timeout
-    tval = get(ENV, "JULIA_WORKER_TIMEOUT", "")
+    # pass on some environment variables by default
+    for var in ["JULIA_WORKER_TIMEOUT"]
+        if !haskey(env, var) && haskey(ENV, var)
+            env[var] = ENV[var]
+        end
+    end
+    for var in keys(env)
+        occursin(r"^[a-zA-Z0-9_]+$", var) || throw(ArgumentError(var))
+    end

     # Julia process with passed in command line flag arguments
-    cmds = """
-        cd -- $(shell_escape_posixly(dir))
-        $(isempty(tval) ? "" : "export JULIA_WORKER_TIMEOUT=$(shell_escape_posixly(tval))")
-        $(shell_escape_posixly(exename)) $(shell_escape_posixly(exeflags))"""
+    if shell == :posix
+        # ssh connects to a POSIX shell

-    # shell login (-l) with string command (-c) to launch julia process
-    cmd = `sh -l -c $cmds`
+        cmds = "$(shell_escape_posixly(exename)) $(shell_escape_posixly(exeflags))"
+        # set environment variables
+        for (var, val) in env
+            cmds = "export $(var)=$(shell_escape_posixly(val))\n$cmds"
+        end
+        # change working directory
+        cmds = "cd -- $(shell_escape_posixly(dir))\n$cmds"
+
+        # shell login (-l) with string command (-c) to launch julia process
+        remotecmd = shell_escape_posixly(`sh -l -c $cmds`)
+
+    elseif shell == :wincmd
+        # ssh connects to Windows cmd.exe
+
+        any(c -> c == '"', exename) && throw(ArgumentError("invalid exename"))
+
+        remotecmd = shell_escape_wincmd(escape_microsoft_c_args(exename, exeflags...))
+        # change working directory
+        if dir !== nothing && dir != ""
+            any(c -> c == '"', dir) && throw(ArgumentError("invalid dir"))
+            remotecmd = "pushd \"$(dir)\" && $remotecmd"
+        end
+        # set environment variables
+        for (var, val) in env
+            remotecmd = "set $(var)=$(shell_escape_wincmd(val))&& $remotecmd"
+        end
+
+    else
+        throw(ArgumentError("invalid shell"))
+    end

     # remote launch with ssh with given ssh flags / host / port information
     # -T → disable pseudo-terminal allocation
     # -a → disable forwarding of auth agent connection
     # -x → disable X11 forwarding
     # -o ClearAllForwardings → option if forwarding connections and
     # forwarded connections are causing collisions
-    cmd = `ssh -T -a -x -o ClearAllForwardings=yes $sshflags $host $(shell_escape_posixly(cmd))`
+    cmd = `$ssh -T -a -x -o ClearAllForwardings=yes $sshflags $host $remotecmd`

     # launch the remote Julia process

     # detach launches the command in a new process group, allowing it to outlive
     # the initial julia process (Ctrl-C and teardown methods are handled through messages)
     # for the launched processes.
     io = open(detach(cmd), "r+")
-    write_cookie(io)
+    cmdline_cookie || write_cookie(io)

     wconfig = WorkerConfig()
     wconfig.io = io.out
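
To make the two quoting strategies in the `managers.jl` change easier to compare, the following standalone sketch reproduces only the string-building steps, without opening any ssh connection. The function name `build_remote_cmd` and the sample paths and values are assumptions for illustration; the escaping helpers are the unexported `Base` functions that the diff itself imports (available in Julia 1.6 and later). The variable-name check here is applied to the `env` dictionary being interpolated.

```julia
using Base: shell_escape_posixly, shell_escape_wincmd, escape_microsoft_c_args

# Hypothetical helper: returns the command string that would be handed to the
# remote shell. It mirrors the branch structure of the diff, nothing more.
function build_remote_cmd(shell::Symbol, exename::String, exeflags::Cmd,
                          dir::String, env::Dict{String,String})
    # Names are interpolated unescaped, so restrict them to a safe alphabet.
    for var in keys(env)
        occursin(r"^[a-zA-Z0-9_]+$", var) || throw(ArgumentError(var))
    end
    if shell == :posix
        # Script lines are prepended, so the final order is: cd, exports, julia.
        cmds = "$(shell_escape_posixly(exename)) $(shell_escape_posixly(exeflags))"
        for (var, val) in env
            cmds = "export $(var)=$(shell_escape_posixly(val))\n$cmds"
        end
        cmds = "cd -- $(shell_escape_posixly(dir))\n$cmds"
        # Quote the whole script once more for the login shell on the remote side.
        return shell_escape_posixly(`sh -l -c $cmds`)
    elseif shell == :wincmd
        any(c -> c == '"', exename) && throw(ArgumentError("invalid exename"))
        # First quote for the Microsoft C runtime argument parser, then for cmd.exe.
        remotecmd = shell_escape_wincmd(escape_microsoft_c_args(exename, exeflags...))
        if dir != ""
            any(c -> c == '"', dir) && throw(ArgumentError("invalid dir"))
            remotecmd = "pushd \"$(dir)\" && $remotecmd"
        end
        # No space before `&&`, so the value does not pick up a trailing blank.
        for (var, val) in env
            remotecmd = "set $(var)=$(shell_escape_wincmd(val))&& $remotecmd"
        end
        return remotecmd
    else
        throw(ArgumentError("invalid shell"))
    end
end

# Example inputs (placeholders, not taken from the commit).
sample_env = Dict("JULIA_WORKER_TIMEOUT" => "120")
println(build_remote_cmd(:posix, "julia", `--worker`, "/tmp", sample_env))
println(build_remote_cmd(:wincmd, "julia.exe", `--worker`, "C:\\work", sample_env))
```

The POSIX branch builds a small multi-line script and passes it to `sh -l -c`, while the `cmd.exe` branch chains everything on one line with `&&`.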
