Commit de27fb0

Merge pull request #449 from jasonnovichRunAI/v2.14-run-10485
[RUN-10485] adjust --service-type
2 parents a39f128 + f6535cc commit de27fb0

File tree

1 file changed

+81
-67
lines changed

docs/Researcher/cli-reference/runai-submit.md

Lines changed: 81 additions & 67 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,11 @@
1-
## Description
1+
# Description
22

33
Submit a Run:ai Job for execution.
44

55
Syntax notes:
66

77
* Flags of type *stringArray* mean that you can add multiple values. You can either separate values with a comma or add the flag twice.
88

9-
109
## Examples
1110

1211
All examples assume a Run:ai Project has been set up using `runai config project <project-name>`.
@@ -40,7 +39,7 @@ Start a Training Job
4039
```console
4140
runai submit --name train1 -i gcr.io/run-ai-demo/quickstart -g 1
4241
```
43-
42+
4443
(see: [training Quickstart](../Walkthroughs/walkthrough-train.md)).
4544

4645
Use GPU Fractions
@@ -84,40 +83,41 @@ Submit a Job without a name with a pre-defined prefix and an incremental index s
8483
runai submit --job-name-prefix -i gcr.io/run-ai-demo/quickstart -g 1
8584
```
8685

87-
8886
## Options
8987

9088
### Job Type
89+
9190
#### --interactive
9291

9392
> Mark this Job as interactive.
9493
95-
#### --jupyter
96-
97-
> Run a Jupyter notebook using a default image and notebook configuration.
98-
9994
### Job Lifecycle
10095

10196
#### --completions < int >
10297

10398
> Number of successful pods required for this job to be completed. Used with HPO.
104-
99+
105100
#### --parallelism < int >
101+
>
106102
> Number of pods to run in parallel at any given time. Used with HPO.
107-
103+
108104
#### --preemptible
105+
>
109106
> Interactive preemptible jobs can be scheduled above guaranteed quota but may be reclaimed at any time.
110107
111108
<!-- Start of common content from snippets/common-submit-cli-commands.md -->
112109
### Naming and Shortcuts
113110

114111
#### --job-name-prefix `<string>`
112+
>
115113
> The prefix to use to automatically generate a Job name with an incremental index. When a Job name is omitted, Run:ai generates a Job name. The optional `--job-name-prefix` flag creates Job names with the provided prefix.
116114
117115
#### --name `<string>`
116+
>
118117
> The name of the Job.
119118
120119
#### --template `<string>`
120+
>
121121
> Load default values from a workload.
122122
123123
### Container Definition
@@ -132,13 +132,13 @@ runai submit --job-name-prefix -i gcr.io/run-ai-demo/quickstart -g 1
132132
133133
#### --attach
134134

135-
> Default is false. If set to true, wait for the Pod to start running. When the pod starts running, attach to the Pod. The flag is equivalent to the command [runai attach](runai-attach.md).
135+
> Default is false. If set to true, wait for the Pod to start running. When the pod starts running, attach to the Pod. The flag is equivalent to the command [runai attach](runai-attach.md).
136136
>
137137
> The --attach flag also sets `--tty` and `--stdin` to true.
138138
139139
#### --command
140140

141-
> Overrides the image's entry point with the command supplied after '--'. When **not** using the `--command` flag, the entry point will **not** be overrided and the string after `--` will be appended as arguments to the entry point command.
141+
> Overrides the image's entry point with the command supplied after `--`. When **not** using the `--command` flag, the entry point is **not** overridden and the string after `--` is appended as arguments to the entry point command.
142142
>
143143
> Example:
144144
>
@@ -150,26 +150,26 @@ runai submit --job-name-prefix -i gcr.io/run-ai-demo/quickstart -g 1
150150

151151
> Create a temporary home directory for the user in the container. Data saved in this directory will not be saved when the container exits. For more information see [non root containers](../../admin/runai-setup/config/non-root-containers.md).
152152
153-
#### -e `<stringArray> | --environment `<stringArray>`
153+
#### -e `<stringArray>` | --environment `<stringArray>`
154154

155-
> Define environment variables to be set in the container. To set multiple values add the flag multiple times (`-e BATCH_SIZE=50 -e LEARNING_RATE=0.2`).
155+
> Define environment variables to be set in the container. To set multiple values add the flag multiple times (`-e BATCH_SIZE=50 -e LEARNING_RATE=0.2`).
156156
<!-- or separate by a comma (`-e BATCH_SIZE:50,LEARNING_RATE:0.2`). -->
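As an illustration of repeating the flag, the sketch below builds the `-e` arguments from a list of `NAME=VALUE` pairs. `build_env_flags` is a hypothetical helper, not part of the Run:ai CLI, and it assumes the values contain no spaces.

```shell
# Hypothetical helper (not part of the Run:ai CLI): turn NAME=VALUE pairs
# into one "-e NAME=VALUE" flag per pair.
# Assumption: no pair contains whitespace.
build_env_flags() {
  flags=""
  for pair in "$@"; do
    # Append "-e <pair>", adding a separating space only after the first flag.
    flags="${flags:+$flags }-e $pair"
  done
  echo "$flags"
}

build_env_flags BATCH_SIZE=50 LEARNING_RATE=0.2
```

The result could then be spliced into a submit command, for example `runai submit --name train1 -i gcr.io/run-ai-demo/quickstart -g 1 $(build_env_flags BATCH_SIZE=50 LEARNING_RATE=0.2)`.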
157157
158158
#### --image `<string>` | -i `<string>`
159159

160-
> Image to use when creating the container for this Job
160+
> Image to use when creating the container for this Job
161161
162162
#### --image-pull-policy `<string>`
163163

164-
> Pulling policy of the image when starting a container. Options are:
164+
> Pulling policy of the image when starting a container. Options are:
165165
>
166-
> - `Always` (default): force image pulling to check whether local image already exists. If the image already exists locally and has the same digest, then the image will not be downloaded.
167-
> - `IfNotPresent`: the image is pulled only if it is not already present locally.
168-
> - `Never`: the image is assumed to exist locally. No attempt is made to pull the image.
166+
> * `Always` (default): force image pulling to check whether local image already exists. If the image already exists locally and has the same digest, then the image will not be downloaded.
167+
> * `IfNotPresent`: the image is pulled only if it is not already present locally.
168+
> * `Never`: the image is assumed to exist locally. No attempt is made to pull the image.
169169
>
170170
> For more information see Kubernetes [documentation](https://kubernetes.io/docs/concepts/configuration/overview/#container-images){target=_blank}.
171171
172-
#### -l | --label `<stringArray>`
172+
#### -l | --label `<stringArray>`
173173

174174
> Set label variables in the container.
175175
@@ -183,15 +183,15 @@ runai submit --job-name-prefix -i gcr.io/run-ai-demo/quickstart -g 1
183183
184184
#### --stdin
185185

186-
> Keep stdin open for the container(s) in the pod, even if nothing is attached.is attached.
187-
186+
> Keep stdin open for the container(s) in the pod, even if nothing is attached.
187+
188188
#### -t | --tty
189189

190-
> Allocate a pseudo-TTY.
190+
> Allocate a pseudo-TTY.
191191
192192
#### --working-dir `<string>`
193193

194-
> Starts the container with the specified directory as the current directory.
194+
> Starts the container with the specified directory as the current directory.
195195
196196
### Resource Allocation
197197

@@ -203,7 +203,7 @@ runai submit --job-name-prefix -i gcr.io/run-ai-demo/quickstart -g 1
203203

204204
> Limitations on the number of CPUs consumed by the Job (for example 0.5, 1). The system guarantees that this Job will not be able to consume more than this amount of CPUs.
205205
206-
#### --extended-resource `<stringArray>
206+
#### --extended-resource `<stringArray>`
207207

208208
> Request access to extended resource, syntax `<resource-name> = < resource_quantity >`
209209
@@ -217,11 +217,11 @@ runai submit --job-name-prefix -i gcr.io/run-ai-demo/quickstart -g 1
217217
218218
#### --memory `<string>`
219219

220-
> CPU memory to allocate for this Job (1G, 20M, .etc). The Job will receive **at least** this amount of memory. Note that the Job will **not** be scheduled unless the system can guarantee this amount of memory to the Job.
220+
> CPU memory to allocate for this Job (1G, 20M, etc.). The Job will receive **at least** this amount of memory. Note that the Job will **not** be scheduled unless the system can guarantee this amount of memory to the Job.
221221
222-
#### --memory-limit `<string>
222+
#### --memory-limit `<string>`
223223

224-
> CPU memory to allocate for this Job (1G, 20M, .etc). The system guarantees that this Job will not be able to consume more than this amount of memory. The Job will receive an error when trying to allocate more memory than this limit.
224+
> Limit on the CPU memory this Job can use (1G, 20M, etc.). The system guarantees that this Job will not be able to consume more than this amount of memory. The Job will receive an error when trying to allocate more memory than this limit.
225225
226226
#### --mig-profile `<string>`
227227

@@ -232,7 +232,7 @@ runai submit --job-name-prefix -i gcr.io/run-ai-demo/quickstart -g 1
232232
#### --backoff-limit `<int>`
233233

234234
> The number of times the Job will be retried before failing. The default is 6. This flag will only work with training workloads (when the `--interactive` flag is not specified).
235-
235+
236236
### Storage
237237

238238
#### --git-sync `<stringArray>`
@@ -264,35 +264,35 @@ runai submit --job-name-prefix -i gcr.io/run-ai-demo/quickstart -g 1
264264
>
265265
> The 2 syntax types of this command are mutually exclusive. You can either use the first or second form, but not a mixture of both.
266266
>
267-
> **Storage_Class_Name** is a storage class name that can be obtained by running `kubectl get storageclasses.storage.k8s.io`. This parameter may be omitted if there is a single storage class in the system, or you are using the default storage class.
267+
> **Storage_Class_Name** is a storage class name that can be obtained by running `kubectl get storageclasses.storage.k8s.io`. This parameter may be omitted if there is a single storage class in the system, or you are using the default storage class.
268268
>
269269
> **Size** is the volume size you want to allocate. See [Kubernetes documentation](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/){target=_blank} for how to specify volume sizes
270270
>
271271
> **Container_Mount_Path**. A path internal to the container where the storage will be mounted
272272
>
273273
> **Pvc_Name**. The name of a pre-existing [Persistent Volume Claim](https://kubernetes.io/docs/concepts/storage/persistent-volumes/#dynamic){target=_blank} to mount into the container
274-
>
274+
>
275275
> Examples:
276276
>
277-
> > `--pvc :3Gi:/tmp/john:ro` - Allocate `3GB` from the default Storage class. Mount it to `/tmp/john` as read-only
277+
> > `--pvc :3Gi:/tmp/john:ro` - Allocate `3GB` from the default Storage class. Mount it to `/tmp/john` as read-only
278278
>
279-
> > `--pvc my-storage:3Gi:/tmp/john:ro` - Allocate `3GB` from the `my-storage` storage class. Mount it to /tmp/john as read-only
279+
> > `--pvc my-storage:3Gi:/tmp/john:ro` - Allocate `3GB` from the `my-storage` storage class. Mount it to /tmp/john as read-only
280280
>
281-
> > `--pvc :3Gi:/tmp/john` - Allocate `3GB` from the default storage class. Mount it to `/tmp/john` as read-write
281+
> > `--pvc :3Gi:/tmp/john` - Allocate `3GB` from the default storage class. Mount it to `/tmp/john` as read-write
282282
>
283-
> > `--pvc my-pvc:/tmp/john` - Use a Persistent Volume Claim named `my-pvc`. Mount it to `/tmp/john` as read-write
283+
> > `--pvc my-pvc:/tmp/john` - Use a Persistent Volume Claim named `my-pvc`. Mount it to `/tmp/john` as read-write
284284
>
285285
> > `--pvc my-pvc-2:/tmp/john:ro` - Use a Persistent Volume Claim named `my-pvc-2`. Mount it to `/tmp/john` as read-only
286286
287287
#### --pvc-exists `<string>`
288288

289289
> Mount a persistent volume. You must include a `claimname` and `path`.
290290
>
291-
> - **claim name**&mdash;The name of the persistent colume claim. Can be obtained by running
291+
> * **claim name**&mdash;The name of the persistent volume claim. Can be obtained by running
292292
>
293293
> `kubectl get storageclasses.storage.k8s.io`
294294
>
295-
> - **path**&mdash;the path internal to the container where the storage will be mounted
295+
> * **path**&mdash;the path internal to the container where the storage will be mounted
296296
>
297297
> Use the format:
298298
>
@@ -302,17 +302,17 @@ runai submit --job-name-prefix -i gcr.io/run-ai-demo/quickstart -g 1
302302

303303
> Mount a persistent volume claim (PVC). If the PVC does not exist, it will be created based on the parameters entered. If a PVC exists, it will be used with its defined attributes and the parameters in the command will be ignored.
304304
>
305-
> - **claim name**&mdash;The name of the persistent colume claim.
306-
> - **storage class**&mdash;A storage class name that can be obtained by running
305+
> * **claim name**&mdash;The name of the persistent volume claim.
306+
> * **storage class**&mdash;A storage class name that can be obtained by running
307307
>
308308
> > `kubectl get storageclasses.storage.k8s.io`
309309
>
310310
> > `storageclass` may be omitted if there is a single storage class in the system, or you are using the default storage class.
311311
>
312-
> - **size**&mdash;The volume size you want to allocate for the PVC when creating it. See [Kubernetes documentation](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/){target=_blank} to specify volume sizes.
313-
> - **accessmode**&mdash;The description of thr desired volume capabilities for the PVC.
314-
> - **ro**&mdash;Mount the PVC with read-only access.
315-
> - **ephemeral**&mdash;The PVC will be created as volatile temporary storage which is only present during the running lifetime of the job.
312+
> * **size**&mdash;The volume size you want to allocate for the PVC when creating it. See [Kubernetes documentation](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/){target=_blank} to specify volume sizes.
313+
> * **accessmode**&mdash;The description of the desired volume capabilities for the PVC.
314+
> * **ro**&mdash;Mount the PVC with read-only access.
315+
> * **ephemeral**&mdash;The PVC will be created as volatile temporary storage which is only present during the running lifetime of the job.
316316
>
317317
> Use the format:
318318
>
@@ -321,11 +321,11 @@ runai submit --job-name-prefix -i gcr.io/run-ai-demo/quickstart -g 1
321321
#### --s3 `<string>`
322322

323323
> Mount an S3 compatible storage into the container running the job. The parameter should follow the syntax:
324-
>
324+
>
325325
> `bucket=BUCKET,key=KEY,secret=SECRET,url=URL,target=TARGET_PATH`
326326
>
327327
> All the fields, except url=URL, are mandatory. Default for url is
328-
>
328+
>
329329
> `url=https://s3.amazon.com`
330330
331331
#### -v | --volume 'Source:Container_Mount_Path:[ro]:[nfs-host]'
@@ -334,48 +334,63 @@ runai submit --job-name-prefix -i gcr.io/run-ai-demo/quickstart -g 1
334334
>
335335
> Examples:
336336
>
337-
> `-v /raid/public/john/data:/root/data:ro`
338-
>
337+
> `-v /raid/public/john/data:/root/data:ro`
338+
>
339339
> Mount /root/data to local path /raid/public/john/data for read-only access.
340340
>
341-
> `-v /public/data:/root/data::nfs.example.com`
342-
>
341+
> `-v /public/data:/root/data::nfs.example.com`
342+
>
343343
> Mount /root/data to NFS path /public/data on NFS server nfs.example.com for read-write access.
344344
345345
### Network
346346

347+
<!--
347348
#### --address `<string>`
348349
349350
> Comma separated list of IP addresses to listen to when running with --service-type portforward (default: localhost)
351+
-->
350352

351353
#### --host-ipc
352354

353-
> Use the host's _ipc_ namespace. Controls whether the pod containers can share the host IPC namespace. IPC (POSIX/SysV IPC) namespace provides separation of named shared memory segments, semaphores, and message queues.
355+
> Use the host's *ipc* namespace. Controls whether the pod containers can share the host IPC namespace. IPC (POSIX/SysV IPC) namespace provides separation of named shared memory segments, semaphores, and message queues.
354356
> Shared memory segments are used to accelerate inter-process communication at memory speed, rather than through pipes or the network stack.
355-
>
357+
>
356358
> For further information see [docker run reference](https://docs.docker.com/engine/reference/run/) documentation.
357359
358360
#### --host-network
359361

360362
> Use the host's network stack inside the container.
361363
> For further information see [docker run reference](https://docs.docker.com/engine/reference/run/) documentation.
362364
363-
#### --port `<stringArray>`
364-
365-
> Expose ports from the Job container.
366-
367365
#### -s | --service-type `<string>`
368366

369-
> External access type to interactive jobs. Options are:
367+
> External access type to jobs. Options are:
368+
>
369+
> * `nodeport` - add one or more ports using `--port`.
370+
> * `external-url` - add one port and an optional custom URL using `--custom-url`.
371+
>
372+
> For example:
373+
>
374+
> `runai submit test-jup -p team-a -i gcr.io/run-ai-demo/jupyter-tensorboard --service-type external-url --port 8888`
375+
>
376+
> `runai submit test-np -p team-a -i ubuntu --service-type nodeport --port 30000:7070`
377+
>
378+
> This flag supports more than one service type. To set several, repeat `--service-type` once per service type and, within each instance, separate the type and its `key=value` settings with commas.
379+
>
380+
> For example:
381+
>
382+
> `runai submit test-np -p team-a -i ubuntu --service-type nodeport,port=30000:7070 --service-type external-url,port=30001`
383+
>
384+
> `runai submit test-np -p team-a -i ubuntu --service-type nodeport,port=30000:7070,port=9090 --service-type external-url,port=8080,custom-url=https://my.domain.com/url`
370385
>
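To make the comma-separated form easier to read, the sketch below splits one `--service-type` value into its type and its `key=value` settings. `split_service_type` is a hypothetical helper written only to illustrate the syntax shown in the examples above. It is not how the Run:ai CLI parses the flag.

```shell
# Hypothetical helper (not part of the Run:ai CLI): split a value such as
# "nodeport,port=30000:7070,port=9090" into its service type and settings.
split_service_type() {
  value="$1"
  # The service type is everything before the first comma.
  echo "type=${value%%,*}"
  # Everything after the first comma is a list of key=value pairs (may be empty).
  case "$value" in
    *,*) rest="${value#*,}" ;;
    *)   rest="" ;;
  esac
  old_ifs="$IFS"
  IFS=','
  for pair in $rest; do
    # Key is the text before the first "=", value is the text after it.
    echo "key=${pair%%=*} value=${pair#*=}"
  done
  IFS="$old_ifs"
}

split_service_type "nodeport,port=30000:7070,port=9090"
split_service_type "external-url,port=8080,custom-url=https://my.domain.com/url"
```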
371-
> * `portforward` (deprecated)
372-
> * `loadbalancer`
373-
> * `nodeport`
374-
> * `external-url`
386+
387+
#### --port `<stringArray>`
388+
389+
> Expose ports from the Job container. Specify either a single port number (for example, 9090) or a pair in the form `hostport:containerport` (for example, 30000:7070).
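The two forms of a `--port` value differ only in whether a colon is present. A minimal sketch, using a hypothetical helper `describe_port` that is not part of the Run:ai CLI:

```shell
# Hypothetical helper (not part of the Run:ai CLI): report how a --port value reads.
describe_port() {
  case "$1" in
    *:*)
      # hostport:containerport form, for example 30000:7070
      echo "host=${1%%:*} container=${1#*:}"
      ;;
    *)
      # Single port number, for example 9090
      echo "container=$1"
      ;;
  esac
}

describe_port 30000:7070
describe_port 9090
```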
375390
376391
#### --custom-url `<string>`
377392

378-
> An optional argument that specifies a custom URL when using the `external URL` service type. If not provided, the system will generate a URL automatically.
393+
> An optional argument that specifies a custom URL when using the `external-url` service type. If not provided, the system will generate a URL automatically.
379394
380395
### Access Control
381396

@@ -385,7 +400,7 @@ runai submit --job-name-prefix -i gcr.io/run-ai-demo/quickstart -g 1
385400
386401
#### --run-as-user
387402

388-
> Run in the context of the current user running the Run:ai command rather than the root user. While the default container user is *root* (same as in Docker), this command allows you to submit a Job running under your Linux user. This would manifest itself in access to operating system resources, in the owner of new folders created under shared directories, etc. Alternatively, if your cluster is connected to Run:ai via SAML, you can map the container to use the Linux UID/GID which is stored in the organization's directory. For more information see [non root containers](../../admin/runai-setup/config/non-root-containers.md).
403+
> Run in the context of the current user running the Run:ai command rather than the root user. While the default container user is *root* (same as in Docker), this command allows you to submit a Job running under your Linux user. This would manifest itself in access to operating system resources, in the owner of new folders created under shared directories, etc. Alternatively, if your cluster is connected to Run:ai via SAML, you can map the container to use the Linux UID/GID which is stored in the organization's directory. For more information see [non root containers](../../admin/runai-setup/config/non-root-containers.md).
389404
390405
### Scheduling
391406

@@ -396,11 +411,11 @@ runai submit --job-name-prefix -i gcr.io/run-ai-demo/quickstart -g 1
396411
397412
#### --node-type `<string>`
398413

399-
> Allows defining specific Nodes (machines) or a group of Nodes on which the workload will run. To use this feature your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/researcher-setup/limit-to-node-group.md).
414+
> Allows defining specific Nodes (machines) or a group of Nodes on which the workload will run. To use this feature your Administrator will need to label nodes as explained here: [Limit a Workload to a Specific Node Group](../../admin/researcher-setup/limit-to-node-group.md).
400415
401416
#### --toleration `<string>`
402417

403-
> Specify one or more toleration criteria, to ensure that the workload is not scheduled onto an inappropriate node.
418+
> Specify one or more toleration criteria, to ensure that the workload is not scheduled onto an inappropriate node.
404419
> This is done by matching the workload tolerations to the taints defined for each node. For further details see Kubernetes
405420
> [Taints and Tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/){target=_blank} Guide.
406421
>
@@ -433,6 +448,5 @@ Note that the submit call may use a *policy* to provide defaults to any of the a
433448
434449
## See Also
435450
436-
* See any of the Quickstart documents [here:](../Walkthroughs/quickstart-overview.md).
437-
* See [policy configuration](../../admin/workloads/policies.md) for a description on how policies work.
438-
451+
* See any of the Quickstart documents [here:](../Walkthroughs/quickstart-overview.md).
452+
* See [policy configuration](../../admin/workloads/policies.md) for a description on how policies work.
