@@ -361,6 +361,39 @@ for obtaining these keys.
☝️ **Note** The same credentials can also be used for
[configuring cloud storage](/doc/cml-with-dvc#cloud-storage-provider-credentials).
+ The following are the minimum IAM permissions that the CML runner needs to
+ deploy on EC2:
+
+ - `ec2:CreateSecurityGroup` -- _(Firewall and SSH Access Management)_
+ - `ec2:AuthorizeSecurityGroupEgress`
+ - `ec2:AuthorizeSecurityGroupIngress`
+ - `ec2:DescribeSecurityGroups`
+ - `ec2:DescribeSubnets`
+ - `ec2:DescribeVpcs`
+ - `ec2:ImportKeyPair`
+ - `ec2:DeleteKeyPair`
+ - `ec2:CreateTags` -- _(General Resource Management)_
+ - `ec2:RunInstances` -- _(EC2 Instance Management)_
+ - `ec2:DescribeImages`
+ - `ec2:DescribeInstances`
+ - `ec2:TerminateInstances`
+ - `ec2:DescribeSpotInstanceRequests` -- _(Optionally needed for Spot Access)_
+ - `ec2:RequestSpotInstances`
+ - `ec2:CancelSpotInstanceRequests`
+
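The minimum EC2 permissions listed above could be collected into a single IAM policy document. The following is a minimal sketch, not an official CML policy: the file name `cml-runner-policy.json`, the `Sid` value, and the wildcard `Resource` are assumptions made for illustration, and you may prefer to scope `Resource` more tightly. Drop the three Spot actions if you do not use Spot instances.

```shell
#!/bin/sh
# Hypothetical output file name (an assumption, not something CML requires).
POLICY_FILE="cml-runner-policy.json"

# Write an IAM policy containing the minimum EC2 actions from the list above.
# "Resource": "*" and the Sid are illustrative assumptions.
cat > "$POLICY_FILE" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CmlRunnerMinimumEc2",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateSecurityGroup",
        "ec2:AuthorizeSecurityGroupEgress",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcs",
        "ec2:ImportKeyPair",
        "ec2:DeleteKeyPair",
        "ec2:CreateTags",
        "ec2:RunInstances",
        "ec2:DescribeImages",
        "ec2:DescribeInstances",
        "ec2:TerminateInstances",
        "ec2:DescribeSpotInstanceRequests",
        "ec2:RequestSpotInstances",
        "ec2:CancelSpotInstanceRequests"
      ],
      "Resource": "*"
    }
  ]
}
EOF

echo "Wrote $POLICY_FILE"

# The file could then be registered with the AWS CLI, for example:
#   aws iam create-policy --policy-name cml-runner-minimum \
#     --policy-document "file://$POLICY_FILE"
# (the policy name "cml-runner-minimum" is hypothetical)
```

The script only writes a local file; nothing is sent to AWS until you register the policy yourself.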
+ Beyond this list, you will need to add any extra permissions that your
+ workflow requires. These extra permissions can either be added directly to the
+ account used by `cml runner` or be specified in the `cml runner` command with
+ [`--cloud-permission-set`](https://cml.dev/doc/ref/runner#--cloud-permission-set).
+
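As a sketch of the second option, the snippet below assembles a `cml runner` invocation that passes `--cloud-permission-set`. It assumes the flag takes an AWS instance profile ARN (check the linked reference for the exact format); the account ID, profile name, region, instance type, and label are placeholder values, not recommendations.

```shell
#!/bin/sh
# Placeholder ARN: the account ID and profile name are hypothetical.
PERMISSION_SET="arn:aws:iam::123456789012:instance-profile/cml-s3-access"

# Assemble the command as a string so it can be reviewed before it is run.
# Region, instance type, and label are illustrative values only.
RUNNER_CMD="cml runner --cloud=aws --cloud-region=us-west --cloud-type=t2.micro --labels=cml-runner --cloud-permission-set=$PERMISSION_SET"

# Print the command instead of executing it.
echo "$RUNNER_CMD"
```

Printing the command rather than running it keeps the sketch safe to try locally; in a real workflow the same flags would go directly into the CI step that launches the runner.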
+ For example, if you need to read and write data on S3, you may want to add:
+
+ - `s3:ListBucket`
+ - `s3:PutObject`
+ - `s3:GetObject`
+ - `s3:DeleteObject`
+
</tab>
<tab title="Azure">
@@ -391,6 +424,50 @@ provisioned through environment variables instead of files.
</tab>
</toggle>
+ ### Cloud Compute Resource Manual Cleanup
+
+ In very rare cases, you may need to clean up CML cloud resources manually.
+ An example of such a problem can be seen
+ [when an EC2 instance ran out of storage space](https://github.com/iterative/cml/issues/1006).
+
+ The following is a list of all the resources you may need to
+ clean up manually in the case of a failure:
+
+ - The running instance (named with pattern `cml-{random-id}`)
+ - The volume attached to the running instance
+   (this should be deleted automatically once the instance is terminated)
+ - The generated key-pair (named with pattern `cml-{random-id}`)
+
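On AWS, the resources above could be located and removed with the AWS CLI. The following is a rough sketch under stated assumptions: it assumes the `cml-{random-id}` name is stored in the instance's `Name` tag and in the key pair name, and the helper functions (`run`, `list_cml_instances`, `list_cml_key_pairs`, `terminate_instance`, `delete_key_pair`) are hypothetical names introduced here. By default it only prints the commands it would run.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) prints each AWS CLI command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Either echo the command (dry run) or execute it as given.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# List instances whose Name tag matches cml-* (assumed naming convention).
list_cml_instances() {
  run aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=cml-*" \
    --query "Reservations[].Instances[].[InstanceId,State.Name]" \
    --output text
}

# List key pairs whose name matches cml-*.
list_cml_key_pairs() {
  run aws ec2 describe-key-pairs \
    --filters "Name=key-name,Values=cml-*" \
    --query "KeyPairs[].KeyName" \
    --output text
}

# Terminate one instance by ID; the attached volume should follow.
terminate_instance() {
  run aws ec2 terminate-instances --instance-ids "$1"
}

# Delete one key pair by name.
delete_key_pair() {
  run aws ec2 delete-key-pair --key-name "$1"
}

list_cml_instances
list_cml_key_pairs

# After reviewing the listings, remove specific leftovers, for example:
#   DRY_RUN=0 terminate_instance <instance-id>
#   DRY_RUN=0 delete_key_pair <key-pair-name>
```

Keeping the dry run as the default makes it harder to terminate an instance that is still doing useful work; set `DRY_RUN=0` only after checking the listed names and IDs.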
+ If you keep encountering issues, please try to pull the logs from the running
+ instance before terminating it and opening a GitHub issue.
+
+ For easy access and debugging on the `cml runner` instance, add:
+
+ > `--cloud-startup-script=$(echo 'echo "$(curl https://github.com/'"$GITHUB_ACTOR"'.keys)" >> /home/ubuntu/.ssh/authorized_keys' | base64 -w 0)`
+
+ If you encounter an error with the `cml runner` instance, retrieving its logs
+ with the following commands helps diagnose the issue:
+
+ ☝️ **Note** Please scan your `cml.log` visually before sharing it: entries such
+ as IP addresses and Git repository names may be present and can be sensitive.
+
+ ```bash
+ ssh ubuntu@instance_public_ip
+ sudo journalctl -n all -u cml.service --no-pager > cml.log
+ sudo dmesg --ctime > system.log
+ ```
+
+ You can then copy those logs to your local machine with:
+
+ ```bash
+ scp ubuntu@instance_public_ip:~/cml.log .
+ scp ubuntu@instance_public_ip:~/system.log .
+ ```
+
+ If the SSH command hangs, the instance may be severely broken. In that case,
+ reboot it from the web console and try the commands again.
+
### On-premise (Local) Runners
The `cml runner` command can also be used to manually set up a local machine,