The Terraform Iterative provider is a Terraform plugin that manages the full lifecycle of GPU and non-GPU cloud resources with your favourite [vendor](#supported-vendors). It offers a simple, uniform way to deploy a single GPU machine or a cluster of them. Two types of resources are available:

- iterative_machine
- iterative_cml_runner

# Usage

### CML runner

A self-hosted CI runner built as a thin wrapper over the GitLab and GitHub runners:

- Same spec:
  - name
  - labels
  - idle-timeout
  - repo
  - token
  - driver
- Unified logging
- Easy to launch
- Automatic provisioning of cloud resources
- Automatic unregistration and removal of cloud resources

#### 1- Set up your provider credentials as ENV variables

<details>
<summary>AWS</summary>
<p>

```sh
export AWS_SECRET_ACCESS_KEY=YOUR_KEY
export AWS_ACCESS_KEY_ID=YOUR_ID
export CML_TOKEN=YOUR_REPO_TOKEN
```
</p>
</details>

<details>
<summary>Azure</summary>
<p>

```sh
export AZURE_CLIENT_ID=YOUR_ID
export AZURE_CLIENT_SECRET=YOUR_SECRET
export AZURE_SUBSCRIPTION_ID=YOUR_SUBSCRIPTION_ID
export AZURE_TENANT_ID=YOUR_TENANT_ID
export CML_TOKEN=YOUR_REPO_TOKEN
```
</p>
</details>

#### 2- Save your terraform file main.tf

<details>
<summary>AWS</summary>
<p>

```tf
terraform {
  required_providers {
    iterative = {
      source = "iterative/iterative"
    }
  }
}

provider "iterative" {}

# repo, driver and labels are runner arguments, so the runner resource is used here
resource "iterative_cml_runner" "runner" {
  repo   = "https://github.com/iterative/cml"
  driver = "github"
  labels = "tf"

  cloud         = "aws"
  region        = "us-west"
  instance_type = "m"
}
```
</p>
</details>

<details>
<summary>Azure</summary>
<p>

```tf
terraform {
  required_providers {
    iterative = {
      source = "iterative/iterative"
    }
  }
}

provider "iterative" {}

# repo, driver and labels are runner arguments, so the runner resource is used here
resource "iterative_cml_runner" "runner" {
  repo   = "https://github.com/iterative/cml"
  driver = "github"
  labels = "tf"

  cloud         = "azure"
  region        = "us-west"
  instance_type = "m"
}
```
</p>
</details>

#### 3- Launch it!

```sh
terraform init
terraform apply --auto-approve

# run this to destroy your instance
# terraform destroy --auto-approve
```

#### Argument reference

| Variable | Values | Default | Description |
| ------- | ------ | -------- | ------------- |
| ```driver``` | ```gitlab```, ```github``` | | The kind of runner to set up. |
| ```repo``` | | | The repository to subscribe to. |
| ```token``` | | | The repository token. It must have Workflow permissions in GitHub. If not specified, the provider tries to read it from the env variable CML_REPO. |
| ```labels``` | | ```cml``` | The runner labels that your CI workflow waits for. |
| ```idle-timeout``` | | 5min | The maximum time the runner waits for jobs. When the timeout expires, the runner automatically unregisters from the repo and cleans up all the cloud resources. If set to ```0```, it waits forever. |
| ```region``` | ```us-west```, ```us-east```, ```eu-west```, ```eu-north``` | ```us-west``` | Sets the collocation region. AWS or Azure regions are also accepted. |
| ```image``` | | ```iterative-cml``` in AWS, ```Canonical:UbuntuServer:18.04-LTS:latest``` in Azure | Sets the image to be used. On AWS, the provider searches by image name, not by id, and takes the latest version when several images share the same name. Defaults to the [iterative-cml image](#iterative-cml-image). On Azure, it uses the form Publisher:Offer:SKU:Version. |
| ```name``` | | iterative_{UID} | Sets the instance name and names the related resources after it. In Azure, everything is grouped under a resource group with that name. |
| ```instance_hdd_size``` | | 10 | Sets the instance hard disk size in GB. |
| ```instance_type``` | ```m```, ```l```, ```xl``` | ```m``` | Sets the instance computing size. You can also specify vendor-specific machines, e.g. ```t2.micro``` in AWS. See the [equivalences](#AWS-instance-equivalences) table below. |
| ```instance_gpu``` | (empty), ```tesla```, ```k80``` | (empty) | Sets the desired GPU if the ```instance_type``` is one of our types. |
| ```ssh_private``` | | | SSH private key in PEM format. If not provided, a private and public key pair will be generated automatically and returned in terraform.tfstate. |

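As a rough illustration of how the arguments above combine, the sketch below requests a larger GPU runner. It assumes the runner arguments belong to the `iterative_cml_runner` resource listed at the top of this README and that the `terraform` and `provider "iterative" {}` blocks from step 2 are already in `main.tf`. The resource label `gpu_runner` and the chosen values are only examples.

```tf
# Assumes the terraform/required_providers and provider "iterative" {} blocks from step 2.
# The resource type and the label "gpu_runner" are assumptions made for illustration.
resource "iterative_cml_runner" "gpu_runner" {
  repo   = "https://github.com/iterative/cml"
  driver = "github"
  labels = "tf"

  cloud             = "aws"
  region            = "us-east"
  instance_type     = "l"
  instance_gpu      = "tesla"
  instance_hdd_size = 50 # disk size in GB, overriding the default of 10
}
```

Here `token` is left out, so according to the table the provider would try to read it from the environment.
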
### Machine

#### 1- Set up your provider credentials as ENV variables

| Variable | Values | Default | Description |
| ------- | ------ | -------- | ------------- |
| ```region``` | ```us-west```, ```us-east```, ```eu-west```, ```eu-north``` | ```us-west``` | Sets the collocation region. AWS or Azure regions are also accepted. |
| ```image``` | | ```iterative-cml``` in AWS, ```Canonical:UbuntuServer:18.04-LTS:latest``` in Azure | Sets the image to be used. On AWS, the provider searches by image name, not by id, and takes the latest version when several images share the same name. Defaults to the [iterative-cml image](#iterative-cml-image). On Azure, it uses the form Publisher:Offer:SKU:Version. |
| ```name``` | | iterative_{UID} | Sets the instance name and names the related resources after it. In Azure, everything is grouped under a resource group with that name. |
| ```instance_hdd_size``` | | 10 | Sets the instance hard disk size in GB. |
| ```instance_type``` | ```m```, ```l```, ```xl``` | ```m``` | Sets the instance computing size. You can also specify vendor-specific machines, e.g. ```t2.micro``` in AWS. See the [equivalences](#AWS-instance-equivalences) table below. |
| ```instance_gpu``` | (empty), ```tesla```, ```k80``` | (empty) | Sets the desired GPU if the ```instance_type``` is one of our types. |
| ```ssh_private``` | | | SSH private key in PEM format. If not provided, a private and public key pair will be generated automatically and returned in terraform.tfstate. |
| ```startup_script``` | | | Startup script, also known as userData on AWS and customData on Azure. It can be written as multiline text using [Terraform heredoc syntax](https://www.terraform.io/docs/configuration-0-11/variables.html). |

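A minimal sketch of a multiline `startup_script` written as a heredoc is shown below. It assumes the `terraform` and `provider "iterative" {}` blocks are already defined in `main.tf`, and the shell commands inside the script are placeholders rather than anything the provider requires.

```tf
# Assumes the terraform/required_providers and provider "iterative" {} blocks are already defined.
resource "iterative_machine" "machine" {
  cloud         = "aws"
  region        = "us-west"
  instance_type = "m"

  # Multiline script passed through a Terraform heredoc; the commands are placeholders.
  startup_script = <<-EOF
    #!/bin/bash
    echo "machine is up" > /tmp/startup.log
  EOF
}
```

The `<<-` form strips the leading indentation, so the script can be indented along with the rest of the resource block.
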
# Pitfalls

To use ```instance_type``` and ```instance_gpu```, your account must also be allowed to launch [such instances](#AWS-instance-equivalences) with your cloud provider. Vendors normally require GPU instances to be approved before they can be used.
You can always fall back to an instance type that your vendor has already approved by setting it explicitly, e.g. ```t2.micro```.
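
The fallback described above might look like the following sketch. It assumes the `terraform` and `provider "iterative" {}` blocks are already defined in `main.tf`, and it leaves `instance_gpu` unset because that argument only applies to the generic ```m```, ```l``` and ```xl``` sizes.

```tf
# Assumes the terraform/required_providers and provider "iterative" {} blocks are already defined.
resource "iterative_machine" "machine" {
  cloud         = "aws"
  region        = "us-west"
  instance_type = "t2.micro" # vendor-specific type instead of m, l or xl
}
```
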
# Supported vendors

- AWS
- Azure

<details>
<summary>AWS instance equivalences</summary>
<p>

The instance type in AWS is determined by combining the ```instance_type``` and ```instance_gpu``` values:

| type | gpu | aws |
@@ -120,10 +314,41 @@ The instance type in AWS is calculated joining the ```instance_type``` and ```in
| eu-north | eu-north-1 |
| eu-west | eu-west-1 |

</p>
</details>

<details>
<summary>Azure instance equivalences</summary>
<p>

The instance type in Azure is determined by combining the ```instance_type``` and ```instance_gpu``` values:

| type | gpu | azure |
| ------- | ------ | -------- |
| m | | Standard_F8s_v2 |
| l | | Standard_F32s_v2 |
| xl | | Standard_F64s_v2 |
| m | k80 | Standard_NC6 |
| l | k80 | Standard_NC12 |
| xl | k80 | Standard_NC24 |
| m | tesla | Standard_NC6s_v3 |
| l | tesla | Standard_NC12s_v3 |
| xl | tesla | Standard_NC24s_v3 |

| region | azure |
| ------- | ------ |
| us-west | westus2 |
| us-east | eastus |
| eu-north | northeurope |
| eu-west | westeurope |
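
To make the lookup concrete, here is a small sketch of a machine definition and what the tables above suggest it resolves to. It assumes the `terraform` and `provider "iterative" {}` blocks are already defined in `main.tf`. The comments restate the table entries and are not verified provider output.

```tf
# Assumes the terraform/required_providers and provider "iterative" {} blocks are already defined.
resource "iterative_machine" "machine" {
  cloud         = "azure"
  region        = "us-west" # maps to westus2 in the region table
  instance_type = "m"
  instance_gpu  = "k80" # m + k80 maps to Standard_NC6 in the type table
}
```
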

</p>
</details>

# iterative-cml image
It's a GPU ready image based on Ubuntu 18.04. It has the following stack already installed: