README.md: 17 additions & 15 deletions
@@ -8,7 +8,7 @@

 TPI is a [Terraform](https://terraform.io) plugin built with machine learning in mind. This CLI tool offers full lifecycle management of computing resources (including GPUs and respawning spot instances) from several cloud vendors (AWS, Azure, GCP, K8s)... without needing to be a cloud expert.

-- **Lower cost with spot recovery**: transparent auto-recovery from interrupted low-cost spot/preemptible instances
+- **Lower cost with spot recovery**: transparent data checkpoint/restore & auto-respawning of low-cost spot/preemptible instances
 - **No cloud vendor lock-in**: switch between clouds with just one line thanks to unified abstraction
 - **No waste**: auto-cleanup unused resources (terminate compute instances upon task completion/failure & remove storage upon download of results), pay only for what you use
 - **Developer-first experience**: one-command data sync & code execution with no external server, making the cloud feel like a laptop
@@ -39,10 +39,12 @@ There are a several reasons to use TPI instead of other related solutions (custo
    TPI is a CLI tool, not a running service. It requires no additional orchestrating machine (control plane/head nodes) to schedule/recover/terminate instances. Instead, TPI runs (spot) instances via cloud-native scaling groups[^scalers], taking care of recovery and termination automatically on the cloud provider's side. This design reduces management overhead & infrastructure costs. You can close your laptop while cloud tasks are running -- auto-recovery happens even if you are offline.
 2. **Unified tool for data science and software development teams**:
    TPI provides consistent tooling for both data scientists and DevOps engineers, improving cross-team collaboration. This simplifies compute management to a single config file, and reduces time to deliver ML models into production.
+3. **Reproducible, codified environments**:
+   Store hardware requirements in a single configuration file alongside the rest of your ML pipeline code.

 [^scalers]: [AWS Auto Scaling Groups](https://docs.aws.amazon.com/autoscaling/ec2/userguide/what-is-amazon-ec2-auto-scaling.html), [Azure VM Scale Sets](https://azure.microsoft.com/en-us/services/virtual-machine-scale-sets), [GCP managed instance groups](https://cloud.google.com/compute/docs/instance-groups#managed_instance_groups), and [Kubernetes Jobs](https://kubernetes.io/docs/concepts/workloads/controllers/job).

-<img width=24px src="https://static.iterative.ai/logo/cml.svg"/> TPI is used to power [CML runners](https://cml.dev/doc/self-hosted-runners), bringing cloud providers to existing CI/CD workflows.
+<img width=24px src="https://static.iterative.ai/logo/cml.svg"/> TPI is used to power [CML](https://cml.dev), bringing cloud providers to existing GitHub, GitLab & Bitbucket CI/CD workflows ([repository](https://github.com/iterative/cml)).

 ## Usage

@@ -74,12 +76,12 @@ provider "iterative" {}
 resource "iterative_task" "example" {
   cloud   = "aws" # or any of: gcp, az, k8s
   machine = "m"   # medium. Or any of: l, xl, m+k80, xl+v100, ...
-  spot    = 0     # auto-price. Or -1 to disable, or >0 to set a hourly USD limit
+  spot    = 0     # auto-price. Default -1 to disable, or >0 for hourly USD limit
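
The hunk above captures only the first few arguments of the example task. For orientation, a minimal complete configuration might look like the sketch below; the `storage` and `script` blocks are assumptions based on the provider's documented task schema, so check the Terraform Registry docs for the exact syntax of your version.

```hcl
terraform {
  required_providers {
    iterative = { source = "iterative/iterative" }
  }
}

provider "iterative" {}

resource "iterative_task" "example" {
  cloud   = "aws" # or any of: gcp, az, k8s
  machine = "m"   # medium. Or any of: l, xl, m+k80, xl+v100, ...
  spot    = 0     # auto-price. Default -1 to disable, or >0 for hourly USD limit

  # Assumed schema: upload this local directory to the instance, and download
  # the `results` subdirectory when the task finishes.
  storage {
    workdir = "."
    output  = "results"
  }

  # Assumed schema: the script the instance runs; when it exits, the task is
  # considered finished and the (spot) instance is terminated.
  script = <<-END
    #!/bin/bash
    mkdir -p results
    echo "hello from the cloud" > results/greeting.txt
  END
}
```

As with any Terraform configuration, `terraform init` followed by `terraform apply` launches the task, and `terraform destroy` cleans up the remaining cloud resources.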
docs/guides/generic-machine-types.md: 11 additions & 5 deletions
@@ -7,7 +7,7 @@ subcategory: Development

 The table below is a more detailed version of the common choices summarised in [Task Machine Types](https://registry.terraform.io/providers/iterative/iterative/latest/docs/resources/task#machine-type).
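
Only the guide's introduction is visible in this hunk. As a usage hint (not part of the change itself), picking one of these generic machine types is just a matter of setting the task's `machine` argument; the same name is then translated into an equivalent instance type for whichever `cloud` is selected. A hypothetical sketch:

```hcl
resource "iterative_task" "gpu_example" {
  cloud   = "gcp"       # the same generic name also works on aws, az & k8s
  machine = "xl+v100"   # one of the generic types, e.g. l, xl, m+k80, xl+v100, ...
  spot    = 0           # auto-price
  script  = file("train.sh") # hypothetical training script, for illustration only
}
```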
docs/index.md: 4 additions & 2 deletions
@@ -7,7 +7,7 @@

 TPI is a [Terraform](https://terraform.io) plugin built with machine learning in mind. This CLI tool offers full lifecycle management of computing resources (including GPUs and respawning spot instances) from several cloud vendors (AWS, Azure, GCP, K8s)... without needing to be a cloud expert.

-- **Lower cost with spot recovery**: transparent auto-recovery from interrupted low-cost spot/preemptible instances
+- **Lower cost with spot recovery**: transparent data checkpoint/restore & auto-respawning of low-cost spot/preemptible instances
 - **No cloud vendor lock-in**: switch between clouds with just one line thanks to unified abstraction
 - **No waste**: auto-cleanup unused resources (terminate compute instances upon task completion/failure & remove storage upon download of results), pay only for what you use
 - **Developer-first experience**: one-command data sync & code execution with no external server, making the cloud feel like a laptop
@@ -37,8 +37,10 @@ There are a several reasons to use TPI instead of other related solutions (custo
    TPI is a CLI tool, not a running service. It requires no additional orchestrating machine (control plane/head nodes) to schedule/recover/terminate instances. Instead, TPI runs (spot) instances via cloud-native scaling groups ([AWS Auto Scaling Groups](https://docs.aws.amazon.com/autoscaling/ec2/userguide/what-is-amazon-ec2-auto-scaling.html), [Azure VM Scale Sets](https://azure.microsoft.com/en-us/services/virtual-machine-scale-sets), [GCP managed instance groups](https://cloud.google.com/compute/docs/instance-groups#managed_instance_groups), and [Kubernetes Jobs](https://kubernetes.io/docs/concepts/workloads/controllers/job)), taking care of recovery and termination automatically on the cloud provider's side. This design reduces management overhead & infrastructure costs. You can close your laptop while cloud tasks are running -- auto-recovery happens even if you are offline.
 2. **Unified tool for data science and software development teams**:
    TPI provides consistent tooling for both data scientists and DevOps engineers, improving cross-team collaboration. This simplifies compute management to a single config file, and reduces time to deliver ML models into production.
+3. **Reproducible, codified environments**:
+   Store hardware requirements in a single configuration file alongside the rest of your ML pipeline code.

-<img width=24px src="https://static.iterative.ai/logo/cml.svg"/> TPI is used to power [CML runners](https://cml.dev/doc/self-hosted-runners), bringing cloud providers to existing CI/CD workflows.
+<img width=24px src="https://static.iterative.ai/logo/cml.svg"/> TPI is used to power [CML](https://cml.dev), bringing cloud providers to existing GitHub, GitLab & Bitbucket CI/CD workflows ([repository](https://github.com/iterative/cml)).