Commit 78e7f06

[RUN-10579] add new policy for GPU and CPU memory (#453)
* [RUN-10579] add new policy for GPU and CPU memory
* Update docs/admin/workloads/policies.md
* Apply suggestions from code review
1 parent e2b7555 commit 78e7f06

1 file changed: +33 -9 lines changed
docs/admin/workloads/policies.md

Lines changed: 33 additions & 9 deletions
@@ -6,7 +6,7 @@ Policies allow administrators to _impose restrictions_ and set _default values_
 
 1. Restrict researchers from requesting more than 2 GPUs, or less than 1GB of memory for an interactive workload.
 2. Set the default memory of each training job to 1GB, or mount a default volume to be used by any submitted Workload.
-
+
 Policies are stored as Kubernetes [custom resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources){default=_blank}.
 
 Policies are specific to Workload type as such there are several kinds of Policies:
@@ -15,7 +15,7 @@ Policies are specific to Workload type as such there are several kinds of Polici
 |----------------|-----------------|-------------|
 | Interactive | `InteractiveWorkload` | `InteractivePolicy` |
 | Training | `TrainingWorkload`| `TrainingPolicy` |
-| Distributed Training | `DistributedWorkload` | `DistributedPolicy` |
+| Distributed Training | `DistributedWorkload` | `DistributedPolicy` |
 | Inference | `InferenceWorkload` | `InferencePolicy` |
 
 A Policy can be created per Run:ai Project (Kubernetes namespace). Additionally, a Policy resource can be created in the `runai` namespace. This special Policy will take effect when there is no project-specific Policy for the relevant workload kind.
@@ -51,19 +51,43 @@ The policy places a default and limit on the available values for GPU allocation
 ``` bash
 kubectl apply -f gpupolicy.yaml
 ```
+
 Now, try the following command:
+
 ``` bash
 runai submit --gpu 5 --interactive -p team-a
 ```
+
 The following message will appear:
+
 ```
 gpu: must be no greater than 4
 ```
-A similar message will appear in the _New Job_ form of the Run:ai user interface, when attempting to enter the number of GPUs, which is out of range for an Interactive tab.
+
+A similar message will appear in the _New Job_ form of the Run:ai user interface, when attempting to enter the number of GPUs, which is out of range for a training job.
+
+The following policy places a default and limit on the available values for CPU and GPU memory allocation.
+
+```YAML title="gpumemorypolicy.yaml"
+apiVersion: run.ai/v2alpha1
+kind: TrainingPolicy
+metadata:
+  name: training-policy
+  namespace: runai
+spec:
+  gpuMemory:
+    rules:
+      min: 100M
+      max: 2G
+  memory:
+    rules:
+      min: 100M
+      max: 2G
+```
 
 ## Read-only values
 
-When you do not want the user to be able to change a value, you can force the corresponding user interface control to become read-only by using the `canEdit` key. For example,
+When you do not want the user to be able to change a value, you can force the corresponding user interface control to become read-only by using the `canEdit` key. For example,
 
 ``` YAML title="runasuserpolicy.yaml"
 apiVersion: run.ai/v2alpha1
@@ -82,15 +106,15 @@ spec:
 ```
 
 1. Set the Project namespace here.
-2. The field is required.
-3. The field will be shown as read-only in the user interface.
+2. The field is required.
+3. The field will be shown as read-only in the user interface.
 4. The field value is true.
 
 ### Complex Values
 
-The example above illustrated rules for parameters of "primitive" types, such as _GPU allocation_, _CPU memory_, _working directory_, etc. These parameters contain a single value.
+The example above illustrated rules for parameters of "primitive" types, such as _GPU allocation_, _CPU memory_, _working directory_, etc. These parameters contain a single value.
 
-Other workload parameters, such as _ports_ or _volumes_, are "complex", in the sense that they may contain multiple values: a workload may contain multiple ports and multiple volumes.
+Other workload parameters, such as _ports_ or _volumes_, are "complex", in the sense that they may contain multiple values: a workload may contain multiple ports and multiple volumes.
 
 The following is an example of a policy containing the value `ports`, which is complex: The `ports` flag typically contains two values: The `external` port that is mapped to an internal `container` port. One can have multiple port tuples defined for a single Workload:
 
@@ -235,7 +259,7 @@ FIELDS:
 if the map as a whole is required
 ```
 
-Note that each kind of policy has a slightly different set of parameters. For example, an `InteractivePolicy` has a `jupyter` parameter that is not available under `TrainingPolicy`.
+Note that each kind of policy has a slightly different set of parameters. For example, an `InteractivePolicy` has a `jupyter` parameter that is not available under `TrainingPolicy`.
 
 ### Using Secrets for Environment Variables
 
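
To make the `min`/`max` rules in the `gpumemorypolicy.yaml` example from this commit concrete, the following is a minimal Python sketch of a range check over memory quantities. It is not how Run:ai enforces policies. The function names `to_bytes` and `check_range` are hypothetical, the "no less than" wording is invented to mirror the "no greater than" message in the diff, and the sketch assumes values such as `100M` and `2G` follow Kubernetes-style quantity suffixes.

```python
from typing import Optional

# Assumption: quantities use Kubernetes-style suffixes
# (M = 10**6, G = 10**9, Mi = 2**20, Gi = 2**30 bytes).
SUFFIXES = {"Mi": 2**20, "Gi": 2**30, "M": 10**6, "G": 10**9}


def to_bytes(quantity: str) -> int:
    """Convert a quantity such as '100M' or '2G' to a byte count."""
    for suffix, factor in SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # no suffix: already a plain byte count


def check_range(field: str, requested: str, minimum: str, maximum: str) -> Optional[str]:
    """Return an error message if `requested` is outside [minimum, maximum], else None."""
    value = to_bytes(requested)
    if value < to_bytes(minimum):
        # Hypothetical wording, mirroring the "no greater than" message in the diff.
        return f"{field}: must be no less than {minimum}"
    if value > to_bytes(maximum):
        return f"{field}: must be no greater than {maximum}"
    return None


# A 4G request exceeds the 2G maximum; a 500M request is within range.
print(check_range("gpuMemory", "4G", "100M", "2G"))
print(check_range("memory", "500M", "100M", "2G"))
```

With the limits from the example policy, the first call returns a "must be no greater than 2G" message and the second returns `None`.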
