This repository was archived by the owner on Jan 29, 2025. It is now read-only.

Commit 71e43a4

byako authored and madalazar committed
Update main GAS doc
1 parent 3feffb4 commit 71e43a4

File tree: 1 file changed (+43 −20)

gpu-aware-scheduling/README.md

Lines changed: 43 additions & 20 deletions
#### Extender configuration
You should follow extender configuration instructions from the
[Telemetry Aware Scheduling](../telemetry-aware-scheduling/README.md#Extender-configuration) and
adapt those instructions to use GPU Aware Scheduling configurations, which can be found in the
[deploy/extender-configuration](deploy/extender-configuration) folder.
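As a rough orientation for what those files wire up, a scheduler extender entry for GAS typically looks like the sketch below. All field values here (API version, URL, verbs, managed resource) are illustrative assumptions; the authoritative files are in the [deploy/extender-configuration](deploy/extender-configuration) folder.

```yaml
# Sketch only: check deploy/extender-configuration for the real values.
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
clientConnection:
  kubeconfig: /etc/kubernetes/scheduler.conf
extenders:
  - urlPrefix: "https://localhost:9001"   # GAS listens on port 9001 by default
    filterVerb: "filter"
    bindVerb: "bind"
    enableHTTPS: true
    managedResources:
      - name: "gpu.intel.com/i915"
        ignoredByScheduler: false
    ignorable: true
```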
#### Deploy GAS
GAS has been tested with Kubernetes 1.24. A yaml file for GAS is contained in the deploy folder
along with its service and RBAC roles and permissions.

A secret called `extender-secret` will need to be created with the cert and key for the TLS endpoint.
If you name the secret differently, remember to fix the [deployment file](deploy/gas-deployment.yaml) accordingly before deploying GAS.

The secret can be created with the following command:

```bash
kubectl create secret tls extender-secret --cert /etc/kubernetes/<PATH_TO_CERT> --key /etc/kubernetes/<PATH_TO_KEY>
```

Replace `<PATH_TO_CERT>` and `<PATH_TO_KEY>` with the paths your cluster uses; here is the same command
with the default values:

```bash
kubectl create secret tls extender-secret --cert /etc/kubernetes/pki/ca.crt --key /etc/kubernetes/pki/ca.key
```

Note: you might need elevated privileges to access the default location of these files; in that case, run the same command with `sudo`.
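For reference, the command above results in an ordinary Kubernetes TLS secret; an equivalent manifest would look roughly like this (the data values are placeholders, not real certificate material):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: extender-secret
type: kubernetes.io/tls
data:
  tls.crt: <BASE64_ENCODED_CERT>  # placeholder
  tls.key: <BASE64_ENCODED_KEY>   # placeholder
```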

The "deploy" folder has the necessary scripts for deploying GAS. You can simply deploy by running:

```bash
kubectl apply -f deploy/
```

After this is run, GAS should be operable in the cluster and should be visible after running ``kubectl get pods``.

Remember to run the configure-scheduler.sh script, or perform similar actions in your cluster if the script does not work in your environment directly.

#### Build GAS locally

GPU Aware Scheduling uses go modules. It requires Go 1.18 with modules enabled for building.
To build GAS locally on your host:

```bash
make build
```

You can also build inside docker, which creates the container:

```bash
make image
```

To deploy a locally built GAS container image, change the image reference in the [deployment YAML](deploy/gas-deployment.yaml) and then deploy it normally as described above.
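The deployment YAML change amounts to pointing the container image at your local build. A sketch, where the container name, registry, and tag are illustrative assumptions:

```yaml
spec:
  template:
    spec:
      containers:
        - name: gas                         # illustrative container name
          image: localhost:5000/gas:devel   # illustrative local image reference
          imagePullPolicy: IfNotPresent
```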
### Configuration flags
The below flags can be passed to the binaries at run time.

| name | type | description | usage | default |
|------|------|-------------|-------|---------|
| kubeConfig | string | location of kubernetes configuration file | --kubeConfig /root/filename | ~/.kube/config |
| port | int | port number on which the scheduler extender will listen | --port 32000 | 9001 |
| cert | string | location of the cert file for the TLS endpoint | --cert=/root/cert.txt | /etc/kubernetes/pki/ca.crt |
| key | string | location of the key file for the TLS endpoint | --key=/root/key.txt | /etc/kubernetes/pki/ca.key |
| cacert | string | location of the ca certificate for the TLS endpoint | --cacert=/root/cacert.txt | /etc/kubernetes/pki/ca.crt |
| enableAllowlist | bool | enable POD-annotation based GPU allowlist feature | --enableAllowlist | false |
| enableDenylist | bool | enable POD-annotation based GPU denylist feature | --enableDenylist | false |
| balancedResource | string | enable named resource balancing between GPUs | --balancedResource | "" |

Some features are based on labels put onto pods; for the full feature list, see the [usage doc](docs/usage.md).
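When GAS runs in-cluster, these flags are passed as container arguments in its deployment. The following is a sketch with illustrative values mirroring the defaults in the table above; the exact args layout must match [deploy/gas-deployment.yaml](deploy/gas-deployment.yaml):

```yaml
args:
  - "--port=9001"
  - "--cert=/etc/kubernetes/pki/ca.crt"
  - "--key=/etc/kubernetes/pki/ca.key"
  - "--cacert=/etc/kubernetes/pki/ca.crt"
```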
#### Balanced resource (optional)
GAS can be configured to balance named resources so that the resource requests are distributed as evenly as possible between the GPUs. For example, if the balanced resource is set to "tiles" and the containers request 1 tile each, the first container could get a tile from "card0", the second from "card1", the third again from "card0", and so on.
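Continuing the example above, enabling tile balancing would mean adding the flag to the extender's arguments. The resource name "tiles" here follows the example and is an assumption; use the full name of the resource you want balanced:

```yaml
args:
  - "--balancedResource=tiles"
```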

## Adding the resource to make a deployment use GAS Scheduler Extender

For example, in a deployment file:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
# … (the rest of the example Deployment is unchanged in this commit and elided here) …
```

There is one change to the yaml here:
- A resources/limits entry requesting the resource gpu.intel.com/i915 will make GAS take part in scheduling such a deployment. If this resource is not requested, GAS will not be used during scheduling of the pod.
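Concretely, the entry sits under the container spec like this (the quantity 1 is illustrative):

```yaml
resources:
  limits:
    gpu.intel.com/i915: 1
```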
### Unsupported use-cases
