This repository was archived by the owner on Jan 29, 2025. It is now read-only.

Commit 2d614aa

akomadalazar authored and committed

Update GAS usage.md

1 parent 71e43a4 commit 2d614aa

File tree

1 file changed: +14 additions, -8 deletions


gpu-aware-scheduling/docs/usage.md

Lines changed: 14 additions & 8 deletions
````diff
@@ -1,20 +1,26 @@
 # Usage with NFD and GPU-plugin
-This document explains how to get GAS working together with [Node Feature Discovery](https://github.com/kubernetes-sigs/node-feature-discovery) and the [GPU-plugin](https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/cmd/gpu_plugin/README.md).
+This document explains how to get GAS working together with [Node Feature Discovery](https://github.com/kubernetes-sigs/node-feature-discovery) (NFD) and the [GPU-plugin](https://github.com/intel/intel-device-plugins-for-kubernetes/blob/main/cmd/gpu_plugin/README.md).
 
-To begin with, it will help a lot if you have been successful already using the GPU-plugin with some deployments. That means your HW and cluster is most likely fine with GAS also.
+To begin with, it will help a lot if you have been successful already using the GPU-plugin with
+some deployments. That means your HW and cluster is most likely fine with GAS also.
 
 ## GPU-plugin
-Resource management enabled version of the GPU-plugin is currently necessary for running GAS. The resource management enabled GPU-plugin version can read the necessary annotations of the PODs, and without those annotations, GPU allocations will not work correctly. A copy of the plugin deployment kustomization can be found from [docs/gpu_plugin](./gpu_plugin). It can be deployed simply by issuing:
+Resource management is required to be enabled in GPU-plugin currently to run GAS. With resource
+management enabled, GPU-plugin can read the necessary annotations of the PODs. Without reading
+those annotations, GPU allocations will not work correctly. A copy of the plugin deployment
+kustomization can be found from [docs/gpu_plugin](./gpu_plugin). It can be deployed simply by
+issuing:
 ```Bash
 kubectl apply -k docs/gpu_plugin/overlays/fractional_resources
 ```
 
 The GPU plugin initcontainer needs to be used in order to get the extended resources created with NFD. It is deployed by the kustomization base. The initcontainer installs the required NFD-hook into the host system.
 
 ## NFD
-Basically all versions starting with [v0.6.0](https://github.com/kubernetes-sigs/node-feature-discovery/releases/tag/v0.6.0) should work. You can use it to publish the GPU extended resources and GPU-related labels printed by the hook installed by the GPU-plugin initcontainer.
+All versions starting with [v0.6.0](https://github.com/kubernetes-sigs/node-feature-discovery/releases/tag/v0.6.0) should work. You can use it to publish the GPU extended resources and GPU-related labels printed by the hook installed by the GPU-plugin initcontainer.
 
-For picking up the labels printed by the hook installed by the GPU-plugin initcontainer, deploy nfd master with this kind of command in its yaml:
+For NFD to pick up the labels that are printed by the hook installed by the GPU-plugin
+initcontainer, nfd master deployment shold have these options in command entry of its yaml:
 ```YAML
 command: ["nfd-master", "--resource-labels=gpu.intel.com/memory.max,gpu.intel.com/millicores,gpu.intel.com/tiles", "--extra-label-ns=gpu.intel.com"]
 ```
````
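Once the plugin and NFD are deployed as described in the hunk above, the GPU extended resources and labels should become visible on the GPU nodes. A quick sanity check could look like the sketch below; `<node-name>` is a placeholder for one of your GPU nodes, and this assumes `kubectl` access to the cluster:

```Bash
# Show the gpu.intel.com extended resources and labels published for a node
# ("<node-name>" is a placeholder; pick a node from "kubectl get nodes").
kubectl describe node <node-name> | grep gpu.intel.com
```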
````diff
@@ -54,7 +60,7 @@ You need some i915 GPUs in the nodes. Internal GPUs work fine for testing GAS, m
 
 ## PODs
 
-Your PODs then, needs to ask for some GPU-resources. Like this:
+Your PODs need to ask for GPU-resources, for instance:
 ```YAML
 resources:
   limits:
````
````diff
@@ -63,7 +69,7 @@ Your PODs then, needs to ask for some GPU-resources. Like this:
     gpu.intel.com/memory.max: 10M
 ```
 
-Or like this for tiles:
+Or, for tiles:
 ```YAML
 resources:
   limits:
````
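Put together, a complete Pod spec requesting fractional GPU resources could look like the following sketch. The Pod name, image, and the resource amounts are assumptions for illustration (the `gpu.intel.com/memory.max` and `gpu.intel.com/millicores` resource names come from the NFD command above; the device count resource is an assumption from the GPU-plugin), so adjust them to your workload:

```YAML
apiVersion: v1
kind: Pod
metadata:
  name: gas-example            # hypothetical name
spec:
  containers:
  - name: example
    image: busybox             # placeholder image
    command: ["sleep", "inf"]
    resources:
      limits:
        gpu.intel.com/i915: 1          # one GPU device (assumption)
        gpu.intel.com/millicores: 500  # fraction of a GPU's compute
        gpu.intel.com/memory.max: 10M
```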
````diff
@@ -132,7 +138,7 @@ Note that the feature is disabled by default. You need to enable allowlist and/o
 
 By default when GAS checks if available Node resources are enough for Pod's resources requests,
 the containers of the Pod are processed sequentially and independently. In multi-gpu nodes in
-certain cases this may result (but not guaranteed) in container of the same Pod having different
+certain cases this may result (but not guaranteed) in containers of the same Pod having different
 GPUs allocated to them.
 
 In case two or more containers of the same Pod require to use the same GPU, GAS supports
````
