
Commit 10b3ee1

byakotogashidm authored and committed
Add 'gas-same-gpu' annotation support
The 'gas-same-gpu: "container1,container2"' annotation tells GAS to ensure the listed containers are given the same GPU. Intended to be used for multi-GPU nodes. Example use case: sharing a framebuffer from a video renderer container into a video stream encoder container.

Signed-off-by: Alexey Fomenko <alexey.fomenko@intel.com>
1 parent 920a417 commit 10b3ee1

File tree

9 files changed: +525 −54 lines changed


gpu-aware-scheduling/.golangci.yml

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
+run:
+  tests: true
+  max-issues-per-linter: 0
+  max-same-issues: 0
+
+linters:
+  enable-all: true
+  disable:
+    - paralleltest
+    - gomoddirectives
+    - exhaustivestruct
+    - varnamelen
+    - gofumpt
+    - nonamedreturns
+    - exhaustruct
+
+linters-settings:
+  gofmt:
+    simplify: true
+  gofumpt:
+    lang-version: "1.18"
+  golint:
+    min-confidence: 0.9
+  govet:
+    check-shadowing: true
+    enable:
+      - "fieldalignment"
+  gocyclo:
+    min-complexity: 15
+  gocognit:
+    min-complexity: 31
+  funlen:
+    lines: 70
+  cyclop:
+    max-complexity: 12
+
+issues:
+  exclude-rules:
+    - path: _test\.go
+      linters:
+        # Until the testing package allows pinning variables disable scopelint
+        # for tests. See https://github.com/kyoh86/scopelint/issues/4.
+        - scopelint
+        - funlen
+        - goimports
+        - gofmt
gpu-aware-scheduling/Makefile

Lines changed: 5 additions & 2 deletions
@@ -8,12 +8,15 @@ ifneq ("$(wildcard licenses/)","")
 LOCAL_LICENSES=TRUE
 endif
 
-.PHONY: test all build image release-image format clean licenses mock e2e clean-licenses
+.PHONY: test all build image release-image format clean licenses mock e2e lint
 
 test:
 	go test ./... -v *_test.go
 
-all: format build
+all: format build lint
+
+lint: format build
+	golangci-lint run
 
 build:
 	CGO_ENABLED=0 GO111MODULE=on go build -ldflags="-s -w" -o ./bin/extender ./cmd/gas-scheduler-extender

gpu-aware-scheduling/docs/usage.md

Lines changed: 80 additions & 9 deletions
@@ -5,7 +5,7 @@ To begin with, it will help a lot if you have been successful already using the
 
 ## GPU-plugin
 Resource management enabled version of the GPU-plugin is currently necessary for running GAS. The resource management enabled GPU-plugin version can read the necessary annotations of the PODs, and without those annotations, GPU allocations will not work correctly. A copy of the plugin deployment kustomization can be found from [docs/gpu_plugin](./gpu_plugin). It can be deployed simply by issuing:
-```
+```Bash
 kubectl apply -k docs/gpu_plugin/overlays/fractional_resources
 ```
 
@@ -15,14 +15,14 @@ The GPU plugin initcontainer needs to be used in order to get the extended resou
 Basically all versions starting with [v0.6.0](https://github.com/kubernetes-sigs/node-feature-discovery/releases/tag/v0.6.0) should work. You can use it to publish the GPU extended resources and GPU-related labels printed by the hook installed by the GPU-plugin initcontainer.
 
 For picking up the labels printed by the hook installed by the GPU-plugin initcontainer, deploy nfd master with this kind of command in its yaml:
-```
+```YAML
 command: ["nfd-master", "--resource-labels=gpu.intel.com/memory.max,gpu.intel.com/millicores,gpu.intel.com/tiles", "--extra-label-ns=gpu.intel.com"]
 ```
 
 The above would promote three labels, "memory.max", "millicores" and "tiles" to extended resources of the node that produces the labels.
 
 If you want to enable i915 capability scanning, the nfd worker needs to read debugfs, and therefore it needs to run as privileged, like this:
-```
+```YAML
 securityContext:
   runAsNonRoot: null
   # Adding GPU info labels needs debugfs "915_capabilities" access
@@ -31,7 +31,7 @@ If you want to enable i915 capability scanning, the nfd worker needs to read deb
 ```
 
 In order to allow NFD to create extended resource, you will have to give it RBAC-rule to access nodes/status, like:
-```
+```YAML
 rules:
   - apiGroups:
     - ""
@@ -44,7 +44,7 @@ rules:
 
 A simple example of non-root NFD deployment kustomization can be found from [docs/nfd](./nfd). You can deploy it by running
 
-```
+```Bash
 kubectl apply -k docs/nfd
 ```
 
@@ -55,7 +55,7 @@ You need some i915 GPUs in the nodes. Internal GPUs work fine for testing GAS, m
 ## PODs
 
 Your PODs then, needs to ask for some GPU-resources. Like this:
-```
+```YAML
 resources:
   limits:
     gpu.intel.com/i915: 1
@@ -64,7 +64,7 @@ Your PODs then, needs to ask for some GPU-resources. Like this:
 ```
 
 Or like this for tiles:
-```
+```YAML
 resources:
   limits:
     gpu.intel.com/i915: 1
@@ -119,13 +119,84 @@ share the same physical card.
 
 ## Allowlist and Denylist
 
-You can use POD-annotations in your POD-templates to list the GPU names which you allow, or deny for your deployment. The values for the annotations are comma separated value lists of the form "card0,card1,card2", and the names of the annotations are:
+You can use POD-annotations in your POD-templates to list the GPU names which you allow, or deny
+for your deployment. The values for the annotations are comma separated value lists of the form
+"card0,card1,card2", and the names of the annotations are:
 
 - `gas-allow`
 - `gas-deny`
 
 Note that the feature is disabled by default. You need to enable allowlist and/or denylist via command line flags.
 
+## Enforcing the same GPU for multiple containers within a Pod
+
+By default, when GAS checks whether the available Node resources are enough for a Pod's resource
+requests, the containers of the Pod are processed sequentially and independently. On multi-GPU
+nodes this may (but is not guaranteed to) result in containers of the same Pod having different
+GPUs allocated to them.
+
+If two or more containers of the same Pod need to use the same GPU, GAS supports the
+`gas-same-gpu` Pod annotation (its value is a list of container names), which tells GAS that the
+listed containers must be given the same GPU. If none of the GPUs on the node has enough available
+resources for all the containers listed in the annotation, the node will not be used for
+scheduling.
+
+<details>
+<summary>Example Pod annotation</summary>
+
+```YAML
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: demo-app
+  labels:
+    app: demo
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: demo
+  template:
+    metadata:
+      labels:
+        app: demo
+      annotations:
+        gas-same-gpu: busybox1,busybox2
+    spec:
+      containers:
+      - name: nginx
+        image: nginx:latest
+        imagePullPolicy: IfNotPresent
+        resources:
+          limits:
+            gpu.intel.com/i915: 1
+            gpu.intel.com/millicores: 400
+      - name: busybox2
+        image: busybox:latest
+        imagePullPolicy: IfNotPresent
+        resources:
+          limits:
+            gpu.intel.com/i915: 1
+            gpu.intel.com/millicores: 100
+        command: ["/bin/sh", "-c", "sleep 3600"]
+      - name: busybox1
+        image: busybox:latest
+        imagePullPolicy: IfNotPresent
+        resources:
+          limits:
+            gpu.intel.com/i915: 1
+            gpu.intel.com/millicores: 100
+        command: ["/bin/sh", "-c", "sleep 3600"]
+```
+
+</details>
+
+### Restrictions
+
+- Containers listed in the `gas-same-gpu` annotation have to request exactly one `gpu.intel.com/i915` resource
+- Containers listed in the `gas-same-gpu` annotation cannot request the `gpu.intel.com/i915_monitoring` resource
+- Containers listed in the `gas-same-gpu` annotation cannot request the `gpu.intel.com/tiles` resource
+
 ## Summary in a chronological order
 
 - GPU-plugin initcontainer installs an NFD hook which prints labels for you, based on the Intel GPUs it finds
@@ -142,4 +213,4 @@ Check the logs (kubectl logs podname -n namespace) from all of these when in tro
 - Check that NFD picks up the labels without complaints, no errors in NFD workers or the master
 - Check that your GPU-enabled nodes have NFD-created GPU extended resources (kubectl describe node nodename) and GPU-labels
 - Check the log of GAS POD. If the log does not show anything ending up happening during deploying of i915 resource consuming PODs, your scheduler extender setup may be incorrect. Verify that you have successfully run all the deployment steps and the related cluster setup script.
-- Check the GPU plugin logs
+- Check the GPU plugin logs
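
The restrictions added to the usage documentation above amount to a per-container validation of the Pod spec. Below is a minimal sketch of such a check in Go. It is not the actual GAS implementation: the package name, helper name, and error messages are hypothetical, while the annotation name and resource names come from the documentation above.

```Go
package example

import (
	"fmt"
	"strings"

	v1 "k8s.io/api/core/v1"
)

const (
	sameGpuAnnotation      = "gas-same-gpu"
	i915Resource           = "gpu.intel.com/i915"
	i915MonitoringResource = "gpu.intel.com/i915_monitoring"
	tilesResource          = "gpu.intel.com/tiles"
)

// validateSameGpuContainers checks the documented restrictions for the
// containers named in the gas-same-gpu annotation: exactly one i915 request,
// and no i915_monitoring or tiles requests.
func validateSameGpuContainers(pod *v1.Pod) error {
	value, ok := pod.Annotations[sameGpuAnnotation]
	if !ok || value == "" {
		return nil // annotation not used, nothing to enforce
	}

	listed := map[string]bool{}
	for _, name := range strings.Split(value, ",") {
		listed[strings.TrimSpace(name)] = true
	}

	for _, container := range pod.Spec.Containers {
		if !listed[container.Name] {
			continue
		}

		limits := container.Resources.Limits

		if quantity, found := limits[v1.ResourceName(i915Resource)]; !found || quantity.Value() != 1 {
			return fmt.Errorf("container %q must request exactly one %s", container.Name, i915Resource)
		}

		if _, found := limits[v1.ResourceName(i915MonitoringResource)]; found {
			return fmt.Errorf("container %q must not request %s", container.Name, i915MonitoringResource)
		}

		if _, found := limits[v1.ResourceName(tilesResource)]; found {
			return fmt.Errorf("container %q must not request %s", container.Name, tilesResource)
		}
	}

	return nil
}
```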

gpu-aware-scheduling/pkg/gpuscheduler/node_resource_cache.go

Lines changed: 1 addition & 1 deletion
@@ -398,7 +398,7 @@ func (c *Cache) adjustTiles(adj bool, nodeName, tileAnnotation string) {
 // set adj=true to add, false to remove resources.
 func (c *Cache) adjustPodResources(pod *v1.Pod, adj bool, annotation, tileAnnotation, nodeName string) error {
 	// get slice of resource maps, one map per container
-	containerRequests := containerRequests(pod)
+	_, containerRequests := containerRequests(pod, map[string]bool{})
 
 	// get slice of card name lists, one CSV list per container
 	containerCards := strings.Split(annotation, "|")
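
The updated call site passes a set of container names into `containerRequests` and now receives two return values, ignoring the first one here. The new function body is not part of this hunk; the following is a purely hypothetical sketch of how such a split could look, assuming the set holds the container names from the `gas-same-gpu` annotation (it is empty at this call site) and the second return value remains the per-container resource list consumed below.

```Go
package example

import v1 "k8s.io/api/core/v1"

// containerRequestsSketch is a hypothetical stand-in for the real
// containerRequests helper; its exact semantics are not shown in the hunk.
// sameGpuNames is assumed to hold the container names parsed from the
// gas-same-gpu annotation.
func containerRequestsSketch(pod *v1.Pod, sameGpuNames map[string]bool) (sameGpuRequests, allRequests []v1.ResourceList) {
	for _, container := range pod.Spec.Containers {
		limits := container.Resources.Limits

		if sameGpuNames[container.Name] {
			// Requests that must land on one shared GPU are reported
			// separately so the caller can check them against a single
			// card's remaining capacity.
			sameGpuRequests = append(sameGpuRequests, limits)
		}

		allRequests = append(allRequests, limits)
	}

	return sameGpuRequests, allRequests
}
```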

gpu-aware-scheduling/pkg/gpuscheduler/node_resource_cache_test.go

Lines changed: 0 additions & 2 deletions
@@ -477,7 +477,6 @@ func TestDeschedulingCards(t *testing.T) {
 	})
 
 	applied := 0
-	//nolint: unparam
 	applyCheck := func(action k8stesting.Action) (handled bool, ret runtime.Object, err error) {
 		patchAction, ok := action.(k8stesting.PatchAction)
 		if !ok {
@@ -498,7 +497,6 @@ func TestDeschedulingCards(t *testing.T) {
 	}
 
 	removed := 0
-	//nolint: unparam
 	removeCheck := func(action k8stesting.Action) (handled bool, ret runtime.Object, err error) {
 		patchAction, ok := action.(k8stesting.PatchAction)
 		if !ok {
