This repository was archived by the owner on Jan 29, 2025. It is now read-only.

Commit 3c514af

docs and yamls for dependencies

This adds more documentation and yamls for getting dependencies up and running.

Signed-off-by: Ukri Niemimuukko <ukri.niemimuukko@intel.com>

1 parent ab25796 commit 3c514af

22 files changed: +586 −17 lines

gpu-aware-scheduling/README.md
Lines changed: 4 additions & 4 deletions

@@ -36,7 +36,7 @@ Note: a shell script that shows these steps can be found [here](deploy/extender-
 The extender configuration files can be found under deploy/extender-configuration.
 GAS Scheduler Extender needs to be registered with the Kubernetes Scheduler. In order to do this a configmap should be created like the below:
-````
+```
 apiVersion: v1alpha1
 kind: ConfigMap
 metadata:
@@ -72,14 +72,14 @@ data:
 ]
 }
-````
+```

 A similar file can be found [in the deploy folder](./deploy/extender-configuration/scheduler-extender-configmap.yaml). This configmap can be created with ``kubectl apply -f ./deploy/scheduler-extender-configmap.yaml``
 The scheduler requires flags passed to it in order to know the location of this config map. The flags are:
-````
+```
 - --policy-configmap=scheduler-extender-policy
 - --policy-configmap-namespace=kube-system
-````
+```

 If scheduler is running as a service these can be added as flags to the binary. If scheduler is running as a container - as in kubeadm - these args can be passed in the deployment file.
 Note: For Kubeadm set ups some additional steps may be needed.
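For the kubeadm case mentioned above, the two flags could be added to the kube-scheduler static pod manifest, for example. This is only a sketch: the manifest path and surrounding fields are standard kubeadm conventions, not part of this commit; only the two `--policy-configmap*` flags come from the README.

```
# Sketch: excerpt of /etc/kubernetes/manifests/kube-scheduler.yaml on a
# kubeadm control-plane node, with the two flags from the README appended.
spec:
  containers:
  - name: kube-scheduler
    command:
    - kube-scheduler
    - --policy-configmap=scheduler-extender-policy
    - --policy-configmap-namespace=kube-system
```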
Lines changed: 13 additions & 0 deletions

This folder has a simple example pod which uses Kubernetes extended resources.

To deploy, you can run in this folder:

```
kubectl apply -f .
```

Then you can check the GPU devices of the first pod in the deployment with:

```
kubectl exec -it deploy/bb-example -- ls /dev/dri
```
Lines changed: 23 additions & 0 deletions

apiVersion: apps/v1
kind: Deployment
metadata:
  name: bb-example
spec:
  replicas: 1
  selector:
    matchLabels:
      app: bb-example
  template:
    metadata:
      labels:
        app: bb-example
    spec:
      containers:
      - name: gpu-resource-request
        image: busybox:1.33.1
        command: ['sh', '-c', 'echo The gpu resource request app is running! && sleep 6000']
        resources:
          limits:
            gpu.intel.com/i915: 1
            gpu.intel.com/millicores: 100
            gpu.intel.com/memory.max: 1G
Lines changed: 8 additions & 0 deletions

This folder has a simple example of how to deploy the Intel GPU plugin so that it has the fractional resource support enabled.

To deploy, you can run in this folder:

```
kubectl apply -k overlays/fractional_resources
```
Lines changed: 60 additions & 0 deletions

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: intel-gpu-plugin
  labels:
    app: intel-gpu-plugin
spec:
  selector:
    matchLabels:
      app: intel-gpu-plugin
  template:
    metadata:
      labels:
        app: intel-gpu-plugin
    spec:
      initContainers:
      - name: intel-gpu-initcontainer
        image: intel/intel-gpu-initcontainer:devel
        imagePullPolicy: IfNotPresent
        securityContext:
          readOnlyRootFilesystem: true
        volumeMounts:
        - mountPath: /etc/kubernetes/node-feature-discovery/source.d/
          name: nfd-source-hooks
      containers:
      - name: intel-gpu-plugin
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        image: intel/intel-gpu-plugin:devel
        imagePullPolicy: IfNotPresent
        securityContext:
          readOnlyRootFilesystem: true
        volumeMounts:
        - name: devfs
          mountPath: /dev/dri
          readOnly: true
        - name: sysfs
          mountPath: /sys/class/drm
          readOnly: true
        - name: kubeletsockets
          mountPath: /var/lib/kubelet/device-plugins
      volumes:
      - name: devfs
        hostPath:
          path: /dev/dri
      - name: sysfs
        hostPath:
          path: /sys/class/drm
      - name: kubeletsockets
        hostPath:
          path: /var/lib/kubelet/device-plugins
      - name: nfd-source-hooks
        hostPath:
          path: /etc/kubernetes/node-feature-discovery/source.d/
          type: DirectoryOrCreate
      nodeSelector:
        kubernetes.io/arch: amd64
Lines changed: 2 additions & 0 deletions

resources:
- intel-gpu-plugin.yaml
Lines changed: 2 additions & 0 deletions

bases:
- base
Lines changed: 12 additions & 0 deletions

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: intel-gpu-plugin
spec:
  template:
    spec:
      containers:
      - name: intel-gpu-plugin
        args:
        - "-shared-dev-num=300"
        - "-resource-manager"
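When kustomize applies a strategic-merge patch like the one above, it merges into the base DaemonSet by container name. The merged container spec would look roughly like this; a sketch only, combining this patch with the base manifest elsewhere in this commit, not a file the commit adds:

```
# Sketch of the merged result: the base intel-gpu-plugin container
# with the patched-in args.
containers:
- name: intel-gpu-plugin
  image: intel/intel-gpu-plugin:devel
  args:
  - "-shared-dev-num=300"
  - "-resource-manager"
```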
Lines changed: 17 additions & 0 deletions

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: intel-gpu-plugin
spec:
  template:
    spec:
      containers:
      - name: intel-gpu-plugin
        volumeMounts:
        - name: podresources
          mountPath: /var/lib/kubelet/pod-resources
      volumes:
      - name: podresources
        hostPath:
          path: /var/lib/kubelet/pod-resources
Lines changed: 8 additions & 0 deletions

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: intel-gpu-plugin
spec:
  template:
    spec:
      serviceAccountName: resource-reader-sa
Lines changed: 10 additions & 0 deletions

bases:
- ../../base
resources:
- resource-cluster-role-binding.yaml
- resource-cluster-role.yaml
- resource-reader-sa.yaml
patches:
- add-serviceaccount.yaml
- add-podresource-mount.yaml
- add-args.yaml
Lines changed: 12 additions & 0 deletions

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: resource-reader-rb
subjects:
- kind: ServiceAccount
  name: resource-reader-sa
  namespace: default
roleRef:
  kind: ClusterRole
  name: resource-reader
  apiGroup: rbac.authorization.k8s.io
Lines changed: 8 additions & 0 deletions

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: resource-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["list"]
Lines changed: 4 additions & 0 deletions

apiVersion: v1
kind: ServiceAccount
metadata:
  name: resource-reader-sa
Lines changed: 8 additions & 0 deletions

This folder has a simple example of how to deploy NFD so that it can create extended resources for GPU Aware Scheduling.

To deploy, you can run in this folder:

```
kubectl apply -k .
```
Lines changed: 19 additions & 0 deletions

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nfd-worker
spec:
  template:
    spec:
      containers:
      - env:
        # GPU_MEMORY_OVERRIDE is the value used for GPUs that don't tell the memory amount via the driver
        - name: GPU_MEMORY_OVERRIDE
          value: "4000000000"
        # GPU_MEMORY_RESERVED is the amount of memory scoped out from k8s for those GPUs which
        # do tell the memory amount via the driver
        # - name: GPU_MEMORY_RESERVED
        #   value: "294967295"
        name: nfd-worker

# the env var values propagate to the NFD extension hook (the GPU NFD hook, installed by the GPU plugin initcontainer)
Lines changed: 13 additions & 0 deletions

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfd-master
spec:
  template:
    spec:
      containers:
      - name: nfd-master
        command:
        - "nfd-master"
        - "--resource-labels=gpu.intel.com/memory.max,gpu.intel.com/millicores"
        - "--extra-label-ns=gpu.intel.com"
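With --resource-labels set as above, nfd-master advertises those two labels as extended resources in the node's status. A hypothetical excerpt of the resulting node object is sketched below; the values are illustrative only (the memory figure assumes the GPU_MEMORY_OVERRIDE default from this commit), and the actual numbers depend on the node's GPU and the hook's output:

```
# Hypothetical excerpt of `kubectl get node <name> -o yaml` after NFD runs.
# Values are illustrative, not guaranteed.
status:
  capacity:
    gpu.intel.com/memory.max: "4000000000"
    gpu.intel.com/millicores: "1000"
```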
Lines changed: 18 additions & 0 deletions

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: nfd-master
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  # since we are using the command line flag --resource-labels to create extended resources,
  # this kustomize patch uncomments "- nodes/status"
  - nodes/status
  verbs:
  - get
  - patch
  - update
  # list is only needed for --prune
  - list
Lines changed: 7 additions & 0 deletions

resources:
- v0.7.0/nfd-master.yaml.template
- v0.7.0/nfd-worker-daemonset.yaml.template
patchesStrategicMerge:
- kustom/external_resources.yaml
- kustom/env_vars.yaml
- kustom/rbac.yaml
