Commit 94c5fb1

Merge pull request #239 from chaitanya1731/playbook

ansible: Added initial playbook code

2 parents: 3afdab7 + ba7e970

File tree: 2 files changed, +223 −0 lines changed
one_click/README.md

Lines changed: 31 additions & 0 deletions
# Deploy Intel Technology Enabling Solutions with Red Hat OpenShift using “One-Click”

## Overview

Red Hat [Ansible](https://www.ansible.com/) and Operator technologies are used for “One-Click Deployment” of Intel technology enabling solutions with Red Hat OpenShift Container Platform (RHOCP). Ansible automates the operator installation and configuration steps with a playbook, making deployment as simple as a single click.

Cluster administrators can use the reference Ansible playbooks here as a starting point for customizing their own playbooks.

>[!NOTE]
> It is recommended to start with [Get started](/README.md#getting-started) to become familiar with installing and configuring the general operators before composing your first playbook.

## Reference Playbook – Intel Data Center GPU Provisioning

This playbook demonstrates one-click provisioning of the Intel Data Center GPU on an RHOCP cluster. It installs and configures the general operators: the Node Feature Discovery (NFD) Operator, the Kernel Module Management (KMM) Operator, and the Intel Device Plugins Operator.

### Prerequisites

Before running the playbook, ensure the following prerequisites are met:
- A provisioned RHOCP cluster
- A Red Hat Enterprise Linux (RHEL) system with [Ansible](https://docs.ansible.com/ansible/2.9/installation_guide/intro_installation.html#installing-ansible-on-rhel-centos-or-fedora) installed and configured with a `kubeconfig` to connect to your RHOCP cluster
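
The playbook's `k8s`/`k8s_info` tasks rely on Ansible's Kubernetes modules, so the `kubernetes.core` collection and its Python client need to be present on the RHEL control node. A minimal setup sketch, assuming the usual package names for these tools (verify against your environment):

```shell
# Install the collection that provides the k8s and k8s_info modules
ansible-galaxy collection install kubernetes.core

# The collection requires the Kubernetes Python client on the control node
pip install kubernetes

# Point the playbook at your cluster credentials
# (the playbook defaults kubeconfig_path to ~/.kube/config)
export KUBECONFIG=~/.kube/config
```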

### Run the Playbook

To run the Ansible playbook, clone this repository to your RHEL system and navigate to the directory containing the playbook:

```
$ git clone https://github.com/intel/intel-technology-enabling-for-openshift.git
$ cd intel-technology-enabling-for-openshift/one_click
```

Execute the following single command to provision the Intel Data Center GPU:

```
$ ansible-playbook gpu_provisioning_playbook.yaml
```
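
The playbook tags its task blocks (`install_dependencies`, `nfd`, `kmm`, `idpo`, `gpu_test`) and exposes a `kubeconfig_path` variable, so individual runs can be scoped or pointed at a different kubeconfig. A sketch of common invocations (standard `ansible-playbook` options, shown here as illustrations):

```shell
# Run only the operator installation blocks
ansible-playbook gpu_provisioning_playbook.yaml --tags install_dependencies

# Re-run just the GPU verification step
ansible-playbook gpu_provisioning_playbook.yaml --tags gpu_test

# Use a non-default kubeconfig location
ansible-playbook gpu_provisioning_playbook.yaml -e kubeconfig_path=/path/to/kubeconfig
```

Note that tasks without tags always run unless skipped, so a tag-scoped run still executes the untagged CR-installation tasks.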
one_click/gpu_provisioning_playbook.yaml

Lines changed: 192 additions & 0 deletions
```
# Copyright (c) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

- hosts: localhost
  vars:
    kubeconfig_path: "~/.kube/config"
  environment:
    KUBECONFIG: "{{ kubeconfig_path }}"
  tasks:
    - name: Install Dependencies
      tags:
        - install_dependencies
      block:
        - name: NFD - Install Node Feature Discovery Operator
          tags:
            - nfd
          block:
            - name: NFD - Create openshift-nfd namespace
              k8s:
                name: openshift-nfd
                api_version: v1
                kind: Namespace
                state: present
                wait: yes
            - name: NFD - Create an nfd-operator group v1
              k8s:
                definition:
                  apiVersion: operators.coreos.com/v1
                  kind: OperatorGroup
                  metadata:
                    generateName: openshift-nfd-
                    name: openshift-nfd
                    namespace: openshift-nfd
                  spec:
                    targetNamespaces:
                      - openshift-nfd
                wait: yes
            - name: NFD - Create subscription for RH NFD operator
              k8s:
                definition:
                  apiVersion: operators.coreos.com/v1alpha1
                  kind: Subscription
                  metadata:
                    name: nfd
                    namespace: openshift-nfd
                  spec:
                    channel: "stable"
                    installPlanApproval: Automatic
                    name: nfd
                    source: redhat-operators
                    sourceNamespace: openshift-marketplace
                wait: yes
                wait_condition:
                  reason: AllCatalogSourcesHealthy
                  type: CatalogSourcesUnhealthy
                  status: 'False'
            - name: NFD - Wait until the nfd-controller-manager Deployment is available
              k8s_info:
                kind: Deployment
                wait: yes
                name: nfd-controller-manager
                namespace: openshift-nfd
                wait_condition:
                  type: Available
                  status: 'True'
                  reason: MinimumReplicasAvailable
        - name: KMM - Install Kernel Module Management Operator
          tags:
            - kmm
          block:
            - name: KMM - Create openshift-kmm namespace
              k8s:
                name: openshift-kmm
                api_version: v1
                kind: Namespace
                state: present
                wait: yes
            - name: KMM - Create OperatorGroup v1 in openshift-kmm namespace
              k8s:
                definition:
                  apiVersion: operators.coreos.com/v1
                  kind: OperatorGroup
                  metadata:
                    name: kernel-module-management
                    namespace: openshift-kmm
                wait: yes
            - name: KMM - Create Subscription for KMM Operator
              k8s:
                definition:
                  apiVersion: operators.coreos.com/v1alpha1
                  kind: Subscription
                  metadata:
                    name: kernel-module-management
                    namespace: openshift-kmm
                  spec:
                    channel: stable
                    installPlanApproval: Automatic
                    name: kernel-module-management
                    source: redhat-operators
                    sourceNamespace: openshift-marketplace
                    startingCSV: kernel-module-management.v2.1.1
                wait: yes
                wait_condition:
                  reason: AllCatalogSourcesHealthy
                  type: CatalogSourcesUnhealthy
                  status: 'False'
            - name: KMM - Wait until the kmm-operator-controller Deployment is available
              k8s_info:
                kind: Deployment
                wait: yes
                name: kmm-operator-controller
                namespace: openshift-kmm
                wait_condition:
                  type: Available
                  status: 'True'
                  reason: MinimumReplicasAvailable
            - name: KMM - Update KMM ConfigMap to set the firmware class path
              command: |
                oc patch configmap kmm-operator-manager-config -n openshift-kmm --type='json' -p='[{"op": "add", "path": "/data/controller_config.yaml", "value": "healthProbeBindAddress: :8081\nmetricsBindAddress: 127.0.0.1:8080\nleaderElection:\n enabled: true\n resourceID: kmm.sigs.x-k8s.io\nwebhook:\n disableHTTP2: true\n port: 9443\nworker:\n runAsUser: 0\n seLinuxType: spc_t\n setFirmwareClassPath: /var/lib/firmware"}]'
            - name: KMM - Delete the KMM operator controller pod for the `ConfigMap` changes to take effect
              shell: oc get pods -n openshift-kmm | grep -i "kmm-operator-controller-" | awk '{print $1}' | xargs oc delete pod -n openshift-kmm
            - name: KMM - Wait 10 seconds until the KMM operator pod is up and running
              pause:
                seconds: 10
        - name: IDPO - Install Intel Device Plugins Operator
          tags:
            - idpo
          block:
            - name: IDPO - Install Intel Device Plugins Operator
              k8s:
                state: present
                src: "../device_plugins/install_operator.yaml"
                wait: yes
            - name: IDPO - Wait until the inteldeviceplugins-operator-controller Deployment is available
              k8s_info:
                kind: Deployment
                wait: yes
                name: inteldeviceplugins-operator-controller
                namespace: openshift-operators
                wait_condition:
                  type: Available
                  status: 'True'
                  reason: MinimumReplicasAvailable
    - name: NFD - Install NFD CRs
      block:
        - name: NFD - Create NFD discovery CR
          k8s:
            state: present
            src: "../nfd/node-feature-discovery-openshift.yaml"
            wait: yes
        - name: NFD - Create NFD rules instance CR
          k8s:
            state: present
            src: "../nfd/node-feature-rules-openshift.yaml"
            wait: yes
    - name: KMM - Deploy Pre-built Out-of-Tree Intel Data Center GPU Driver Container for OpenShift using KMM
      block:
        - name: KMM - Install KMM pre-built Module CR
          k8s:
            state: present
            src: "../kmmo/intel-dgpu.yaml"
            wait: yes
    - name: IDPO - Install GPU plugin
      block:
        - name: IDPO - Create Intel GPU device plugin
          k8s:
            state: present
            src: "../device_plugins/gpu_device_plugin.yaml"
            wait: yes
    - name: GPU TEST
      tags:
        - gpu_test
      block:
        - name: GPU TEST - Get Node Resource Information
          kubernetes.core.k8s_info:
            api: v1
            kind: Node
            label_selectors:
              - "intel.feature.node.kubernetes.io/gpu=true"
              - "kmm.node.kubernetes.io/openshift-kmm.intel-dgpu.ready"
            wait: yes
            wait_timeout: 120
          register: cluster_nodes_info
          until:
            - cluster_nodes_info.resources is defined
        - name: Print cluster resources
          debug:
            msg:
              - "Please verify Capacity and Allocatable Intel Data Center GPU resources on the node:"
              - "Capacity: { 'gpu.intel.com/i915': {{ cluster_nodes_info.resources[0].status.capacity['gpu.intel.com/i915'] }} }"
              - "Allocatable: { 'gpu.intel.com/i915': {{ cluster_nodes_info.resources[0].status.allocatable['gpu.intel.com/i915'] }} }"
```
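
After the playbook completes, the result the GPU TEST block checks can also be verified by hand with `oc`, using the same node labels the playbook queries. A hedged sketch (`<node-name>` is a placeholder for one of the listed nodes):

```shell
# Nodes that NFD labeled as having an Intel GPU and whose KMM driver module is ready
oc get nodes -l intel.feature.node.kubernetes.io/gpu=true,kmm.node.kubernetes.io/openshift-kmm.intel-dgpu.ready

# Inspect the advertised GPU resource in the node's Capacity/Allocatable sections
oc describe node <node-name> | grep gpu.intel.com/i915
```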
