-# Disclaimer
-This project is currently under active development, and as such, all source code may not be included in any release. This means that the code is subject to change without notice, and that any information contained within the code should be considered as work in progress.
-
 # Intel® Technology Enabling for OpenShift*
-## General
-
-Intel® Technology Enabling for OpenShift* project provides Intel data center and edge hardware features provisioning, related E2E solutions and the reference workloads for these Intel features on Red Hat OpenShift platform.
-
-The goal of the project is to develop and release open, easy-to-use, integrated, and secure solutions to provision and manage Intel technologies on Red Hat OpenShift Container Platform (OCP). These Intel technologies mainly include Intel data center and edge hardware features and related software stacks for E2E solutions and reference workloads.
-
-To achieve this goal, following OCP software stack development and management life cycle should be followed:
-
-* The related software stacks must be designed and implemented properly for Cloud Native and OCP environment from Day 0
-* The solutions and instructions must allow users to configure and provision their OCP cluster properly with technologies from Intel and other vendors on Day 1
-* The solutions and instructions to provision Intel hardware features and manage the life cycle of containerized software stack must be present in Day 2
-
-## Intel Hardware features Provisioning for OpenShift
-
-To provision Intel hardware features on OCP, following open-source projects are used:
-* **[Node Feature Discovery (NFD)](https://github.com/kubernetes-sigs/node-feature-discovery), [NFD Operator](https://github.com/openshift/cluster-nfd-operator)** are used to automatically label the nodes for Hardware provisioning operation.
-* **[Machine Config Operator (MCO)](https://github.com/openshift/machine-config-operator)** is used to configure the Red Hat Enterprise Linux Core OS (RHCOS) on the nodes.
-* **[Kernel Module Management (KMM)](https://github.com/kubernetes-sigs/kernel-module-management), [KMM Operator](https://github.com/rh-ecosystem-edge/kernel-module-management)** are used to manage deployment and lifecycle of Intel Data Center GPU Driver.
-* **[Intel Data Center GPU Driver For OpenShift](https://github.com/intel/intel-data-center-gpu-driver-for-openshift)** use **[Intel GPU Drivers](https://github.com/intel-gpu)** build, package, certify and release Intel Data Center GPU driver container images for OCP.
-* **[Intel Device Plugins for Kubernetes project](https://github.com/intel/intel-device-plugins-for-kubernetes)** Provides Intel GPU/SGX/QAT device plugins images and the operator to deploy and manage the life cycle of these device plugins
+# Overview
+The Intel Technology Enabling for OpenShift project provides Intel Data Center hardware feature-provisioning technologies for the [Red Hat OpenShift Container Platform (RHOCP)](https://www.redhat.com/en/technologies/cloud-computing/openshift/container-platform). The technologies to deploy and manage the [End-to-End (E2E)](/e2e) solutions, as well as the related reference workloads for these features, are also included in the project.

-Intel hardware features include:
+These Intel Data Center hardware features currently include:
+- Intel® Software Guard Extensions (Intel® SGX)
+- Intel® Data Center GPU Flex Series

-* **Intel® Software Guard Extensions (Intel® SGX)**
-* **Intel® Data Center GPU Flex Series**
-* **Intel® QuickAssist Technology (Intel® QAT)**
+The following features will be included in future releases:
+- Intel® QuickAssist Technology (Intel® QAT)
+- Intel® Data Center GPU Max Series
+- Intel® Data Streaming Accelerator (Intel® DSA)
+- Intel® In-Memory Analytics Accelerator (Intel® IAA)
+- Intel® FPGA N6000

-Below features are under consideration to be included in the future releases:
+See details about [Supported Intel Hardware features]().

-* Intel® Data Center GPU Max Series
-* Intel® Data Streaming Accelerator (Intel® DSA)
-* Intel® In-Memory Analytics Accelerator (Intel® IAA)
-* Intel® Dynamic Load Balancer (Intel® DLB)
+Figure-1 shows the [Architecture and Working Scope]() of the project.

-## [Intel AI Inference E2E Solution for OpenShift](e2e/inference/README.md)
+[add image filename]

-## Hardware Requirements
+Figure-1 Intel Technology Enabling for OpenShift Architecture

-### Intel® SGX supported platform
+# Supported platforms

-* Third Generation Intel® Xeon® Scalable Processors (or later version) are used by the cluster.
-* Contact your server or BIOS vendor for the BIOS setting to enable the feature.
+This [section]() describes the RHOCP infrastructure and Intel hardware features supported by this project. The project lifecycle and support channels can also be found [here]().

-### Intel® Data Center GPU Card supported platform
+# Getting started
+## Provisioning RHOCP cluster
+Use one of these two options to provision an RHOCP cluster:
+- Use the methods introduced in [RHOCP documentation](https://docs.openshift.com/container-platform/4.12/installing/index.html).
+- Use [Distributed CI](https://doc.distributed-ci.io/) as we do in this project.

-* The Intel® Data Center GPU Flex Series 140 or Intel® Data Center GPU Flex Series 170 Card is enabled on the nodes.
-* Contact your server or BIOS vendor for the BIOS setting to enable the cards.
+In this project, we provisioned RHOCP 4.12 on a bare-metal multi-node cluster. For details about the supported RHOCP infrastructure, see the [Supported Platforms]() page.
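As a quick sanity check after installation (a minimal sketch, not part of the documented provisioning flow; node names and counts depend on your deployment), confirm that the cluster reports the expected version and that the nodes and cluster operators are healthy:

```bash
# Confirm the installed RHOCP version
oc get clusterversion

# All bare-metal nodes should be in the Ready state
oc get nodes -o wide

# All cluster operators should report Available=True before provisioning features
oc get clusteroperators
```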

-### Intel® QAT supported platform
-* 4th Gen Intel® Xeon® Scalable Processors (or later versions) are used by the cluster.
-* Contact your server or BIOS vendor for the BIOS setting to enable the feature.
+## Provisioning Intel hardware features on RHOCP
+Please follow the steps below to provision the hardware features:
+1. Setting up [Node Feature Discovery](/nfd/README.md)
+2. Setting up [Machine Configuration](/machine_configuration/README.md)
+3. Setting up [Out of Tree Drivers](/kmmo/README.md)
+4. Setting up [Device Plugins](/device_plugins/README.md)

-## Get Started
-To properly provision Intel hardware features, deploy and manage the related E2E solutions as well as the reference workload, below OCP software stack development and life cycle management flow is followed by this project
+## Verifying hardware feature provisioning
+You can use the instructions in this [link]() to verify the hardware feature provisioning.
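As a rough spot check (a sketch only; the exact label keys and resource names depend on the NFD rules and device plugins you deployed), provisioned features typically surface as node labels and allocatable extended resources:

```bash
# Nodes labeled by NFD for Intel SGX and Intel GPU hardware (label keys assume the default rules)
oc get nodes -l feature.node.kubernetes.io/cpu-security.sgx.enabled=true
oc get nodes -l intel.feature.node.kubernetes.io/gpu=true

# Extended resources advertised by the Intel device plugins on a given node
oc describe node <node-name> | grep -E 'gpu.intel.com|sgx.intel.com'
```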

-### Day 0 - Define the requirements of the OCP platform and design it.
-Red Hat [OpenShift Operator](https://www.redhat.com/en/technologies/cloud-computing/openshift/what-are-openshift-operators) automates the creation, configuration, and management of instances of Kubernetes-native applications. It is based on [Kubernetes operator pattern](https://kubernetes.io/docs/concepts/extend-kubernetes/operator/). So the software stack running on OCP needs to be containerized and managed by a specific Operator. As an essential part of OCP, operators need to be well-designed from Day 0. Good examples are [Intel Device Plugins Operator](https://github.com/intel/intel-device-plugins-for-kubernetes) and [KMM Operator](https://github.com/rh-ecosystem-edge/kernel-module-management).
+# Upgrade (To be added)

-### Day 1 - Provision the OCP platform and configure it to a working state.
-This project mainly focuses on bare metal OCP cluster. [Distributed CI](https://doc.distributed-ci.io/dci-openshift-agent/) is used to provision the bare metal OCP cluster. Users can also refer to [bare metal OCP cluster installation instructions](https://docs.openshift.com/container-platform/4.12/installing/installing_bare_metal_ipi/ipi-install-overview.html) to install the bare metal OCP cluster.
+# Reference end-to-end solution
+The reference end-to-end solution is based on Intel hardware feature provisioning provided by this project.

-To avoid rebooting the nodes and some other issues on Day 2, Some machine configurations operations can be enforced on day 1 when provisioning the cluster. The related discussion is ongoing.
+[Intel AI Inferencing Solution](/e2e/inference/README.md)
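Once the hardware features are provisioned, the E2E solutions and reference workloads consume them through standard Kubernetes resource requests. Below is a minimal, hypothetical sketch of a pod requesting one Intel Data Center GPU; the pod name and image are placeholders and are not part of this project:

```bash
# Hypothetical example: run a workload on a node exposing an Intel Data Center GPU
oc apply -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-inference-demo
spec:
  restartPolicy: Never
  containers:
  - name: inference
    image: registry.example.com/inference-demo:latest  # placeholder image
    resources:
      limits:
        gpu.intel.com/i915: 1  # extended resource advertised by the Intel GPU device plugin
EOF
```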

-### Day 2.0 - Zero O’clock of Day 2
-The Day 2.0 concept is introduced for users to provision Intel hardware features right after provisioning an OCP cluster and before any user workloads are deployed. Refer to the steps below to provision Intel hardware features:
+# Reference workloads
+Here are some reference workloads built on the end-to-end solution and Intel hardware feature provisioning in this project.
+- Large Language Model (To be added)
+- Open Federated Learning (To be added)

-* **[Deploy Node Feature Discovery on OpenShift](nfd/README.md#steps-to-install-and-configure-nfd-operator-on-ocp-cluster)**
-* **[Setup Machine Configuration on OpenShift](machine_configuration/README.md#general-configuration-for-provisioning-intel-hardware-features)**
+# Advanced Guide
+This section discusses architecture and other technical details that go beyond getting started.
+- [Architecture and Working Scope]()

-Note: Running the above steps on Day 2.0 is recommended. However, if you want to provision the features above with the existing cluster on day 2, please be advised that some machine configuration operations might trigger the pods to drain and reboot the nodes. Some of the ongoing efforts in the MCO upstream are to set the machine configurations without rebooting.
-
-### Day 2 - OCP platform is installed and ready to begin providing services.
-Multiple operators are used to provision Intel hardware features and deploy, manage the E2E solutions, and reference workloads.
-
-**Provisioning Intel Hardware features on OpenShift**
-* **[Deploy Intel Data Center GPU Driver on OpenShift](kmmo/README.md#managing-intel-dgpu-driver-with-kmm-operator)**
-* **[Deploy Intel Device Plugins on OpenShift](device_plugins/README.md#deploy-intel-device-plugins-on-openshift)**
-
-**Deploy E2E Solution**
-* **[Deploy Intel AI Reference E2E Solution](e2e/inference/README.md#deploy-intel-ai-inference-e2e-solution)**
+# Release Notes
+See the [Release Notes](https://github.com/intel/intel-technology-enabling-for-openshift/releases/) for details.

 # Contribute
 See [CONTRIBUTING](CONTRIBUTING.md) for more information.

+# Security
+To report a potential security vulnerability, please refer to the [security.md](/security.md) file.
+
 # License
 Distributed under the open source license. See [LICENSE](/LICENSE.txt) for more information.

-# Security
-To report a potential security vulnerability, please refer to [security.md](/security.md) file
-
 # Code of Conduct
 Intel has adopted the Contributor Covenant as the Code of Conduct for all of its open source projects. See [CODE_OF_CONDUCT](/CODE_OF_CONDUCT.md) file.