
Commit c5e831e

Draft Kata update to use kata-deploy
Signed-off-by: Mike McKiernan <mmckiernan@nvidia.com>
1 parent 01292aa

gpu-operator/gpu-operator-kata.rst

Lines changed: 137 additions & 44 deletions
@@ -67,10 +67,10 @@ The following diagram shows the software components that Kubernetes uses to run
    a[Kubelet] --> b[CRI] --> c[Kata\nRuntime] --> d[Lightweight\nQEMU VM] --> e[Lightweight\nGuest OS] --> f[Pod] --> g[Container]


-NVIDIA supports Kata Containers by using the Confidential Containers Operator to install the Kata runtime and QEMU.
-Even though the Operator isn't used for confidential computing in this configuration, the Operator
-simplifies the installation of the Kata runtime.
+NVIDIA supports Kata Containers by using Helm to run a daemon set that installs the Kata runtime and QEMU.

+The daemon set runs the ``kata-deploy.sh`` script, which configures each worker node with the ``kata-qemu-nvidia-gpu`` runtime class
+and configures containerd for that runtime class.
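Not part of the diff: as a minimal illustration of how the runtime class is consumed, a pod opts in by setting ``runtimeClassName``. The image and the GPU resource name below are placeholders; the resource that is advertised for passthrough depends on the sandbox device plugin and the GPU on the node.

.. code-block:: yaml

   apiVersion: v1
   kind: Pod
   metadata:
     name: kata-gpu-example
   spec:
     runtimeClassName: kata-qemu-nvidia-gpu       # runtime class created by kata-deploy
     containers:
     - name: cuda
       image: nvcr.io/nvidia/cuda:12.2.0-base-ubuntu22.04   # placeholder image
       command: ["nvidia-smi"]
       resources:
         limits:
           "nvidia.com/GH100_H100_PCIE": 1        # placeholder resource name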

 About NVIDIA Kata Manager
 =========================
@@ -82,43 +82,42 @@ The manager downloads an NVIDIA optimized Linux kernel image and initial RAM dis
 provides the lightweight operating system for the virtual machines that run in QEMU.
 These artifacts are downloaded from the NVIDIA container registry, nvcr.io, on each worker node.

-The manager also configures each worker node with a runtime class, ``kata-qemu-nvidia-gpu``,
-and configures containerd for the runtime class.
+.. comment

-NVIDIA Kata Manager Configuration
-=================================
+   NVIDIA Kata Manager Configuration
+   =================================

-The following part of the cluster policy shows the fields related to the manager:
+   The following part of the cluster policy shows the fields related to the manager:

-.. code-block:: yaml
+   .. code-block:: yaml

-   kataManager:
-     enabled: true
-     config:
-       artifactsDir: /opt/nvidia-gpu-operator/artifacts/runtimeclasses
-       runtimeClasses:
-       - artifacts:
-           pullSecret: ""
-           url: nvcr.io/nvidia/cloud-native/kata-gpu-artifacts:ubuntu22.04-525
-         name: kata-qemu-nvidia-gpu
-         nodeSelector: {}
-       - artifacts:
-           pullSecret: ""
-           url: nvcr.io/nvidia/cloud-native/kata-gpu-artifacts:ubuntu22.04-535-snp
-         name: kata-qemu-nvidia-gpu-snp
-         nodeSelector: {}
-     repository: nvcr.io/nvidia/cloud-native
-     image: k8s-kata-manager
-     version: v0.1.0
-     imagePullPolicy: IfNotPresent
-     imagePullSecrets: []
-     env: []
-     resources: {}
-
-The ``kata-qemu-nvidia-gpu`` runtime class is used with Kata Containers.
-
-The ``kata-qemu-nvidia-gpu-snp`` runtime class is used with Confidential Containers
-and is installed by default even though it is not used with this configuration.
+      kataManager:
+        enabled: true
+        config:
+          artifactsDir: /opt/nvidia-gpu-operator/artifacts/runtimeclasses
+          runtimeClasses:
+          - artifacts:
+              pullSecret: ""
+              url: nvcr.io/nvidia/cloud-native/kata-gpu-artifacts:ubuntu22.04-525
+            name: kata-qemu-nvidia-gpu
+            nodeSelector: {}
+          - artifacts:
+              pullSecret: ""
+              url: nvcr.io/nvidia/cloud-native/kata-gpu-artifacts:ubuntu22.04-535-snp
+            name: kata-qemu-nvidia-gpu-snp
+            nodeSelector: {}
+        repository: nvcr.io/nvidia/cloud-native
+        image: k8s-kata-manager
+        version: v0.1.0
+        imagePullPolicy: IfNotPresent
+        imagePullSecrets: []
+        env: []
+        resources: {}
+
+   The ``kata-qemu-nvidia-gpu`` runtime class is used with Kata Containers.
+
+   The ``kata-qemu-nvidia-gpu-snp`` runtime class is used with Confidential Containers
+   and is installed by default even though it is not used with this configuration.
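Not part of the diff: on a cluster where the Kata Manager is still deployed, the effective values can be inspected from the cluster policy. This sketch assumes the default resource name, ``cluster-policy``:

.. code-block:: console

   $ kubectl get clusterpolicy cluster-policy -o yaml | grep -A 8 "kataManager:"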


 *********************************
@@ -197,7 +196,7 @@ Prerequisites

 * Your hosts are configured to support IOMMU.

-  If the output from running ``ls /sys/kernel/iommu_groups`` includes ``0``, ``1``, and so on,
+  If the output from running ``ls /sys/kernel/iommu_groups`` includes a value greater than ``0``,
   then your host is configured for IOMMU.

   If a host is not configured or you are unsure, add the ``intel_iommu=on`` Linux kernel command-line argument.
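Not part of the diff: on a typical Ubuntu host that boots with GRUB, the kernel argument can be added as shown below. The file location and update command depend on the distribution and bootloader, so treat this as a sketch only.

.. code-block:: console

   # Append intel_iommu=on to the default kernel command line, then regenerate the GRUB config.
   $ sudo sed -i 's/GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="intel_iommu=on /' /etc/default/grub
   $ sudo update-grub
   $ sudo reboot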
@@ -228,22 +227,116 @@ Installing and configuring your cluster to support the NVIDIA GPU Operator with
    This step ensures that you can continue to run traditional container workloads with GPU or vGPU workloads on some nodes in your cluster.
    Alternatively, you can set the default sandbox workload to ``vm-passthrough`` to run confidential containers on all worker nodes.

-#. Install the Confidential Containers Operator.
+#. Install the Kata Deploy Helm chart.

-   This step installs the Operator and also the Kata Containers runtime that NVIDIA uses for Kata Containers.
+   This step runs the ``kata-deploy.sh`` script on each node and installs the Kata Containers runtime.

 #. Install the NVIDIA GPU Operator.

    You install the Operator and specify options to deploy the operands that are required for Kata Containers.

 After installation, you can run a sample workload.

-.. |project-name| replace:: Kata Containers
+*************************************
+Kata Deploy Helm Chart Customizations
+*************************************
+
+The following table shows the configurable values from the Kata Deploy Helm chart.
+
+.. list-table::
+   :widths: 20 50 30
+   :header-rows: 1
+
+   * - Parameter
+     - Description
+     - Default
+
+   * - ``kataDeploy.allowedHypervisorAnnotations``
+     - Specifies the
+       `hypervisor annotations <https://github.com/kata-containers/kata-containers/blob/main/docs/how-to/how-to-set-sandbox-config-kata.md#hypervisor-options>`__
+       to enable in the Kata configuration file on each node.
+       Specify a space-separated string of values such as ``enable_iommu initrd kernel``.
+     - None
+
+   * - ``kataDeploy.createRuntimeClasses``
+     - When set to ``true``, the ``kata-deploy.sh`` script installs the runtime classes on the nodes.
+     - ``true``
+
+   * - ``kataDeploy.createDefaultRuntimeClass``
+     - When set to ``true``, the ``kata-deploy.sh`` script sets the runtime class specified in the ``defaultShim`` field as the default Kata runtime class.
+     - ``false``
+
+   * - ``kataDeploy.debug``
+     - When set to ``true``, the ``kata-deploy.sh`` script enables debugging and a debug console in the Kata configuration file on each node.
+     - ``false``
+
+   * - ``kataDeploy.defaultShim``
+     - Specifies the shim to set as the default Kata runtime class.
+       This field is ignored unless you specify ``createDefaultRuntimeClass: true``.
+     - ``qemu-nvidia-gpu``
+
+   * - ``kataDeploy.imagePullPolicy``
+     - Specifies the image pull policy for the ``kata-deploy`` container.
+     - ``Always``
+
+   * - ``kataDeploy.k8sDistribution``
+     - FIXME
+     - ``k8s``
+
+   * - ``kataDeploy.repository``
+     - Specifies the image repository for the ``kata-deploy`` container.
+     - ``nvcr.io/nvidia/cloud-native``
+
+   * - ``kataDeploy.shims``
+     - Specifies the shim binaries to install on each node.
+       Specify a space-separated string of values.
+     - ``qemu-nvidia-gpu``
+
+   * - ``kataDeploy.version``
+     - Specifies the version of the ``kata-deploy`` container to run.
+     - ``latest``
+
+
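Not part of the diff: as an illustration, these values can be overridden with ``--set`` when you install the chart in the next section. The specific overrides shown are examples only:

.. code-block:: console

   $ helm install --wait --generate-name \
        -n kube-system \
        --set kataDeploy.debug=true \
        --set kataDeploy.createDefaultRuntimeClass=true \
        --set kataDeploy.defaultShim=qemu-nvidia-gpu \
        nvidia/kata-deploy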
+**********************************
+Install the Kata Deploy Helm Chart
+**********************************
+
+Perform the following steps to install the Helm chart:
+
+#. Add and update the NVIDIA Helm repository:
+
+   .. code-block:: console
+
+      $ helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
+          && helm repo update
+
+#. Specify at least the following options when you install the chart.
+
+   .. code-block:: console
+
+      $ helm install --wait --generate-name \
+          -n kube-system \
+          nvidia/kata-deploy
+
+#. Optional: Verify the installation.
+
+   - Confirm the ``kata-deploy`` containers are running:
+
+     .. code-block:: console
+
+        $ kubectl get pods -n kube-system -l FIXME
+
+   - Confirm the runtime class is installed:
+
+     .. code-block:: console
+
+        $ kubectl get runtimeclass kata-qemu-nvidia-gpu
+
+     *Example Output*

-.. include:: gpu-operator-confidential-containers.rst
-   :start-after: start-install-coco-operator
-   :end-before: end-install-coco-operator
+     .. code-block:: output

+        FIXME

 *******************************
 Install the NVIDIA GPU Operator
@@ -262,7 +355,7 @@ Perform the following steps to install the Operator for use with Kata Containers
          && helm repo update

 #. Specify at least the following options when you install the Operator.
-   If you want to run |project-name| by default on all worker nodes, also specify ``--set sandboxWorkloads.defaultWorkload=vm-passthough``.
+   If you want to run Kata Containers by default on all worker nodes, also specify ``--set sandboxWorkloads.defaultWorkload=vm-passthrough``.

    .. code-block:: console

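Not part of the diff: the full install command sits outside this hunk. As an editorial sketch only, a GPU Operator installation that enables sandbox workloads and defaults every node to Kata Containers could look like the following; the namespace and the options other than ``sandboxWorkloads.defaultWorkload`` are assumptions based on the standard GPU Operator chart:

.. code-block:: console

   $ helm install --wait --generate-name \
        -n gpu-operator --create-namespace \
        nvidia/gpu-operator \
        --set sandboxWorkloads.enabled=true \
        --set sandboxWorkloads.defaultWorkload=vm-passthrough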