Description
Nixpkgs version
- Stable (25.05)
Describe the bug
Enabling the NVIDIA container toolkit with hardware.nvidia-container-toolkit.enable = true can lead to a boot-time failure of nvidia-container-toolkit-cdi-generator.service. The service sometimes runs before the NVIDIA kernel modules are loaded, resulting in the error:
failed to generate CDI spec: failed to create device CDI specs: failed to initialize NVML: Driver Not Loaded
As a consequence, /var/run/cdi/nvidia.yaml is not generated, and containers that rely on --device=nvidia.com/gpu=all fail with:
docker: Error response from daemon: CDI device injection failed: unresolvable CDI devices nvidia.com/gpu=all
Steps to reproduce
- Enable the NVIDIA container toolkit in NixOS:
hardware.nvidia-container-toolkit.enable = true;
- Reboot the system.
- Check the logs for the CDI generator:
journalctl -u nvidia-container-toolkit-cdi-generator.service -b
- Observe the error about "Driver Not Loaded".
- Verify that /var/run/cdi/nvidia.yaml does not exist until the service is manually restarted.
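For context, a minimal NixOS module along the following lines should be enough to exercise the boot ordering described above. Only hardware.nvidia-container-toolkit.enable comes from this report; the driver and Docker options are assumptions about a typical NVIDIA + Docker setup, not the configuration of the affected machine.

```nix
# Minimal reproduction sketch. Everything except
# hardware.nvidia-container-toolkit.enable is an assumed, typical setup.
{ ... }:
{
  # Assumed: proprietary NVIDIA driver selected via the X video driver list.
  services.xserver.videoDrivers = [ "nvidia" ];

  # Assumed: recent driver versions require choosing open vs. closed modules.
  hardware.nvidia.open = true;

  # The option from this report; it creates the
  # nvidia-container-toolkit-cdi-generator systemd service.
  hardware.nvidia-container-toolkit.enable = true;

  # Assumed: Docker is the container runtime consuming /var/run/cdi/nvidia.yaml.
  virtualisation.docker.enable = true;
}
```

Because the failure is a race, it may take several reboots to observe it.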
Expected behaviour
The CDI generator service should reliably generate the NVIDIA CDI spec after the driver is loaded, without requiring manual intervention.
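Until the module orders the service correctly, a local mitigation could look like the sketch below. It is a hypothetical workaround, not the upstream fix: it assumes the NVIDIA modules can be loaded by systemd-modules-load.service (by listing them in boot.kernelModules) and that the generator exits with a non-zero status when NVML fails, so systemd can retry it. If the modules are instead loaded on demand by udev, the ordering alone may not be sufficient, which is why the retry is added as well.

```nix
# Hypothetical local workaround, not the upstream fix.
{ ... }:
{
  hardware.nvidia-container-toolkit.enable = true;

  # Assumption: loading the module via systemd-modules-load.service makes
  # the ordering below meaningful. Without this, udev may load it later.
  boot.kernelModules = [ "nvidia" ];

  systemd.services.nvidia-container-toolkit-cdi-generator = {
    # Start only after the statically listed kernel modules are loaded.
    wants = [ "systemd-modules-load.service" ];
    after = [ "systemd-modules-load.service" ];

    # Assumption: the generator fails with a non-zero exit code on
    # "Driver Not Loaded". Restart=on-failure is accepted for Type=oneshot
    # units on recent systemd versions.
    serviceConfig = {
      Restart = "on-failure";
      RestartSec = "5s";
    };

    # Permit several retries within a minute before systemd gives up.
    startLimitBurst = 5;
    startLimitIntervalSec = 60;
  };
}
```

The retry covers the case where the driver appears a few seconds late; a proper fix in the module would more likely order the service after whatever unit or udev event actually loads the driver.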
Screenshots
No response
Relevant log output
Oct 14 07:22:19 framework systemd[1]: Starting Container Device Interface (CDI) for Nvidia generator...
Oct 14 07:22:21 framework nvidia-cdi-generator[1065]: time="2025-10-14T07:22:21+02:00" level=error msg="failed to generate CDI spec: failed to create device CDI specs: failed to initialize NVML: Driver Not Loaded"
Additional context
No response
System metadata
- system: "x86_64-linux"
- host os: Linux 6.17.0, NixOS, 25.05 (Warbler), 25.05.20251002.879bd46
- multi-user?: yes
- sandbox: yes
- version: nix-env (Nix) 2.28.5
- channels(root): "nixos-23.11, unstable"
- nixpkgs: /nix/store/hhg7xrkgh6y3w89cx80qczcm9qm5xsv3-source
Notify maintainers
Note for maintainers: Please tag this issue in your pull request description (i.e. Resolves #ISSUE).
I assert that this issue is relevant for Nixpkgs
- I assert that this is a bug and not a support request.
- I assert that this is not a duplicate of an existing issue.
- I assert that I have read the NixOS Code of Conduct and agree to abide by it.
Is this issue important to you?
Add a 👍 reaction to issues you find important.