What happened:
After the NFS server restarts, pods can no longer access the mounted filesystem. When the pods are restarted, they fail to start with the following error:
MountVolume.SetUp failed for volume "pvc-a8667b20-fb7f-4873-aa9d-18ec6f7fdccc" : applyFSGroup failed for vol 192.168.50.210#mnt/user/nfs-csi#pvc-a8667b20-fb7f-4873-aa9d-18ec6f7fdccc##: lstat /var/lib/kubelet/pods/9d3ce1f4-d0c8-4b5e-9994-a17e3267c57d/volumes/kubernetes.io~csi/pvc-a8667b20-fb7f-4873-aa9d-18ec6f7fdccc/mount: stale NFS file handle
What you expected to happen:
I expect the CSI driver to detect a stale NFS file handle and remount the NFS share, allowing the pod to recover normally when using fsGroup.
How to reproduce it:
- Create a pod with an fsGroup and fsGroupChangePolicy: "OnRootMismatch", mounting NFS storage provisioned via the CSI driver (a minimal manifest sketch follows this list).
- Restart the NFS server.
- Observe that the pod's volume becomes inaccessible (expected).
- Manually delete/restart the pod.
- Observe that the pod fails to start with the above error (applyFSGroup failed with stale NFS file handle).
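For reference, a minimal sketch of the kind of manifests used in the reproduction. The PVC name, StorageClass name, image, and pod name are placeholders and not taken from the actual setup; the relevant parts are the fsGroup and fsGroupChangePolicy in the pod's securityContext and the NFS-backed PVC:

```yaml
# Hypothetical reproduction manifests; all names are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-test-pvc
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: nfs-csi        # StorageClass backed by nfs.csi.k8s.io
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-test
spec:
  securityContext:
    fsGroup: 1000
    fsGroupChangePolicy: "OnRootMismatch"
  containers:
    - name: app
      image: busybox
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: nfs-test-pvc
```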
Anything else we need to know?:
The CSI driver appears unable to handle stale NFS file handles specifically during the applyFSGroup() operation.
If the pod is restarted without an fsGroup, it starts successfully and the volume is remounted. After that, other pods with an fsGroup on the same node also start working again. It seems the stale mount is fixed by a remount, but the remount is only triggered when the fsGroup logic is skipped (a sketch of such a workaround pod follows below).
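For completeness, the workaround described above amounts to starting a pod like the following on the affected node (the pod name and node pin are hypothetical); because it has no fsGroup, the applyFSGroup step is skipped and the volume gets remounted:

```yaml
# Hypothetical workaround pod: same PVC, no fsGroup in the securityContext.
apiVersion: v1
kind: Pod
metadata:
  name: remount-helper
spec:
  nodeName: worker-1               # placeholder: the node holding the stale mount
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: nfs-test-pvc    # same PVC as the failing pod
```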
Environment:
- CSI Driver version: v4.11.0
- Kubernetes version: v1.33
- OS: Talos v1.10
- Kernel: (can't check at the moment, default from Talos v1.10 — can update if relevant)
- Install tools: Helm chart