What happened:
After the NFS server restarts, pods can no longer access the mounted filesystem. When the pods are restarted, they fail to start with the following error:
MountVolume.SetUp failed for volume "pvc-a8667b20-fb7f-4873-aa9d-18ec6f7fdccc" : applyFSGroup failed for vol 192.168.50.210#mnt/user/nfs-csi#pvc-a8667b20-fb7f-4873-aa9d-18ec6f7fdccc##: lstat /var/lib/kubelet/pods/9d3ce1f4-d0c8-4b5e-9994-a17e3267c57d/volumes/kubernetes.io~csi/pvc-a8667b20-fb7f-4873-aa9d-18ec6f7fdccc/mount: stale NFS file handle
What you expected to happen:
I expect the CSI driver to detect a stale NFS file handle and remount the NFS share, allowing the pod to recover normally when using fsGroup.
How to reproduce it:
- Create a pod with an fsGroup and fsGroupChangePolicy: "OnRootMismatch", mounting NFS storage provisioned via the CSI driver (a minimal manifest sketch follows this list).
- Restart the NFS server.
- Observe that the pod's volume becomes inaccessible (expected).
- Manually delete/restart the pod.
- Observe that the pod fails to start with the above error (applyFSGroup failed with stale NFS file handle).
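For reference, a minimal sketch of the kind of manifests used in the reproduction. The PVC name, StorageClass name, image, and pod name are placeholders and not taken from the actual setup; the relevant parts are the fsGroup and fsGroupChangePolicy in the pod's securityContext and the NFS-backed PVC:

```yaml
# Hypothetical reproduction manifests; all names are placeholders.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-test-pvc
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: nfs-csi        # StorageClass backed by nfs.csi.k8s.io
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-test
spec:
  securityContext:
    fsGroup: 1000
    fsGroupChangePolicy: "OnRootMismatch"
  containers:
    - name: app
      image: busybox
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: nfs-test-pvc
```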
Anything else we need to know?:
The CSI driver appears unable to handle stale NFS file handles specifically during the applyFSGroup() operation.
If the pod is restarted without an fsGroup, it starts successfully and the volume is remounted. After that, other pods with an fsGroup on the same node also start working again. It seems the stale mount is fixed by a remount, but the remount is only triggered when the fsGroup logic is skipped (a sketch of such a workaround pod follows below).
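For completeness, the workaround described above amounts to starting a pod like the following on the affected node (the pod name and node pin are hypothetical); because it has no fsGroup, the applyFSGroup step is skipped and the volume gets remounted:

```yaml
# Hypothetical workaround pod: same PVC, no fsGroup in the securityContext.
apiVersion: v1
kind: Pod
metadata:
  name: remount-helper
spec:
  nodeName: worker-1               # placeholder: the node holding the stale mount
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: nfs-test-pvc    # same PVC as the failing pod
```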
Environment:
- CSI Driver version: v4.11.0
- Kubernetes version: v1.33
- OS: Talos v1.10
- Kernel: (can't check at the moment, default from Talos v1.10 — can update if relevant)
- Install tools: Helm chart