Pods with fsGroup can't recover after NFS server restart #927

@vanchaxy

What happened:
After the NFS server restarts, pods can no longer access the mounted filesystem. When the pods are restarted, they fail to start with the following error:

MountVolume.SetUp failed for volume "pvc-a8667b20-fb7f-4873-aa9d-18ec6f7fdccc" : applyFSGroup failed for vol 192.168.50.210#mnt/user/nfs-csi#pvc-a8667b20-fb7f-4873-aa9d-18ec6f7fdccc##: lstat /var/lib/kubelet/pods/9d3ce1f4-d0c8-4b5e-9994-a17e3267c57d/volumes/kubernetes.io~csi/pvc-a8667b20-fb7f-4873-aa9d-18ec6f7fdccc/mount: stale NFS file handle

What you expected to happen:
I expect the CSI driver to detect a stale NFS file handle and remount the NFS share, allowing the pod to recover normally when using fsGroup.

How to reproduce it:

  1. Create a pod with an fsGroup and fsGroupChangePolicy: "OnRootMismatch", mounting NFS storage provisioned via the CSI driver.
  2. Restart the NFS server.
  3. Observe that the pod's volume becomes inaccessible (expected).
  4. Manually delete/restart the pod.
  5. Observe that the pod fails to start with the above error (applyFSGroup failed with stale NFS file handle).

Anything else we need to know?:
The CSI driver appears unable to handle stale NFS file handles specifically during the applyFSGroup() operation.

If the pod is restarted without an fsGroup, it starts successfully and the volume is remounted. After this, other pods with fsGroup on the same node also start working again. It seems that a remount resolves the stale mount, but the remount is only triggered when the fsGroup logic is skipped.

Environment:

  • CSI Driver version: v4.11.0
  • Kubernetes version: v1.33
  • OS: Talos v1.10
  • Kernel: not checked yet (default for Talos v1.10); can update if relevant
  • Install tools: Helm chart
