
Hard rebooting Kubernetes nodes leads to "volume already mounted at more than one place" #153

@keskival

Description

The NFS provisioner binds each NFS share to a backend persistent volume claim in ReadWriteOnce mode. This is otherwise all well and good, but after a hard reboot of a node, the NFS volume pods fail to start because they cannot get the volume mount.
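For context, this is roughly how the backend claims can be listed (a sketch; the openebs namespace and column choices are assumptions for a default dynamic-nfs-provisioner install):

    # List the backend PVCs the NFS provisioner created, one per NFS volume.
    # Each should be Bound with accessModes [ReadWriteOnce] on the Jiva CSI storage class.
    kubectl get pvc -n openebs \
      -o custom-columns=NAME:.metadata.name,ACCESS:.spec.accessModes,SC:.spec.storageClassName,PHASE:.status.phase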

Specifically, the mounts fail with these events:

Warning  FailedMount  93s (x25 over 37m)  kubelet  MountVolume.MountDevice failed for volume "pvc-542bf63c-575a-4a82-ab4d-96d319e58179" : rpc error: code = FailedPrecondition desc = volume {pvc-542bf63c-575a-4a82-ab4d-96d319e58179} is already mounted at more than one place: {{/var/snap/microk8s/common/var/lib/kubelet/plugins/kubernetes.io/csi/jiva.csi.openebs.io/0f4d4b7188975f990ed572ae7bdb4f2f1c07aa967d6460d2a8472343e7c110e1/globalmount  ext4 /dev/disk/by-path/ip-10.152.183.138:3260-iscsi-iqn.2016-09.com.openebs.jiva:pvc-542bf63c-575a-4a82-ab4d-96d319e58179-lun-0}}

At least on MicroK8s I have found no way to find out what exactly is mounting the volume behind the scenes, or whether the accounting is simply wrong. I suppose some weird ghost container could in principle be keeping the volume reserved, but I haven't managed to find out which one or how.
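Roughly, the node-level checks I could think of look like this (a sketch; paths assume a default MicroK8s install, and the PVC ID is the one from the event above):

    # Run on the affected node.
    PVC=pvc-542bf63c-575a-4a82-ab4d-96d319e58179

    # Mounts the kernel still knows about for this volume:
    findmnt | grep "$PVC"
    grep "$PVC" /proc/mounts

    # Open iSCSI sessions to the Jiva target backing the volume:
    sudo iscsiadm -m session | grep "$PVC"

    # What kubelet has staged for the Jiva CSI driver under the MicroK8s kubelet root:
    ls /var/snap/microk8s/common/var/lib/kubelet/plugins/kubernetes.io/csi/jiva.csi.openebs.io/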

What I have tried:

  • Going through pods and PVCs to make sure nothing else is binding that volume (roughly sketched below).
  • Going through the mounts on the node. Nothing special there.
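
The cluster-level checks look roughly like this (a sketch; the backend PVC name is a placeholder, and jq is assumed to be available):

    # Placeholder name for the backend PVC created by the NFS provisioner.
    PVC_NAME=nfs-pvc-542bf63c-575a-4a82-ab4d-96d319e58179

    # Every pod in the cluster that references that claim:
    kubectl get pods -A -o json | jq -r --arg pvc "$PVC_NAME" \
      '.items[] | select(.spec.volumes[]?.persistentVolumeClaim.claimName == $pvc)
       | .metadata.namespace + "/" + .metadata.name'

    # Which node the CSI layer believes the volume is attached to:
    kubectl get volumeattachments | grep pvc-542bf63c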

Steps to reproduce the bug:
Have several NFS persistent volume claims active (each backed by a ReadWriteOnce volume) and hard-reboot a Kubernetes node.
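A minimal reproduction sketch (the StorageClass name openebs-rwx is an assumption; substitute whatever your NFS provisioner StorageClass is called):

    # Create a few NFS-backed claims and attach pods to them, then hard-reboot the node.
    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: nfs-claim-1
    spec:
      storageClassName: openebs-rwx
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 1Gi
    EOF
    # On the node:
    # sudo reboot -f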
Expected:

  • The pods restart without problems.

What happens:

  • The pods get stuck as Kubernetes is convinced something is reserving the mounts.

I have no clue how to investigate further, and due to the manual surgery I did to try to get the cluster up and running again after this problem, the whole cluster is now past the point of no return and I need to rebuild it from scratch.

Environment details:

  • OpenEBS version: openebs.io/version=3.3.0
  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.5", GitCommit:"804d6167111f6858541cef440ccc53887fbbc96a", GitTreeState:"clean", BuildDate:"2022-12-19T15:26:36Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.5", GitCommit:"804d6167111f6858541cef440ccc53887fbbc96a", GitTreeState:"clean", BuildDate:"2022-12-19T15:27:17Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}

I'm not sure whether this is an NFS Provisioner bug, an OpenEBS Jiva bug, or a MicroK8s bug.

This happens to me about weekly; if anyone has suggestions on how to debug what happens, I'd be glad to hear them.
