Description
The NFS provisioner binds to a backing persistent volume claim in ReadWriteOnce mode.
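For context, the share claim itself is typically ReadWriteMany while the provisioner creates a separate ReadWriteOnce claim behind it. A quick way to see that chain (assuming the default setup where the backing claims live in the `openebs` namespace):

```sh
# NFS share PVs and the RWO Jiva claims the provisioner created behind them
kubectl get pv
kubectl get pvc -n openebs
```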
This is all well and good in normal operation, but after a hard reboot of a node the NFS volume pods fail to start because they cannot get their volume mounts.
Specifically, they fail with events like this:

```
Warning FailedMount 93s (x25 over 37m) kubelet MountVolume.MountDevice failed for volume "pvc-542bf63c-575a-4a82-ab4d-96d319e58179" : rpc error: code = FailedPrecondition desc = volume {pvc-542bf63c-575a-4a82-ab4d-96d319e58179} is already mounted at more than one place: {{/var/snap/microk8s/common/var/lib/kubelet/plugins/kubernetes.io/csi/jiva.csi.openebs.io/0f4d4b7188975f990ed572ae7bdb4f2f1c07aa967d6460d2a8472343e7c110e1/globalmount ext4 /dev/disk/by-path/ip-10.152.183.138:3260-iscsi-iqn.2016-09.com.openebs.jiva:pvc-542bf63c-575a-4a82-ab4d-96d319e58179-lun-0}}
```
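The globalmount path in that error lives on the node, so the state can at least be inspected there. A rough sketch of the checks, assuming the Jiva CSI node plugin runs as the `openebs-jiva-csi-node` daemonset in the `openebs` namespace (names may differ per install):

```sh
# On the affected node: is the global mount path actually mounted?
findmnt /var/snap/microk8s/common/var/lib/kubelet/plugins/kubernetes.io/csi/jiva.csi.openebs.io/0f4d4b7188975f990ed572ae7bdb4f2f1c07aa967d6460d2a8472343e7c110e1/globalmount

# Is an iSCSI session still open for the Jiva target?
sudo iscsiadm -m session | grep pvc-542bf63c

# What the Jiva CSI node plugin logged for this volume
kubectl -n openebs logs daemonset/openebs-jiva-csi-node --all-containers --tail=500 | grep pvc-542bf63c
```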
At least on MicroK8s I have found no way to determine what exactly is mounting the volume behind the scenes, or whether the accounting is simply wrong. Some stale ghost container could in principle be holding the volume, but I haven't managed to find out what or how.
What I have tried (roughly the commands sketched below):
- Going through pods and PVCs to make sure nothing else is binding that volume.
- Going through the node's `mount` output. Nothing special there.
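Roughly, those checks amounted to commands like these (a sketch, not the exact invocations):

```sh
# Is anything else still claiming or attached to this PV?
kubectl describe pv pvc-542bf63c-575a-4a82-ab4d-96d319e58179
kubectl get volumeattachments | grep pvc-542bf63c

# On the node: nothing in the mount tables mentions the volume
grep 542bf63c /proc/mounts
mount | grep 542bf63c
```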
Steps to reproduce the bug:
Have several NFS persistent volume claims, each backed by a ReadWriteOnce volume, in active use, then hard-reboot a Kubernetes node.
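A rough sketch of such a setup, assuming an NFS storage class named `openebs-kernel-nfs` (substitute whatever name your install uses):

```sh
# Create a few RWX NFS claims; each gets an RWO Jiva volume provisioned behind it
for i in 1 2 3; do
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-test-$i
spec:
  storageClassName: openebs-kernel-nfs
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 1Gi
EOF
done

# Attach pods to the claims, then hard-reboot the node hosting the NFS server pods,
# e.g. with: sudo reboot -f
```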
Expected:
- The pods restart without problems.
What happens:
- The pods get stuck, as Kubernetes is convinced something is reserving the mounts.
I have no clue how to investigate further. Due to the manual surgery I attempted to get the cluster up and running again after this problem, the whole cluster is now beyond repair and I need to rebuild it from scratch.
Environment details:
- OpenEBS version: openebs.io/version=3.3.0
- Kubernetes version (`kubectl version`):
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.5", GitCommit:"804d6167111f6858541cef440ccc53887fbbc96a", GitTreeState:"clean", BuildDate:"2022-12-19T15:26:36Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.5", GitCommit:"804d6167111f6858541cef440ccc53887fbbc96a", GitTreeState:"clean", BuildDate:"2022-12-19T15:27:17Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}
- OS: Ubuntu 22.04.1 LTS
- kernel (e.g. `uname -a`): Linux curie 5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
I'm not sure if this is an NFS Provisioner bug, an OpenEBS Jiva bug, or a MicroK8s bug.
This happens to me about weekly; if anyone has suggestions on how to debug what happens, I'd be glad to hear them.
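Next time it happens, this is roughly the information I can collect; the service and resource names below assume MicroK8s (where kubelet runs inside the kubelite daemon) and the Jiva operator in the `openebs` namespace, and may not match other installs:

```sh
# Kubelet messages around the failed mount (MicroK8s bundles kubelet in kubelite)
journalctl -u snap.microk8s.daemon-kubelite --since "1 hour ago" | grep -i -e mountvolume -e pvc-542bf63c

# What the Jiva control plane thinks the volume's state is
kubectl get jivavolume -n openebs
kubectl describe jivavolume pvc-542bf63c-575a-4a82-ab4d-96d319e58179 -n openebs
```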