Description
The NFS provisioner binds to a backing persistent volume claim in ReadWriteOnce mode.
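For context, the share claim itself is typically ReadWriteMany while the provisioner creates a separate ReadWriteOnce claim behind it. A quick way to see that chain (assuming the default setup where the backing claims live in the `openebs` namespace):

```sh
# NFS share PVs and the RWO Jiva claims the provisioner created behind them
kubectl get pv
kubectl get pvc -n openebs
```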
This is all well and good in normal operation, but after a hard reboot of a node the NFS volume pods fail to start because they cannot get their volume mounts.
Specifically, they fail with events like this:

```
Warning FailedMount 93s (x25 over 37m) kubelet MountVolume.MountDevice failed for volume "pvc-542bf63c-575a-4a82-ab4d-96d319e58179" : rpc error: code = FailedPrecondition desc = volume {pvc-542bf63c-575a-4a82-ab4d-96d319e58179} is already mounted at more than one place: {{/var/snap/microk8s/common/var/lib/kubelet/plugins/kubernetes.io/csi/jiva.csi.openebs.io/0f4d4b7188975f990ed572ae7bdb4f2f1c07aa967d6460d2a8472343e7c110e1/globalmount ext4 /dev/disk/by-path/ip-10.152.183.138:3260-iscsi-iqn.2016-09.com.openebs.jiva:pvc-542bf63c-575a-4a82-ab4d-96d319e58179-lun-0}}
```
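The globalmount path in that error lives on the node, so the state can at least be inspected there. A rough sketch of the checks, assuming the Jiva CSI node plugin runs as the `openebs-jiva-csi-node` daemonset in the `openebs` namespace (names may differ per install):

```sh
# On the affected node: is the global mount path actually mounted?
findmnt /var/snap/microk8s/common/var/lib/kubelet/plugins/kubernetes.io/csi/jiva.csi.openebs.io/0f4d4b7188975f990ed572ae7bdb4f2f1c07aa967d6460d2a8472343e7c110e1/globalmount

# Is an iSCSI session still open for the Jiva target?
sudo iscsiadm -m session | grep pvc-542bf63c

# What the Jiva CSI node plugin logged for this volume
kubectl -n openebs logs daemonset/openebs-jiva-csi-node --all-containers --tail=500 | grep pvc-542bf63c
```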
At least on MicroK8s I have found no way to determine what exactly is mounting the volume behind the scenes, or whether the accounting is simply wrong. Some stale ghost container could in principle be holding the volume, but I haven't managed to find out what or how.
What I have tried (roughly the commands sketched below):
- Going through pods and PVCs to make sure nothing else is binding that volume.
- Going through the node's `mount` output. Nothing special there.
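Roughly, those checks amounted to commands like these (a sketch, not the exact invocations):

```sh
# Is anything else still claiming or attached to this PV?
kubectl describe pv pvc-542bf63c-575a-4a82-ab4d-96d319e58179
kubectl get volumeattachments | grep pvc-542bf63c

# On the node: nothing in the mount tables mentions the volume
grep 542bf63c /proc/mounts
mount | grep 542bf63c
```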
Steps to reproduce the bug:
Have several NFS persistent volume claims, each backed by a ReadWriteOnce volume, in active use, then hard-reboot a Kubernetes node.
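A rough sketch of such a setup, assuming an NFS storage class named `openebs-kernel-nfs` (substitute whatever name your install uses):

```sh
# Create a few RWX NFS claims; each gets an RWO Jiva volume provisioned behind it
for i in 1 2 3; do
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-test-$i
spec:
  storageClassName: openebs-kernel-nfs
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 1Gi
EOF
done

# Attach pods to the claims, then hard-reboot the node hosting the NFS server pods,
# e.g. with: sudo reboot -f
```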
Expected:
- The pods restart without problems.
What happens:
- The pods get stuck, as Kubernetes is convinced something is reserving the mounts.
I have no clue how to investigate further. Due to the manual surgery I attempted to get the cluster up and running again after this problem, the whole cluster is now beyond repair and I need to rebuild it from scratch.
Environment details:
- OpenEBS version: openebs.io/version=3.3.0
- Kubernetes version (`kubectl version`):
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.5", GitCommit:"804d6167111f6858541cef440ccc53887fbbc96a", GitTreeState:"clean", BuildDate:"2022-12-19T15:26:36Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.5", GitCommit:"804d6167111f6858541cef440ccc53887fbbc96a", GitTreeState:"clean", BuildDate:"2022-12-19T15:27:17Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}
- OS: Ubuntu 22.04.1 LTS
- kernel (e.g. `uname -a`): Linux curie 5.15.0-58-generic #64-Ubuntu SMP Thu Jan 5 11:43:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
I'm not sure if this is an NFS Provisioner bug, an OpenEBS Jiva bug, or a MicroK8s bug.
This happens to me about weekly; if anyone has suggestions on how to debug what happens, I'd be glad to hear them.
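Next time it happens, this is roughly the information I can collect; the service and resource names below assume MicroK8s (where kubelet runs inside the kubelite daemon) and the Jiva operator in the `openebs` namespace, and may not match other installs:

```sh
# Kubelet messages around the failed mount (MicroK8s bundles kubelet in kubelite)
journalctl -u snap.microk8s.daemon-kubelite --since "1 hour ago" | grep -i -e mountvolume -e pvc-542bf63c

# What the Jiva control plane thinks the volume's state is
kubectl get jivavolume -n openebs
kubectl describe jivavolume pvc-542bf63c-575a-4a82-ab4d-96d319e58179 -n openebs
```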