Description
Describe the bug
After we upgraded the Kubernetes cluster from 1.30.4 to 1.31.4, ceph-rbdplugin can no longer mount the image. It still works fine on nodes whose kubelet is at version 1.30.4.
We were initially running ceph-csi v3.12.1 when the error occurred. We upgraded to v3.13.0 to see whether that would fix the issue, but the behavior is the same.
Environment details
- Image/version of Ceph CSI driver : v3.13.0
- Helm chart version :
- Kernel version : RHEL9 5.14.0-503.19.1.el9_5.x86_64
- Mounter used for mounting PVC (for cephFS it's fuse or kernel; for rbd it's krbd or rbd-nbd) : krbd
- Kubernetes cluster version : v1.31.4
- Ceph cluster version : v18.2.4
Steps to reproduce
Steps to reproduce the behavior:
- Setup details
Storage class:
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2021-04-01T13:37:08Z"
  name: dynamic-ceph-storage
  resourceVersion: "177756013"
  uid: a533c5dc-402c-4ad4-9a81-c543accbd954
mountOptions:
  - nodelalloc
parameters:
  clusterID: --masked--
  csi.storage.k8s.io/controller-expand-secret-name: ceph-user-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: access-control
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/node-stage-secret-name: ceph-user-secret
  csi.storage.k8s.io/node-stage-secret-namespace: access-control
  csi.storage.k8s.io/provisioner-secret-name: ceph-user-secret
  csi.storage.k8s.io/provisioner-secret-namespace: access-control
  imageFeatures: layering
  pool: k8s-sharedpool
provisioner: rbd.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: Immediate
User permission:
[client.kube]
key = --masked--
caps mon = "allow r"
caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=k8s-sharedpool"
We also tried the capabilities from the new docs, but it did not help:
[client.newkube]
key = --masked--
caps mgr = "profile rbd pool=k8s-sharedpool"
caps mon = "profile rbd"
caps osd = "profile rbd pool=k8s-sharedpool"
- Deployment to trigger the issue '....'
- See error
The pod is stuck in the Init stage and reports this error:
Normal Scheduled 95s default-scheduler Successfully assigned logging-system/aap-es-data-1 to defr4app510
Warning FailedAttachVolume 95s attachdetach-controller Multi-Attach error for volume "pvc-0091ed72-b8d3-4642-9c65-cb45ddfc328e" Volume is already exclusively attached to one node
Normal SuccessfulAttachVolume 85s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-0091ed72-b8d3-4642-9c65-cb45ddfc328e"
Warning FailedMount 18s (x8 over 84s) kubelet MountVolume.MountDevice failed for volume "pvc-0091ed72-b8d3-4642-9c65-cb45ddfc328e" : rpc error: code = Internal desc = exi
Actual results
The node can map the block device but cannot mount it. From the logs, it looks like the driver tries to read information about the block device with the blkid command, which fails with "Operation not permitted"; the mkfs.ext4 call that follows fails the same way. Everything works fine with kubelet v1.30.
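To narrow this down, it may help to check from inside the csi-rbdplugin container whether the mapped device can be opened and read at all, independent of blkid. The sketch below is a hypothetical helper, not part of ceph-csi; `probe_device` is an illustrative name. It assumes the blkid failure comes from an open or read call on the device returning EPERM, which the logs suggest but do not prove.

```python
import errno
import os


def probe_device(path: str) -> str:
    """Try to open `path` read-only and read one byte; return a short status string."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        # Translate the numeric errno into its symbolic name (e.g. EPERM, ENOENT).
        return f"open failed: {errno.errorcode.get(exc.errno, exc.errno)}"
    try:
        os.read(fd, 1)
    except OSError as exc:
        return f"read failed: {errno.errorcode.get(exc.errno, exc.errno)}"
    finally:
        os.close(fd)
    return "ok"


if __name__ == "__main__":
    # /dev/rbd0 is the device path taken from the logs in this report.
    print(probe_device("/dev/rbd0"))
```

If this reports EPERM on a 1.31 node but "ok" on a 1.30 node for the same image, that would point at device access from the container rather than at blkid or mkfs themselves.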
Expected behavior
The node can map and mount the block device and provide it to the pods.
Logs
If the issue is in PVC mounting please attach complete logs of below containers.
- csi-rbdplugin/csi-cephfsplugin and driver-registrar container logs from
plugin pod from the node where the mount is failing.
I0109 09:44:29.224808 941792 nodeserver.go:422] ID: 327 Req-ID: 0001-0024-4d3a09c7-d8d2-4927-91cd-08ca6601d0b2-0000000000000007-fe3ca7ee-580e-11ec-b976-a289cdd026fa rbd image: k8s-sharedpool/csi-vol-fe3ca7ee-580e-11ec-b976-a289cdd026fa was successfully mapped at /dev/rbd0
I0109 09:44:29.224926 941792 mount_linux.go:577] Attempting to determine if disk "/dev/rbd0" is formatted using blkid with args: ([-p -s TYPE -s PTTYPE -o export /dev/rbd0])
I0109 09:44:29.227079 941792 mount_linux.go:580] Output: "blkid: error: /dev/rbd0: Operation not permitted\n"
E0109 09:44:29.229984 941792 nodeserver.go:825] ID: 327 Req-ID: 0001-0024-4d3a09c7-d8d2-4927-91cd-08ca6601d0b2-0000000000000007-fe3ca7ee-580e-11ec-b976-a289cdd026fa failed to run mkfs.ext4 ([-m0 -Enodiscard,lazy_itable_init=1,lazy_journal_init=1 /dev/rbd0]) error: exit status 1, output: mke2fs 1.46.5 (30-Dec-2021)
mkfs.ext4: Operation not permitted while trying to determine filesystem size
I0109 09:44:29.311555 941792 cephcmds.go:105] ID: 327 Req-ID: 0001-0024-4d3a09c7-d8d2-4927-91cd-08ca6601d0b2-0000000000000007-fe3ca7ee-580e-11ec-b976-a289cdd026fa command succeeded: rbd [unmap /dev/rbd0 --device-type krbd --options noudev]
E0109 09:44:29.311786 941792 utils.go:245] ID: 327 Req-ID: 0001-0024-4d3a09c7-d8d2-4927-91cd-08ca6601d0b2-0000000000000007-fe3ca7ee-580e-11ec-b976-a289cdd026fa GRPC error: rpc error: code = Internal desc = exit status 1
Note:- If its a rbd issue please provide only rbd related logs, if its a
cephFS issue please provide cephFS logs.
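For anyone comparing logs across nodes, the following is a minimal sketch of how the permission failures could be pulled out of a csi-rbdplugin log. It assumes the klog-style line prefix seen in the logs above (severity letter, MMDD, time, PID, `file:line]`); `KLOG_RE` and `permission_errors` are hypothetical names introduced only for illustration.

```python
import re

# klog-style prefix: severity, MMDD, time, PID, source file:line, then the message.
KLOG_RE = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d+)\s+(\d+) ([\w.]+):(\d+)\] (.*)$"
)


def permission_errors(lines):
    """Yield (time, source, message) for prefixed lines mentioning 'Operation not permitted'."""
    for line in lines:
        m = KLOG_RE.match(line)
        if m and "Operation not permitted" in m.group(7):
            yield m.group(3), f"{m.group(5)}:{m.group(6)}", m.group(7)
```

Continuation lines without a klog prefix, such as the second line of the mkfs.ext4 output, are skipped by this sketch, so only the first line of a multi-line message is considered.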