
ceph-csi for rbd cannot mount image after upgrade k8s to v1.31 #5066

@hasonhai

Description


Describe the bug

After we upgraded the K8S cluster from 1.30.4 to 1.31.4, csi-rbdplugin cannot mount the image anymore. It still works fine on nodes that still run kubelet v1.30.4.
We were initially on ceph-csi v3.12.1 when the error occurred, so we tried upgrading to v3.13.0 to see whether that would fix the issue, but the behavior is the same.

Environment details

  • Image/version of Ceph CSI driver : v3.13.0
  • Helm chart version :
  • Kernel version : RHEL9 5.14.0-503.19.1.el9_5.x86_64
  • Mounter used for mounting PVC (for CephFS it's fuse or kernel; for rbd it's krbd or rbd-nbd) : krbd
  • Kubernetes cluster version : v1.31.4
  • Ceph cluster version : v18.2.4

Steps to reproduce

Steps to reproduce the behavior:

  1. Setup details
    Storage class:
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  creationTimestamp: "2021-04-01T13:37:08Z"
  name: dynamic-ceph-storage
  resourceVersion: "177756013"
  uid: a533c5dc-402c-4ad4-9a81-c543accbd954
mountOptions:
- nodelalloc
parameters:
  clusterID: --masked--
  csi.storage.k8s.io/controller-expand-secret-name: ceph-user-secret
  csi.storage.k8s.io/controller-expand-secret-namespace: access-control
  csi.storage.k8s.io/fstype: ext4
  csi.storage.k8s.io/node-stage-secret-name: ceph-user-secret
  csi.storage.k8s.io/node-stage-secret-namespace: access-control
  csi.storage.k8s.io/provisioner-secret-name: ceph-user-secret
  csi.storage.k8s.io/provisioner-secret-namespace: access-control
  imageFeatures: layering
  pool: k8s-sharedpool
provisioner: rbd.csi.ceph.com
reclaimPolicy: Delete
volumeBindingMode: Immediate

User permission:

[client.kube]
        key = --masked--
        caps mon = "allow r"
        caps osd = "allow class-read object_prefix rbd_children, allow rwx pool=k8s-sharedpool"

We also tried the new capabilities from the docs, but it did not help (see the sketch after the steps below for how they could be applied):

[client.newkube]
        key = --masked--
        caps mgr = "profile rbd pool=k8s-sharedpool"
        caps mon = "profile rbd"
        caps osd = "profile rbd pool=k8s-sharedpool"
  2. Deployment to trigger the issue '....'
  3. See error
    The pod is stuck in the Init stage and reports these events:
 Normal   Scheduled               95s                default-scheduler        Successfully assigned logging-system/aap-es-data-1 to defr4app510
  Warning  FailedAttachVolume      95s                attachdetach-controller  Multi-Attach error for volume "pvc-0091ed72-b8d3-4642-9c65-cb45ddfc328e" Volume is already exclusively attached to one node
  Normal   SuccessfulAttachVolume  85s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-0091ed72-b8d3-4642-9c65-cb45ddfc328e"
  Warning  FailedMount             18s (x8 over 84s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-0091ed72-b8d3-4642-9c65-cb45ddfc328e" : rpc error: code = Internal desc = exi
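
A minimal sketch of how the profile-based capabilities shown in the setup above could be applied to the existing CSI user, assuming client.kube is the user referenced by ceph-user-secret (the entity name and pool here are taken from this report; adjust for your environment):

  # Hypothetical example: switch client.kube to the rbd profiles from the ceph-csi docs
  ceph auth caps client.kube mon 'profile rbd' osd 'profile rbd pool=k8s-sharedpool' mgr 'profile rbd pool=k8s-sharedpool'
  # Verify the resulting capabilities
  ceph auth get client.kube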

Actual results

The node can map the block device but cannot mount it. From the logs, I think the driver tries to read the block device's filesystem information with the blkid command, but the command fails with "Operation not permitted". Everything works fine when we have kubelet v1.30.
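
One rough way to check whether the restriction comes from the node itself would be to map the image manually on the v1.31 node and run the same blkid probe the driver attempts. This is only a sketch: the image name is copied from the logs below, and the --id/--keyring values are assumptions that must match the user in ceph-user-secret.

  # Map the image on the affected node with the same Ceph user (hypothetical credentials path)
  rbd map k8s-sharedpool/csi-vol-fe3ca7ee-580e-11ec-b976-a289cdd026fa --id kube --keyring /etc/ceph/ceph.client.kube.keyring
  # Probe the device exactly as the driver does
  blkid -p -s TYPE -s PTTYPE -o export /dev/rbd0
  # Look for kernel or SELinux denials if the probe fails (ausearch requires auditd)
  dmesg | tail
  ausearch -m avc -ts recent
  # Clean up
  rbd unmap /dev/rbd0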

Expected behavior

The node can map and mount the block device and provide it to the pods.

Logs

If the issue is in PVC mounting please attach complete logs of below containers.

  • csi-rbdplugin/csi-cephfsplugin and driver-registrar container logs from
    plugin pod from the node where the mount is failing.
I0109 09:44:29.224808  941792 nodeserver.go:422] ID: 327 Req-ID: 0001-0024-4d3a09c7-d8d2-4927-91cd-08ca6601d0b2-0000000000000007-fe3ca7ee-580e-11ec-b976-a289cdd026fa rbd image: k8s-sharedpool/csi-vol-fe3ca7ee-580e-11ec-b976-a289cdd026fa was successfully mapped at /dev/rbd0
I0109 09:44:29.224926  941792 mount_linux.go:577] Attempting to determine if disk "/dev/rbd0" is formatted using blkid with args: ([-p -s TYPE -s PTTYPE -o export /dev/rbd0])
I0109 09:44:29.227079  941792 mount_linux.go:580] Output: "blkid: error: /dev/rbd0: Operation not permitted\n"
E0109 09:44:29.229984  941792 nodeserver.go:825] ID: 327 Req-ID: 0001-0024-4d3a09c7-d8d2-4927-91cd-08ca6601d0b2-0000000000000007-fe3ca7ee-580e-11ec-b976-a289cdd026fa failed to run mkfs.ext4 ([-m0 -Enodiscard,lazy_itable_init=1,lazy_journal_init=1 /dev/rbd0]) error: exit status 1, output: mke2fs 1.46.5 (30-Dec-2021)
mkfs.ext4: Operation not permitted while trying to determine filesystem size
I0109 09:44:29.311555  941792 cephcmds.go:105] ID: 327 Req-ID: 0001-0024-4d3a09c7-d8d2-4927-91cd-08ca6601d0b2-0000000000000007-fe3ca7ee-580e-11ec-b976-a289cdd026fa command succeeded: rbd [unmap /dev/rbd0 --device-type krbd --options noudev]
E0109 09:44:29.311786  941792 utils.go:245] ID: 327 Req-ID: 0001-0024-4d3a09c7-d8d2-4927-91cd-08ca6601d0b2-0000000000000007-fe3ca7ee-580e-11ec-b976-a289cdd026fa GRPC error: rpc error: code = Internal desc = exit status 1
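
For comparison, the same probe could be run from inside the csi-rbdplugin container on the affected node (where the failing blkid call is executed) while a device is still mapped. The namespace and pod name below are placeholders, and the container name may differ depending on how the plugin is deployed:

  kubectl -n <csi-namespace> exec <csi-rbdplugin-pod-on-node> -c csi-rbdplugin -- blkid -p -s TYPE -s PTTYPE -o export /dev/rbd0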

Note: If it's an rbd issue please provide only rbd-related logs; if it's a cephFS issue please provide cephFS logs.
