mountpoint pods are evicted and no new mountpoint pods are created #575

@jblee-muhayu

Description

/kind bug

NOTE: If this is a filesystem-related bug, please take a look at the Mountpoint repo to submit a bug report

What happened?
After a Mountpoint Pod is evicted for exceeding its local-cache size limit, no new Mountpoint Pod is created, and the workload pod can no longer access the mounted S3 bucket.

(Note: this is not a shortage of Kubernetes resources.)

The replacement Mountpoint Pod itself is never deployed.
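
For reference, the volume configuration can be reconstructed from the volume_context in the csi-driver log below. A rough sketch of what the PersistentVolume presumably looks like (bucket name, cache settings, mount options, and resource values taken from that log; everything else is assumed):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: s3-pv
spec:
  capacity:
    storage: 1200Gi                      # arbitrary; ignored by the S3 CSI driver
  accessModes:
    - ReadWriteMany                      # matches MULTI_NODE_MULTI_WRITER in the log
  storageClassName: ""                   # static provisioning (assumed)
  claimRef:
    namespace: default
    name: s3-pvc
  mountOptions:
    - region ap-northeast-2
    - allow-delete
    - allow-overwrite
  csi:
    driver: s3.csi.aws.com
    volumeHandle: s3-csi-driver-volume
    volumeAttributes:
      bucketName: prismd-s3-csi-test
      authenticationSource: pod
      cache: emptyDir
      cacheEmptyDirSizeLimit: 1Gi        # the limit the cache apparently exceeded
      mountpointContainerResourcesRequestsCpu: 500m
      mountpointContainerResourcesRequestsMemory: 1Gi
      mountpointContainerResourcesLimitsCpu: 500m
      mountpointContainerResourcesLimitsMemory: 1Gi
EOF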

csi-controller log

{"level":"info","ts":"2025-09-04T05:40:25Z","logger":"controller-runtime.metrics","msg":"Starting metrics server"}
{"level":"info","ts":"2025-09-04T05:40:25Z","logger":"controller-runtime.metrics","msg":"Serving metrics server","bindAddress":":8080","secure":false}
{"level":"info","ts":"2025-09-04T05:40:25Z","msg":"Starting EventSource","controller":"aws-s3-csi-controller","controllerGroup":"","controllerKind":"Pod","source":"kind source: *v1.Pod"}
{"level":"info","ts":"2025-09-04T05:40:25Z","msg":"Starting Controller","controller":"aws-s3-csi-controller","controllerGroup":"","controllerKind":"Pod"}
{"level":"info","ts":"2025-09-04T05:40:25Z","msg":"Starting workers","controller":"aws-s3-csi-controller","controllerGroup":"","controllerKind":"Pod","worker count":1}
{"level":"info","ts":"2025-09-04T05:40:25Z","msg":"Pod not found - ignoring","controller":"aws-s3-csi-controller","controllerGroup":"","controllerKind":"Pod","Pod":{"name":"s3-csi-node-24slm","namespace":"kube-system"},"namespace":"kube-system","name":"s3-csi-node-24slm","reconcileID":"45d3bb08-25e2-499e-b5d8-5725557e66e7","pod":{"name":"s3-csi-node-24slm","namespace":"kube-system"}}
{"level":"info","ts":"2025-09-04T05:40:25Z","msg":"Pod not found - ignoring","controller":"aws-s3-csi-controller","controllerGroup":"","controllerKind":"Pod","Pod":{"name":"s3-csi-controller-84d897fd95-2btxf","namespace":"kube-system"},"namespace":"kube-system","name":"s3-csi-controller-84d897fd95-2btxf","reconcileID":"d5e451ef-6527-4501-9801-64a34bfb892c","pod":{"name":"s3-csi-controller-84d897fd95-2btxf","namespace":"kube-system"}}
{"level":"info","ts":"2025-09-04T05:40:25Z","msg":"MountpointS3PodAttachment already has this workload UID","controller":"aws-s3-csi-controller","controllerGroup":"","controllerKind":"Pod","Pod":{"name":"s3-test-deployment-6b4f4dcfc5-qklx4","namespace":"default"},"namespace":"default","name":"s3-test-deployment-6b4f4dcfc5-qklx4","reconcileID":"e40ab8b7-c9e7-472c-aa41-77f93de65fd3","workloadPod":{"name":"s3-test-deployment-6b4f4dcfc5-qklx4","namespace":"default"},"pvc":"s3-pvc","workloadUID":"dc96f449-7091-4b8b-b955-7496cebdaa98","s3pa":"s3pa-2p4hz","spec.persistentVolumeName":"s3-pv","spec.workloadFSGroup":"","spec.authenticationSource":"pod","spec.workloadServiceAccountName":"s3-test-sa","spec.volumeID":"s3-csi-driver-volume","spec.mountOptions":"region ap-northeast-2,allow-delete,allow-overwrite","spec.workloadNamespace":"default","spec.workloadServiceAccountIAMRoleARN":"","spec.nodeName":"ip-10-52-100-94.ap-northeast-2.compute.internal"}
{"level":"info","ts":"2025-09-04T05:43:58Z","msg":"Pod failed","controller":"aws-s3-csi-controller","controllerGroup":"","controllerKind":"Pod","Pod":{"name":"mp-pgmm7","namespace":"mount-s3"},"namespace":"mount-s3","name":"mp-pgmm7","reconcileID":"490a13be-0dfe-48d0-8df7-4a4fb2f1e16e","mountpointPod":"mp-pgmm7","reason":"Evicted"}
{"level":"info","ts":"2025-09-04T05:43:59Z","msg":"Pod failed","controller":"aws-s3-csi-controller","controllerGroup":"","controllerKind":"Pod","Pod":{"name":"mp-pgmm7","namespace":"mount-s3"},"namespace":"mount-s3","name":"mp-pgmm7","reconcileID":"779759b7-1b46-482e-959a-ba7652fc8468","mountpointPod":"mp-pgmm7","reason":"Evicted"}

csi-driver log

I0904 05:46:56.791275       1 node.go:209] NodeGetCapabilities: called with args 
I0904 05:46:56.793113       1 node.go:82] NodePublishVolume: new request: volume_id:"s3-csi-driver-volume" target_path:"/var/lib/kubelet/pods/dc96f449-7091-4b8b-b955-7496cebdaa98/volumes/kubernetes.io~csi/s3-pv/mount" volume_capability:<mount:<mount_flags:"region ap-northeast-2" mount_flags:"allow-delete" mount_flags:"allow-overwrite" > access_mode:<mode:MULTI_NODE_MULTI_WRITER > > volume_context:<key:"authenticationSource" value:"pod" > volume_context:<key:"bucketName" value:"prismd-s3-csi-test" > volume_context:<key:"cache" value:"emptyDir" > volume_context:<key:"cacheEmptyDirMedium" value:"" > volume_context:<key:"cacheEmptyDirSizeLimit" value:"1Gi" > volume_context:<key:"csi.storage.k8s.io/ephemeral" value:"false" > volume_context:<key:"csi.storage.k8s.io/pod.name" value:"s3-test-deployment-6b4f4dcfc5-qklx4" > volume_context:<key:"csi.storage.k8s.io/pod.namespace" value:"default" > volume_context:<key:"csi.storage.k8s.io/pod.uid" value:"dc96f449-7091-4b8b-b955-7496cebdaa98" > volume_context:<key:"csi.storage.k8s.io/serviceAccount.name" value:"s3-test-sa" > volume_context:<key:"mountpointContainerResourcesLimitsCpu" value:"500m" > volume_context:<key:"mountpointContainerResourcesLimitsMemory" value:"1Gi" > volume_context:<key:"mountpointContainerResourcesRequestsCpu" value:"500m" > volume_context:<key:"mountpointContainerResourcesRequestsMemory" value:"1Gi" > volume_context:<key:"stsRegion" value:"ap-northeast-2" > 
I0904 05:46:56.793225       1 node.go:148] NodePublishVolume: mounting prismd-s3-csi-test at /var/lib/kubelet/pods/dc96f449-7091-4b8b-b955-7496cebdaa98/volumes/kubernetes.io~csi/s3-pv/mount with options [--allow-delete --allow-overwrite --allow-root --region=ap-northeast-2]
E0904 05:47:11.794046       1 pod_mounter.go:138] Failed to wait for Mountpoint Pod "mp-pgmm7" to be ready for "/var/lib/kubelet/pods/dc96f449-7091-4b8b-b955-7496cebdaa98/volumes/kubernetes.io~csi/s3-pv/mount": mppod/watcher: mountpoint pod not ready. Seems like Mountpoint Pod is not in 'Running' status. You can see it's status and any potential failures by running: `kubectl describe pods -n mount-s3 mp-pgmm7`
E0904 05:47:11.794195       1 driver.go:170] GRPC error: rpc error: code = Internal desc = Could not mount "prismd-s3-csi-test" at "/var/lib/kubelet/pods/dc96f449-7091-4b8b-b955-7496cebdaa98/volumes/kubernetes.io~csi/s3-pv/mount": Failed to wait for Mountpoint Pod "mp-pgmm7" to be ready: mppod/watcher: mountpoint pod not ready. Seems like Mountpoint Pod is not in 'Running' status. You can see it's status and any potential failures by running: `kubectl describe pods -n mount-s3 mp-pgmm7`
I0904 05:47:16.194803       1 node.go:209] NodeGetCapabilities: called with args
I0904 05:47:25.866190       1 reflector.go:389] pkg/mod/k8s.io/client-go@v0.31.3/tools/cache/reflector.go:243: forcing resync

node mount status

# mount | grep mountpoint-s3
mountpoint-s3 on /var/lib/kubelet/plugins/s3.csi.aws.com/mnt/mp-pgmm7 type fuse (rw,nosuid,nodev,noatime,user_id=0,group_id=0,default_permissions,allow_other)
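
The FUSE mount for the evicted Mountpoint Pod mp-pgmm7 is still present on the node even though the pod itself is gone. A quick way to confirm the mismatch (the stat is expected to fail if the mount-s3 process died with the pod; I have not captured that output here):

# probe whether the FUSE endpoint behind the stale mount is still alive
stat /var/lib/kubelet/plugins/s3.csi.aws.com/mnt/mp-pgmm7
# list the Mountpoint Pods that actually exist
kubectl get pods -n mount-s3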

What you expected to happen?
When a Mountpoint Pod is evicted, the CSI controller should create a new Mountpoint Pod, and the workload pod should be able to access the mounted S3 bucket through it.

How to reproduce it (as minimally and precisely as possible)?
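Exact steps were not recorded; a plausible minimal trigger, assuming the eviction is driven by the 1Gi cacheEmptyDirSizeLimit shown in the driver log, is to read more data through the mount than the cache limit allows, so kubelet evicts the Mountpoint Pod for exceeding its emptyDir size:

# hypothetical object name and mount path; read a few GiB through the mount to fill the 1Gi emptyDir cache
kubectl exec -n default deploy/s3-test-deployment -- \
  dd if=/mnt/s3/some-large-object of=/dev/null bs=1M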

Anything else we need to know?:
I'm not sure what information is needed to track this issue down.
If you let me know, I will provide whatever is necessary. I'd appreciate your help.

Environment

  • Kubernetes version (use kubectl version): EKS v1.30
  • Driver version: v2.0.0
