FUSE Sidecar Doesn't Handle GCSFuse Process Crash #534

@tgoodsell-tempus

Context:

We're running a GKE Standard cluster on version 1.31.6-gke.1020000 and using the native FUSE feature.

We've hit an issue where some sort of activity causes the FUSE process inside the sidecar container to crash. The logs below are pasted from GCP Log Explorer, sorry for the formatting:

gke-gcsfuse-sidecar
fuse: *fuseops.LookUpInodeOp error: input/output error

gke-gcsfuse-sidecar
fatal error: sync: unlock of unlocked mutex

gke-gcsfuse-sidecar
sync.fatal({0x17c3c28?, 0xc0008cb7e0?}) (long stack trace of the crashing goroutine omitted for brevity)

gke-gcsfuse-sidecar
gcsfuse exited with error: exit status 2

However, the sidecar init container never exits or indicates that it has a problem of its own. As a result, the natural GKE reaction of restarting our main container, which mounts the shared volume (and fails to do so), never recovers the Pod on its own.

The error surfaces on our Pod's main container as:

Error: failed to generate container "3a947804065a11c2d9337520ea746f24566ad1ee7c372fc320586393ef7a4dd6" spec: failed to generate spec: failed to stat "/var/lib/kubelet/pods/4383ae58-bd17-4191-aca6-05a4f75b9872/volumes/kubernetes.io~csi/gcs-fuse-csi-ephemeral/mount": stat /var/lib/kubelet/pods/4383ae58-bd17-4191-aca6-05a4f75b9872/volumes/kubernetes.io~csi/gcs-fuse-csi-ephemeral/mount: transport endpoint is not connected

Our sidecar details from GKE:

Image: gke.gcr.io/gcs-fuse-csi-driver-sidecar-mounter:v1.8.3-gke.2@sha256:07a5a7b18b083c47031c540e1664eb0c777a50e523dde030d8b0effdc9bb8761
Command Args: --v=5
Env Vars: NATIVE_SIDECAR=TRUE

My analysis is that this is a bug in the sidecar container. It should be able to self-recover from FUSE process crashes, have a liveness check based on the health of that process, or simply crash fatally itself when the FUSE process crashes.

We were able to recover with a deployment rollout restart, so I suspect the crash was triggered by some transient GCS or GKE problem.
