Hello,
We ended up hitting a really tricky edge case with our https://github.com/cert-manager/csi-driver-spiffe.
Basically, unlike most other Kubernetes CSI drivers, it uses a single `tmpfs` that contains both volume metadata and SPIFFE SVID certificates (X.509 certificates issued by cert-manager). It then answers `NodePublishVolume` CSI RPC requests by making a read-only bind mount from this `tmpfs` to the `target_path` given by the Container Orchestrator (in our case Kubernetes).
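For illustration, here is a minimal sketch of what that publish step amounts to, using `k8s.io/mount-utils`; the function name and paths are hypothetical placeholders, not the actual driver code:

```go
package main

import (
	"fmt"

	mount "k8s.io/mount-utils"
)

// publishWithBind sketches the current behaviour: the driver keeps a single
// tmpfs holding metadata and SVIDs, and answers NodePublishVolume by
// bind-mounting a per-volume subdirectory read-only onto target_path.
// Hypothetical illustration, not the actual csi-driver-spiffe code.
func publishWithBind(mounter mount.Interface, volumeDir, targetPath string) error {
	// Roughly equivalent to `mount -o bind,ro <volumeDir> <targetPath>`.
	if err := mounter.Mount(volumeDir, targetPath, "", []string{"bind", "ro"}); err != nil {
		return fmt.Errorf("bind mounting %s onto %s: %w", volumeDir, targetPath, err)
	}
	return nil
}

func main() {
	_ = publishWithBind(mount.New(""),
		"/tmpfs-data-dir/csi-vol-123",
		"/var/lib/kubelet/pods/POD_UID/volumes/kubernetes.io~csi/spiffe/mount")
}
```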
The issue is that, on restart or failure, the bind mount no longer points to the valid `tmpfs` directory, so from the pods' point of view the certificates are never renewed. One could argue that this is fine as long as the csi-driver-spiffe instance is never restarted and you never have to care about it, but the issue is a bit more complex than that!
Since this driver uses `RequiresRepublish`, kubelet will issue A LOT of repeated calls to `NodePublishVolume`, and any failure will lead to an unmount. That unmount is not propagated to the mount namespace of the running pods, so users have to restart the pod manually.
This is a real problem when used with SPIFFE SVIDs, since they are typically short-lived and rotated often. A failure in this critical rotation mechanism leads to major workload disruptions.
I am revamping parts of this library in a fork to:
- Split metadata and data: metadata ends up in a local `tmpfs`, and data gets its own `tmpfs` mounted directly at `target_path`, where we will perform atomic updates. No more bind mounts, so it is simpler to manage (see the sketch after this list).
- If the `target_path` is already correctly mounted, avoid unmounting or doing any destructive changes, since the running pods wouldn't see that.
- Improve the `IsMounted` detection to ensure the mount point is not just any mount, but one set up as expected by this library.
- Add some metrics / stats collection.
- Change error reporting in the CSI RPCs, since errors there have side effects; please see: Kubelet deletes CSI mount points if a "requiresrepublish" call to NodePublishVolume returns an error kubernetes/kubernetes#121271 (comment)
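To make the direction more concrete, here is a rough sketch of the idempotent publish path I have in mind. Everything here (function names, `tmpfs` size, paths) is a placeholder for discussion, not the actual code in my fork:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"

	mount "k8s.io/mount-utils"
)

// ensureTmpfsMounted checks whether targetPath is already a mount point backed
// by a tmpfs (not just any mount), and mounts a fresh tmpfs there if it is not.
// It never unmounts an existing, correct mount, so running pods keep their view.
// Hypothetical sketch, not the actual fork code.
func ensureTmpfsMounted(mounter mount.Interface, targetPath string) error {
	mounts, err := mounter.List()
	if err != nil {
		return fmt.Errorf("listing mounts: %w", err)
	}
	for _, m := range mounts {
		if m.Path == targetPath {
			if m.Type == "tmpfs" {
				// Already mounted as we expect: do nothing destructive.
				return nil
			}
			return fmt.Errorf("%s is mounted but is %q, not tmpfs", targetPath, m.Type)
		}
	}
	// Not mounted yet: mount a dedicated tmpfs directly at target_path.
	return mounter.Mount("tmpfs", targetPath, "tmpfs", []string{"size=1m"})
}

// writeFileAtomically updates a certificate file without ever exposing a
// partially written file to the pod: write to a temp file in the same
// directory, then rename over the destination (rename is atomic within
// the same filesystem).
func writeFileAtomically(path string, data []byte) error {
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // no-op if the rename succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

func main() {
	target := "/var/lib/kubelet/pods/POD_UID/volumes/kubernetes.io~csi/spiffe/mount"
	m := mount.New("")
	if err := ensureTmpfsMounted(m, target); err != nil {
		fmt.Println("publish failed:", err)
		return
	}
	_ = writeFileAtomically(filepath.Join(target, "tls.crt"), []byte("PEM data"))
}
```

The idea is that, because the pod and the driver share the same `tmpfs` mount, an atomic rename inside it is immediately visible to the pod without any remount, and a repeated `NodePublishVolume` call becomes a no-op when the mount is already in the expected state.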
Please tell me if any of these changes seems too risky, or if there is a good reason to keep a bind mount instead of two separate mounts, etc.
Also, tell me if you are interested in getting that merged upstream :)
Thanks 🙏