[Feat] Multi region support (Topology Aware Provisioning) #280
Changes from 18 commits
a88cfa7
eafca43
eff8321
2c4ad48
e30ba15
92e0a49
6f7f536
95b617e
0fb44bc
576771c
4ffbc3f
a98c894
d5d4237
14752ae
ac526d0
328086d
2b7d27a
200e115
30fd9cb
e24ea9d
0579192
1c31cfd
47706e2
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,80 @@ | ||
| ## 🌐 Topology-Aware Provisioning | ||
|
|
||
| This CSI driver supports topology-aware provisioning, optimizing volume placement based on the physical infrastructure layout. | ||
|
|
||
| **Notes:** | ||
|
|
||
| 1. **Volume Cloning**: Cloning only works within the same region, not across regions. | ||
| 2. **Volume Migration**: Volumes cannot be moved across regions. | ||
| 3. **Remote Provisioning**: Volume provisioning is supported in remote regions (nodes or clusters outside of the region where the controller server is deployed). | ||
|
|
||
| > [!IMPORTANT] | ||
| > Use release v0.8.6 or later to take advantage of the remote provisioning feature. | ||
| #### 📝 Example StorageClass and PVC | ||
|
|
||
| ```yaml | ||
| allowVolumeExpansion: true | ||
| apiVersion: storage.k8s.io/v1 | ||
| kind: StorageClass | ||
| metadata: | ||
| name: linode-block-storage-topology-aware | ||
| provisioner: linodebs.csi.linode.com | ||
| reclaimPolicy: Delete | ||
| volumeBindingMode: WaitForFirstConsumer | ||
| --- | ||
| apiVersion: v1 | ||
| kind: PersistentVolumeClaim | ||
| metadata: | ||
| name: pvc-filesystem | ||
| spec: | ||
| accessModes: | ||
| - ReadWriteOnce | ||
| resources: | ||
| requests: | ||
| storage: 10Gi | ||
| storageClassName: linode-block-storage-topology-aware | ||
| ``` | ||
| > **Important**: The `volumeBindingMode: WaitForFirstConsumer` setting is required for topology-aware provisioning. It delays volume binding and creation until a pod that uses the PVC is created, so the pod's scheduling requirements and node assignment are known when the volume's region is chosen and the volume is created where the pod will run. | ||
|
|
||
| #### 🖥️ Example Pod | ||
|
|
||
| ```yaml | ||
| apiVersion: v1 | ||
| kind: Pod | ||
| metadata: | ||
| name: e2e-pod | ||
| spec: | ||
| nodeSelector: | ||
| topology.linode.com/region: us-ord | ||
| tolerations: | ||
| - key: "node-role.kubernetes.io/control-plane" | ||
| operator: "Exists" | ||
| effect: "NoSchedule" | ||
| containers: | ||
| - name: e2e-pod | ||
| image: ubuntu | ||
| command: | ||
| - sleep | ||
| - "1000000" | ||
| volumeMounts: | ||
| - mountPath: /data | ||
| name: csi-volume | ||
| volumes: | ||
| - name: csi-volume | ||
| persistentVolumeClaim: | ||
| claimName: pvc-filesystem | ||
| ``` | ||
|
|
||
| This example demonstrates how to set up topology-aware provisioning using the Linode Block Storage CSI Driver. The StorageClass defines the provisioner and reclaim policy, while the PersistentVolumeClaim requests storage from this class. The Pod specification shows how to use the PVC and includes a node selector for region-specific deployment. | ||
|
|
||
|
Should we also mention that the cluster itself must be started with
Yeah, good point. Let me add that as well. Also, that feature flag is going to be turned on by default in future CSI releases. |
||
| #### Provisioning Process | ||
|
|
||
| 1. The container orchestrator (CO) determines the required topology based on application needs and cluster layout. | ||
|
||
| 2. CO includes `TopologyRequirement` in `CreateVolume` call. | ||
| 3. CSI driver creates volume satisfying topology requirements. | ||
| 4. Driver returns actual topology of created volume. | ||
| 5. CO uses this information to schedule workloads on nodes with matching topology. | ||
|
|
||
| By leveraging topology-aware provisioning, CSI drivers ensure optimal volume placement within the infrastructure, improving performance, availability, and data locality. | ||
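The provisioning steps listed above can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not the driver's implementation: `Topology`, `TopologyRequirement`, `Volume`, `resolveRegion`, and `createVolume` are hypothetical stand-ins for the CSI messages and driver code, and `regionKey` is assumed to match the node label used in the example Pod.

```go
package main

import "fmt"

// regionKey is assumed to match the topology.linode.com/region node label
// used in the example Pod.
const regionKey = "topology.linode.com/region"

// Topology and TopologyRequirement are simplified, hypothetical stand-ins
// for the CSI messages of the same name.
type Topology struct {
	Segments map[string]string
}

type TopologyRequirement struct {
	Requisite []Topology
	Preferred []Topology
}

// Volume is a hypothetical stand-in for the created block storage volume.
type Volume struct {
	Label  string
	Region string
}

// resolveRegion picks the region for a new volume: the first preferred
// topology wins, then the first requisite one, then the fallback region.
func resolveRegion(req *TopologyRequirement, fallback string) string {
	if req == nil {
		return fallback
	}
	candidates := req.Preferred
	if len(candidates) == 0 {
		candidates = req.Requisite
	}
	if len(candidates) > 0 {
		if region, ok := candidates[0].Segments[regionKey]; ok && region != "" {
			return region
		}
	}
	return fallback
}

// createVolume mimics steps 2 to 4: it receives the requirement, creates the
// volume in the resolved region, and reports the volume's actual topology.
func createVolume(label string, req *TopologyRequirement, controllerRegion string) (Volume, []Topology) {
	vol := Volume{Label: label, Region: resolveRegion(req, controllerRegion)}
	accessible := []Topology{{Segments: map[string]string{regionKey: vol.Region}}}
	return vol, accessible
}

func main() {
	req := &TopologyRequirement{
		Preferred: []Topology{{Segments: map[string]string{regionKey: "us-ord"}}},
	}
	vol, accessible := createVolume("pvc-example", req, "us-east")
	fmt.Println(vol.Region, accessible[0].Segments[regionKey])

	// Without a requirement, the controller's own region is used.
	vol, _ = createVolume("pvc-fallback", nil, "us-east")
	fmt.Println(vol.Region)
}
```

The topology returned in step 4 is what lets the CO keep later workloads on nodes in the same region as the volume (step 5).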
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| apiVersion: storage.k8s.io/v1 | ||
| kind: StorageClass | ||
| metadata: | ||
| name: linode-block-storage-topology-aware-retain | ||
| namespace: {{ required ".Values.namespace required" .Values.namespace }} | ||
| {{- if eq .Values.defaultStorageClass "linode-block-storage-topology-aware-retain" }} | ||
| annotations: | ||
| storageclass.kubernetes.io/is-default-class: "true" | ||
| {{- end }} | ||
| {{- if .Values.volumeTags }} | ||
| parameters: | ||
| linodebs.csi.linode.com/volumeTags: {{ join "," .Values.volumeTags }} | ||
| {{- end}} | ||
| allowVolumeExpansion: true | ||
| provisioner: linodebs.csi.linode.com | ||
| reclaimPolicy: Retain | ||
| volumeBindingMode: WaitForFirstConsumer |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| apiVersion: storage.k8s.io/v1 | ||
| kind: StorageClass | ||
| metadata: | ||
| name: linode-block-storage-topology-aware | ||
| namespace: {{ required ".Values.namespace required" .Values.namespace }} | ||
| {{- if eq .Values.defaultStorageClass "linode-block-storage-topology-aware" }} | ||
| annotations: | ||
| storageclass.kubernetes.io/is-default-class: "true" | ||
| {{- end }} | ||
| {{- if .Values.volumeTags }} | ||
| parameters: | ||
| linodebs.csi.linode.com/volumeTags: {{ join "," .Values.volumeTags }} | ||
| {{- end}} | ||
| allowVolumeExpansion: true | ||
| provisioner: linodebs.csi.linode.com | ||
| reclaimPolicy: Delete | ||
| volumeBindingMode: WaitForFirstConsumer |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -131,7 +131,7 @@ func (cs *ControllerServer) maxAllowedVolumeAttachments(ctx context.Context, ins | |
|
|
||
| // getContentSourceVolume retrieves information about the Linode volume to clone from. | ||
| // It returns a LinodeVolumeKey if a valid source volume is found, or an error if the source is invalid. | ||
| func (cs *ControllerServer) getContentSourceVolume(ctx context.Context, contentSource *csi.VolumeContentSource) (volKey *linodevolumes.LinodeVolumeKey, err error) { | ||
| func (cs *ControllerServer) getContentSourceVolume(ctx context.Context, contentSource *csi.VolumeContentSource, accessibilityRequirements *csi.TopologyRequirement) (volKey *linodevolumes.LinodeVolumeKey, err error) { | ||
| log := logger.GetLogger(ctx) | ||
| log.V(4).Info("Attempting to get content source volume") | ||
|
|
||
|
|
@@ -167,19 +167,27 @@ func (cs *ControllerServer) getContentSourceVolume(ctx context.Context, contentS | |
| return nil, errInternal("source volume *linodego.Volume is nil") // Throw an internal error if the processed linodego.Volume is nil | ||
| } | ||
|
|
||
| // Check if the volume's region matches the server's metadata region | ||
| if volumeData.Region != cs.metadata.Region { | ||
| // Check if the source volume's region matches the server's metadata region | ||
| // If no topology is specified, the source volume must be in the same region as the server's metadata region | ||
| if accessibilityRequirements == nil && volumeData.Region != cs.metadata.Region { | ||
| return nil, errRegionMismatch(volumeData.Region, cs.metadata.Region) | ||
| } | ||
|
|
||
| // If a topology is specified, the source volume must be in the same region as the specified topology | ||
| if accessibilityRequirements != nil { | ||
| if volumeData.Region != accessibilityRequirements.GetPreferred()[0].GetSegments()[VolumeTopologyRegion] { | ||
|
||
| return nil, errRegionMismatch(volumeData.Region, accessibilityRequirements.GetPreferred()[0].GetSegments()[VolumeTopologyRegion]) | ||
| } | ||
| } | ||
|
|
||
| log.V(4).Info("Content source volume", "volumeData", volumeData) | ||
| return volKey, nil | ||
| } | ||
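One thing worth checking in the region comparison above: indexing `GetPreferred()[0]` panics when a requirement is present but its preferred list is empty (for example, when only requisite topologies are sent). A bounds-checked variant might look like the sketch below. The `requirement` type, `checkSourceRegion`, `errNoRegion`, and the value of `regionKey` are assumptions made for illustration, and returning an error for an empty list is only one possible policy (falling back to the requisite list or to the controller's region would be another).

```go
package main

import (
	"errors"
	"fmt"
)

// regionKey is the assumed value of the driver's VolumeTopologyRegion constant.
const regionKey = "topology.linode.com/region"

// errNoRegion is a hypothetical error for a requirement that carries no region.
var errNoRegion = errors.New("topology requirement has no region segment")

// requirement is a simplified stand-in for csi.TopologyRequirement; each
// entry holds the segments of one preferred topology.
type requirement struct {
	Preferred []map[string]string
}

// checkSourceRegion verifies that a clone source lives in the expected region:
// the controller's region when no requirement is given, otherwise the region
// of the first preferred topology.
func checkSourceRegion(sourceRegion, controllerRegion string, req *requirement) error {
	want := controllerRegion
	if req != nil {
		// Guard the index so an empty preferred list cannot panic.
		if len(req.Preferred) == 0 {
			return errNoRegion
		}
		region, ok := req.Preferred[0][regionKey]
		if !ok {
			return errNoRegion
		}
		want = region
	}
	if sourceRegion != want {
		return fmt.Errorf("source volume is in region %q, want %q", sourceRegion, want)
	}
	return nil
}

func main() {
	inOrd := &requirement{Preferred: []map[string]string{{regionKey: "us-ord"}}}
	fmt.Println(checkSourceRegion("us-ord", "us-east", inOrd))           // regions match
	fmt.Println(checkSourceRegion("us-east", "us-east", inOrd))          // region mismatch
	fmt.Println(checkSourceRegion("us-east", "us-east", &requirement{})) // empty list, no panic
}
```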
|
|
||
| // attemptCreateLinodeVolume creates a Linode volume while ensuring idempotency. | ||
| // It checks for existing volumes with the same label and either returns the existing | ||
| // volume or creates a new one, optionally cloning from a source volume. | ||
| func (cs *ControllerServer) attemptCreateLinodeVolume(ctx context.Context, label string, sizeGB int, tags string, sourceVolume *linodevolumes.LinodeVolumeKey) (*linodego.Volume, error) { | ||
| func (cs *ControllerServer) attemptCreateLinodeVolume(ctx context.Context, label string, sizeGB int, tags string, sourceVolume *linodevolumes.LinodeVolumeKey, accessibilityRequirements *csi.TopologyRequirement) (*linodego.Volume, error) { | ||
| log := logger.GetLogger(ctx) | ||
| log.V(4).Info("Attempting to create Linode volume", "label", label, "sizeGB", sizeGB, "tags", tags) | ||
|
|
||
|
|
@@ -209,18 +217,43 @@ func (cs *ControllerServer) attemptCreateLinodeVolume(ctx context.Context, label | |
| return cs.cloneLinodeVolume(ctx, label, sourceVolume.VolumeID) | ||
| } | ||
|
|
||
| return cs.createLinodeVolume(ctx, label, sizeGB, tags) | ||
| return cs.createLinodeVolume(ctx, label, sizeGB, tags, accessibilityRequirements) | ||
| } | ||
|
|
||
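| // getRegionFromTopology returns the region of the first preferred topology, falling back to the first requisite topology. It returns an empty string if no region segment is found. | ||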
| func getRegionFromTopology(requirements *csi.TopologyRequirement) string { | ||
| topologies := requirements.GetPreferred() | ||
| if len(topologies) == 0 { | ||
| topologies = requirements.GetRequisite() | ||
| } | ||
|
|
||
| if len(topologies) > 0 { | ||
| if value, ok := topologies[0].GetSegments()[VolumeTopologyRegion]; ok { | ||
| return value | ||
| } | ||
| } | ||
|
|
||
| return "" | ||
| } | ||
|
|
||
| // createLinodeVolume creates a new Linode volume with the specified label, size, and tags. | ||
| // It returns the created volume or an error if the creation fails. | ||
| func (cs *ControllerServer) createLinodeVolume(ctx context.Context, label string, sizeGB int, tags string) (*linodego.Volume, error) { | ||
| func (cs *ControllerServer) createLinodeVolume(ctx context.Context, label string, sizeGB int, tags string, accessibilityRequirements *csi.TopologyRequirement) (*linodego.Volume, error) { | ||
| log := logger.GetLogger(ctx) | ||
| log.V(4).Info("Creating Linode volume", "label", label, "sizeGB", sizeGB, "tags", tags) | ||
|
|
||
| // Get the region from req.AccessibilityRequirements if it exists. Fall back to the controller's metadata region if not specified. | ||
| region := cs.metadata.Region | ||
| if accessibilityRequirements != nil { | ||
| if topologyRegion := getRegionFromTopology(accessibilityRequirements); topologyRegion != "" { | ||
| log.V(4).Info("Using region from topology", "region", topologyRegion) | ||
| region = topologyRegion | ||
| } | ||
| } | ||
|
|
||
| // Prepare the volume creation request with region, label, and size. | ||
| volumeReq := linodego.VolumeCreateOptions{ | ||
| Region: cs.metadata.Region, | ||
| Region: region, | ||
| Label: label, | ||
| Size: sizeGB, | ||
| } | ||
|
|
@@ -394,7 +427,7 @@ func (cs *ControllerServer) prepareVolumeParams(ctx context.Context, req *csi.Cr | |
|
|
||
| // createVolumeContext creates a context map for the volume based on the request parameters. | ||
| // If the volume is encrypted, it adds relevant encryption attributes to the context. | ||
| func (cs *ControllerServer) createVolumeContext(ctx context.Context, req *csi.CreateVolumeRequest) map[string]string { | ||
| func (cs *ControllerServer) createVolumeContext(ctx context.Context, req *csi.CreateVolumeRequest, vol *linodego.Volume) map[string]string { | ||
| log := logger.GetLogger(ctx) | ||
| log.V(4).Info("Entering createVolumeContext()", "req", req) | ||
| defer log.V(4).Info("Exiting createVolumeContext()") | ||
|
|
@@ -408,18 +441,20 @@ func (cs *ControllerServer) createVolumeContext(ctx context.Context, req *csi.Cr | |
| volumeContext[LuksKeySizeAttribute] = req.GetParameters()[LuksKeySizeAttribute] | ||
| } | ||
|
|
||
| volumeContext[VolumeTopologyRegion] = vol.Region | ||
|
|
||
| log.V(4).Info("Volume context created", "volumeContext", volumeContext) | ||
| return volumeContext | ||
| } | ||
|
|
||
| // createAndWaitForVolume attempts to create a new volume and waits for it to become active. | ||
| // It logs the process and handles any errors that occur during creation or waiting. | ||
| func (cs *ControllerServer) createAndWaitForVolume(ctx context.Context, name string, sizeGB int, tags string, sourceInfo *linodevolumes.LinodeVolumeKey) (*linodego.Volume, error) { | ||
| func (cs *ControllerServer) createAndWaitForVolume(ctx context.Context, name string, sizeGB int, tags string, sourceInfo *linodevolumes.LinodeVolumeKey, accessibilityRequirements *csi.TopologyRequirement) (*linodego.Volume, error) { | ||
| log := logger.GetLogger(ctx) | ||
| log.V(4).Info("Entering createAndWaitForVolume()", "name", name, "sizeGB", sizeGB, "tags", tags) | ||
| defer log.V(4).Info("Exiting createAndWaitForVolume()") | ||
|
|
||
| vol, err := cs.attemptCreateLinodeVolume(ctx, name, sizeGB, tags, sourceInfo) | ||
| vol, err := cs.attemptCreateLinodeVolume(ctx, name, sizeGB, tags, sourceInfo, accessibilityRequirements) | ||
| if err != nil { | ||
| return nil, err | ||
| } | ||
|
|
@@ -518,14 +553,20 @@ func (cs *ControllerServer) validateControllerPublishVolumeRequest(ctx context.C | |
| return linodeID, volumeID, nil | ||
| } | ||
|
|
||
| // getAndValidateVolume retrieves the volume by its ID and checks if it is | ||
| // attached to the specified Linode instance. If the volume is found and | ||
| // already attached to the instance, it returns its device path. | ||
| // If the volume is not found or attached to a different instance, it | ||
| // returns an appropriate error. | ||
| func (cs *ControllerServer) getAndValidateVolume(ctx context.Context, volumeID, linodeID int) (string, error) { | ||
| // getAndValidateVolume retrieves the volume by its ID and runs checks. | ||
| // | ||
| // It performs the following checks: | ||
| // 1. If the volume is found and already attached to the specified Linode instance, | ||
| // it returns the device path of the volume. | ||
| // 2. If the volume is not found, it returns an error indicating that the volume does not exist. | ||
| // 3. If the volume is attached to a different instance, it returns an error indicating | ||
| // that the volume is already attached elsewhere. | ||
| // | ||
| // Additionally, it checks if the volume and instance are in the same region based on | ||
| // the provided volume context. If they are not in the same region, it returns an internal error. | ||
| func (cs *ControllerServer) getAndValidateVolume(ctx context.Context, volumeID int, instance *linodego.Instance, volContext map[string]string) (string, error) { | ||
| log := logger.GetLogger(ctx) | ||
| log.V(4).Info("Entering getAndValidateVolume()", "volumeID", volumeID, "linodeID", linodeID) | ||
| log.V(4).Info("Entering getAndValidateVolume()", "volumeID", volumeID, "linodeID", instance.ID) | ||
| defer log.V(4).Info("Exiting getAndValidateVolume()") | ||
|
|
||
| volume, err := cs.client.GetVolume(ctx, volumeID) | ||
|
|
@@ -536,14 +577,19 @@ func (cs *ControllerServer) getAndValidateVolume(ctx context.Context, volumeID, | |
| } | ||
|
|
||
| if volume.LinodeID != nil { | ||
| if *volume.LinodeID == linodeID { | ||
| if *volume.LinodeID == instance.ID { | ||
| log.V(4).Info("Volume already attached to instance", "volume_id", volume.ID, "node_id", *volume.LinodeID, "device_path", volume.FilesystemPath) | ||
| return volume.FilesystemPath, nil | ||
| } | ||
| return "", errVolumeAttached(volumeID, linodeID) | ||
| return "", errVolumeAttached(volumeID, instance.ID) | ||
| } | ||
|
|
||
| // check if the volume and instance are in the same region | ||
| if instance.Region != volContext[VolumeTopologyRegion] { | ||
| return "", errRegionMismatch(volContext[VolumeTopologyRegion], instance.Region) | ||
| } | ||
|
|
||
| log.V(4).Info("Volume validated and is not attached to instance", "volume_id", volume.ID, "node_id", linodeID) | ||
| log.V(4).Info("Volume validated and is not attached to instance", "volume_id", volume.ID, "node_id", instance.ID) | ||
| return "", nil | ||
| } | ||
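A note on the region check in `getAndValidateVolume`: a Go map lookup for a missing key yields the empty string, so a volume whose context carries no region entry (for instance, one created before this change) would compare `""` against the instance region and be rejected. The sketch below separates that case from a real mismatch. `validateAttachRegion` and the value of `regionKey` are hypothetical, and letting such volumes through is an assumption made for illustration, not a recommendation.

```go
package main

import "fmt"

// regionKey is the assumed value of the driver's VolumeTopologyRegion constant.
const regionKey = "topology.linode.com/region"

// validateAttachRegion reports whether a volume may be attached to an
// instance in instanceRegion, based on the region stored in the volume context.
func validateAttachRegion(instanceRegion string, volContext map[string]string) error {
	volumeRegion, ok := volContext[regionKey]
	if !ok || volumeRegion == "" {
		// Assumption: volumes without a recorded region are passed through.
		return nil
	}
	if volumeRegion != instanceRegion {
		return fmt.Errorf("volume region %q does not match instance region %q", volumeRegion, instanceRegion)
	}
	return nil
}

func main() {
	volContext := map[string]string{regionKey: "us-ord"}
	fmt.Println(validateAttachRegion("us-ord", volContext) == nil)  // same region
	fmt.Println(validateAttachRegion("us-east", volContext) == nil) // different region
	fmt.Println(validateAttachRegion("us-east", nil) == nil)        // no region recorded
}
```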
|
|
||
|
|
||