Merged

Commits (changes shown from 18 of 23 commits):
a88cfa7
feat: Enhance volume creation with accessibility requirements
komer3 Oct 4, 2024
eafca43
feat: Update ControllerPublishVolume() to use the region, provided in…
komer3 Oct 4, 2024
eff8321
test: update the unit test cases based on the previous changes
komer3 Oct 4, 2024
2c4ad48
refactor: reducing the nested if complexity for golangci :)
komer3 Oct 4, 2024
e30ba15
fix: update test cases that were failing and minor refactors
komer3 Oct 4, 2024
92e0a49
fix: linting
komer3 Oct 4, 2024
6f7f536
Merge branch 'main' into multi-region-support
komer3 Oct 4, 2024
95b617e
fix: fixing logic that causing the controller server sanity tests to …
komer3 Oct 4, 2024
0fb44bc
Merge branch 'main' into multi-region-support
komer3 Oct 4, 2024
576771c
Merge branch 'main' into multi-region-support
komer3 Oct 7, 2024
4ffbc3f
feat: updating the logic to handle volume cloning within remote region
komer3 Oct 8, 2024
a98c894
feat: update the provisioner args to enable topology based provisioning
komer3 Oct 9, 2024
d5d4237
Testing to see if it all works with out fallback to metadata region w…
komer3 Oct 9, 2024
14752ae
test: run upstream e2e to make sure changes with topology based provi…
komer3 Oct 9, 2024
ac526d0
revert to having a region fallback to where controller server is depl…
komer3 Oct 9, 2024
328086d
no need to double check volume context since we guarantee to pass it …
komer3 Oct 9, 2024
2b7d27a
feat: update docs and manifest
komer3 Oct 9, 2024
200e115
Merge branch 'main' into multi-region-support
komer3 Oct 9, 2024
30fd9cb
updates to doc based on comments
komer3 Oct 9, 2024
e24ea9d
adding info about sidecare feature flag
komer3 Oct 9, 2024
0579192
refactor: simplify the region checking logic by reusing code/func
komer3 Oct 10, 2024
1c31cfd
test: Adding test cases from new getRegionFromTopology() func
komer3 Oct 10, 2024
47706e2
Switch storage class naming
komer3 Oct 11, 2024
1 change: 1 addition & 0 deletions README.md
@@ -18,6 +18,7 @@
- [Creating a PersistentVolumeClaim](docs/usage.md#creating-a-persistentvolumeclaim)
- [Encrypted Drives using LUKS](docs/encrypted-drives.md)
- [Adding Tags to Created Volumes](docs/volume-tags.md)
- [Topology-Aware Provisioning](docs/topology-aware-provisioning.md)
- [Development Setup](docs/development-setup.md)
- [Prerequisites](docs/development-setup.md#-prerequisites)
- [Setting Up the Local Development Environment](docs/development-setup.md#-setting-up-the-local-development-environment)
18 changes: 18 additions & 0 deletions deploy/kubernetes/base/csi-storageclass.yaml
@@ -16,3 +16,21 @@ metadata:
provisioner: linodebs.csi.linode.com
reclaimPolicy: Retain
allowVolumeExpansion: true
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: linode-block-storage-topology-aware
namespace: kube-system
provisioner: linodebs.csi.linode.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
name: linode-block-storage-topology-aware-retain
namespace: kube-system
provisioner: linodebs.csi.linode.com
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
1 change: 1 addition & 0 deletions deploy/kubernetes/base/ss-csi-linode-controller.yaml
@@ -43,6 +43,7 @@ spec:
- "--volume-name-prefix=pvc"
- "--volume-name-uuid-length=16"
- "--csi-address=$(ADDRESS)"
- "--feature-gates=Topology=true"
- "--v=2"
env:
- name: ADDRESS
80 changes: 80 additions & 0 deletions docs/topology-aware-provisioning.md
@@ -0,0 +1,80 @@
## 🌐 Topology-Aware Provisioning

This CSI driver supports topology-aware provisioning, optimizing volume placement based on the physical infrastructure layout.

**Notes:**

1. **Volume Cloning**: Cloning only works within the same region, not across regions.
2. **Volume Migration**: Volumes cannot be moved across regions.
3. **Remote Provisioning**: Volume provisioning is supported in remote regions (nodes or clusters outside of the region where the controller server is deployed).

> [!IMPORTANT]
> Make sure you are using release v0.8.6 or later to use the remote provisioning feature.

#### 📝 Example StorageClass and PVC

```yaml
allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: linode-block-storage-topology-aware
provisioner: linodebs.csi.linode.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-filesystem
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 10Gi
storageClassName: linode-block-storage-topology-aware
```
> **Important**: The `volumeBindingMode: WaitForFirstConsumer` setting is crucial for topology-aware provisioning. It delays volume binding and creation until a pod using the PVC is created. This allows the system to consider the pod's scheduling requirements and node assignment when selecting the most appropriate storage location, ensuring optimal data locality and performance.
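
For reference (not part of this PR's diff), a PersistentVolume provisioned through this StorageClass typically ends up with node affinity derived from the volume's topology, along the lines of the sketch below; the name and `volumeHandle` are placeholders, and the exact generated object may differ:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvc-0123456789abcdef        # generated name; placeholder
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: linode-block-storage-topology-aware
  csi:
    driver: linodebs.csi.linode.com
    volumeHandle: "12345-pvc0123456789abcdef"   # placeholder volume handle
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.linode.com/region
              operator: In
              values:
                - us-ord
```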

#### 🖥️ Example Pod

```yaml
apiVersion: v1
kind: Pod
metadata:
name: e2e-pod
spec:
nodeSelector:
topology.linode.com/region: us-ord
tolerations:
- key: "node-role.kubernetes.io/control-plane"
operator: "Exists"
effect: "NoSchedule"
containers:
- name: e2e-pod
image: ubuntu
command:
- sleep
- "1000000"
volumeMounts:
- mountPath: /data
name: csi-volume
volumes:
- name: csi-volume
persistentVolumeClaim:
claimName: pvc-filesystem
```

This example demonstrates how to set up topology-aware provisioning using the Linode Block Storage CSI Driver. The StorageClass defines the provisioner and reclaim policy, while the PersistentVolumeClaim requests storage from this class. The Pod specification shows how to use the PVC and includes a node selector for region-specific deployment.

Contributor: Should we also mention that the cluster itself must be started with `--feature-gates=CSINodeInfo=true`? https://kubernetes-csi.github.io/docs/topology.html?highlight=VOLUME_ACCESSIBILITY_CONSTRAINTS#kubernetes-cluster-setup

Contributor Author: Yeah, good point. Let me add that as well. Also, that feature flag is going to be turned on by default in future CSI releases.

#### Provisioning Process

1. The CO (Container Orchestrator, e.g. Kubernetes) determines the required topology based on application needs and cluster layout.
2. The CO includes a `TopologyRequirement` in the `CreateVolume` call.
3. The CSI driver creates a volume that satisfies the topology requirements.
4. The driver returns the actual topology of the created volume.
5. The CO uses this information to schedule workloads on nodes with matching topology.

Contributor: Can you explain what CO is?

Contributor Author: Updated the doc. Hope it's clearer now! Let me know if anything else is unclear :)

By leveraging topology-aware provisioning, CSI drivers ensure optimal volume placement within the infrastructure, improving performance, availability, and data locality.
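
To make that exchange concrete, here is a minimal, self-contained Go sketch using the CSI protobuf types from `github.com/container-storage-interface/spec/lib/go/csi`. It is illustrative only and not taken from this PR; the `regionKey` constant is assumed to match the driver's `topology.linode.com/region` topology key, and the IDs are placeholders.

```go
package main

import (
	"fmt"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

// regionKey is assumed to match the driver's topology key (the same label used
// in the example pod's nodeSelector above).
const regionKey = "topology.linode.com/region"

func main() {
	// Step 2: the CO includes a TopologyRequirement in the CreateVolume call.
	req := &csi.CreateVolumeRequest{
		Name: "pvc-example",
		AccessibilityRequirements: &csi.TopologyRequirement{
			Preferred: []*csi.Topology{
				{Segments: map[string]string{regionKey: "us-ord"}},
			},
		},
	}

	// Step 3: a driver creates the volume in the preferred region.
	region := req.GetAccessibilityRequirements().GetPreferred()[0].GetSegments()[regionKey]

	// Step 4: the driver reports the actual topology of the created volume;
	// the CO uses it for scheduling (step 5).
	resp := &csi.CreateVolumeResponse{
		Volume: &csi.Volume{
			VolumeId:           "example-volume-id", // placeholder
			AccessibleTopology: []*csi.Topology{{Segments: map[string]string{regionKey: region}}},
		},
	}

	fmt.Println(resp.GetVolume().GetAccessibleTopology()[0].GetSegments()[regionKey]) // prints: us-ord
}
```

The external-provisioner sidecar, run with `--feature-gates=Topology=true` as added in this PR, is what translates the reported `AccessibleTopology` into the PV node affinity the scheduler uses.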
@@ -23,6 +23,7 @@ spec:
- --volume-name-prefix=pvc
- --volume-name-uuid-length=16
- --csi-address=$(ADDRESS)
- --feature-gates=Topology=true
- --v=2
{{- if .Values.enable_metrics}}
- --metrics-address={{ .Values.csiProvisioner.metrics.address }}
@@ -0,0 +1,17 @@
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: linode-block-storage-topology-aware-retain
namespace: {{ required ".Values.namespace required" .Values.namespace }}
{{- if eq .Values.defaultStorageClass "linode-block-storage-topology-aware-retain" }}
annotations:
storageclass.kubernetes.io/is-default-class: "true"
{{- end }}
{{- if .Values.volumeTags }}
parameters:
linodebs.csi.linode.com/volumeTags: {{ join "," .Values.volumeTags }}
{{- end}}
allowVolumeExpansion: true
provisioner: linodebs.csi.linode.com
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
@@ -0,0 +1,17 @@
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: linode-block-storage-topology-aware
namespace: {{ required ".Values.namespace required" .Values.namespace }}
{{- if eq .Values.defaultStorageClass "linode-block-storage-topology-aware" }}
annotations:
storageclass.kubernetes.io/is-default-class: "true"
{{- end }}
{{- if .Values.volumeTags }}
parameters:
linodebs.csi.linode.com/volumeTags: {{ join "," .Values.volumeTags }}
{{- end}}
allowVolumeExpansion: true
provisioner: linodebs.csi.linode.com
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
27 changes: 15 additions & 12 deletions internal/driver/controllerserver.go
@@ -76,24 +76,27 @@ func (cs *ControllerServer) CreateVolume(ctx context.Context, req *csi.CreateVol
return &csi.CreateVolumeResponse{}, err
}

// Create volume context
volContext := cs.createVolumeContext(ctx, req)
contentSource := req.GetVolumeContentSource()
accessibilityRequirements := req.GetAccessibilityRequirements()

// Attempt to retrieve information about a source volume if the request includes a content source.
// This is important for scenarios where the volume is being cloned from an existing one.
sourceVolInfo, err := cs.getContentSourceVolume(ctx, req.GetVolumeContentSource())
sourceVolInfo, err := cs.getContentSourceVolume(ctx, contentSource, accessibilityRequirements)
if err != nil {
return &csi.CreateVolumeResponse{}, err
}

// Create the volume
vol, err := cs.createAndWaitForVolume(ctx, volName, sizeGB, req.GetParameters()[VolumeTags], sourceVolInfo)
vol, err := cs.createAndWaitForVolume(ctx, volName, sizeGB, req.GetParameters()[VolumeTags], sourceVolInfo, accessibilityRequirements)
if err != nil {
return &csi.CreateVolumeResponse{}, err
}

// Create volume context
volContext := cs.createVolumeContext(ctx, req, vol)

// Prepare and return response
resp := cs.prepareCreateVolumeResponse(ctx, vol, size, volContext, sourceVolInfo, req.GetVolumeContentSource())
resp := cs.prepareCreateVolumeResponse(ctx, vol, size, volContext, sourceVolInfo, contentSource)

log.V(2).Info("CreateVolume response", "response", resp)
return resp, nil
@@ -154,9 +157,15 @@ func (cs *ControllerServer) ControllerPublishVolume(ctx context.Context, req *cs
return resp, err
}

// Retrieve and validate the instance associated with the Linode ID
instance, err := cs.getInstance(ctx, linodeID)
if err != nil {
return resp, err
}

// Check if the volume exists and is valid.
// If the volume is already attached to the specified instance, it returns its device path.
devicePath, err := cs.getAndValidateVolume(ctx, volumeID, linodeID)
devicePath, err := cs.getAndValidateVolume(ctx, volumeID, instance, req.GetVolumeContext())
if err != nil {
return resp, err
}
@@ -169,12 +178,6 @@ func (cs *ControllerServer) ControllerPublishVolume(ctx context.Context, req *cs
}, nil
}

// Retrieve and validate the instance associated with the Linode ID
instance, err := cs.getInstance(ctx, linodeID)
if err != nil {
return resp, err
}

// Check if the instance can accommodate the volume attachment
if capErr := cs.checkAttachmentCapacity(ctx, instance); capErr != nil {
return resp, capErr
86 changes: 66 additions & 20 deletions internal/driver/controllerserver_helper.go
@@ -131,7 +131,7 @@ func (cs *ControllerServer) maxAllowedVolumeAttachments(ctx context.Context, ins

// getContentSourceVolume retrieves information about the Linode volume to clone from.
// It returns a LinodeVolumeKey if a valid source volume is found, or an error if the source is invalid.
func (cs *ControllerServer) getContentSourceVolume(ctx context.Context, contentSource *csi.VolumeContentSource) (volKey *linodevolumes.LinodeVolumeKey, err error) {
func (cs *ControllerServer) getContentSourceVolume(ctx context.Context, contentSource *csi.VolumeContentSource, accessibilityRequirements *csi.TopologyRequirement) (volKey *linodevolumes.LinodeVolumeKey, err error) {
log := logger.GetLogger(ctx)
log.V(4).Info("Attempting to get content source volume")

@@ -167,19 +167,27 @@ func (cs *ControllerServer) getContentSourceVolume(ctx context.Context, contentS
return nil, errInternal("source volume *linodego.Volume is nil") // Throw an internal error if the processed linodego.Volume is nil
}

// Check if the volume's region matches the server's metadata region
if volumeData.Region != cs.metadata.Region {
// Check if the source volume's region matches the server's metadata region
// If no topology is specified, the source volume must be in the same region as the server's metadata region
if accessibilityRequirements == nil && volumeData.Region != cs.metadata.Region {
return nil, errRegionMismatch(volumeData.Region, cs.metadata.Region)
}

// If a topology is specified, the source volume must be in the same region as the specified topology
if accessibilityRequirements != nil {
if volumeData.Region != accessibilityRequirements.GetPreferred()[0].GetSegments()[VolumeTopologyRegion] {
return nil, errRegionMismatch(volumeData.Region, accessibilityRequirements.GetPreferred()[0].GetSegments()[VolumeTopologyRegion])
}
}

log.V(4).Info("Content source volume", "volumeData", volumeData)
return volKey, nil
}

Contributor: Is this guaranteed to be an array of len > 0, and is the key guaranteed to exist?

Contributor Author: It's pretty much guaranteed to exist, and the first entry is going to be the region we want to use. For more info, check out https://kubernetes-csi.github.io/docs/topology.html?highlight=VOLUME_ACCESSIBILITY_CONSTRAINTS#implementing-topology-in-your-csi-driver

Contributor: I was curious because getRegionFromTopology operates on the same accessibilityRequirements but includes all those checks. Thanks for the link - will check it out!

Contributor: Could also use that here - to validate.

Contributor Author: The only time accessibilityRequirements is not passed is when you have a broken multi-region cluster (I had this issue while trying to make konnectivity work). In that scenario, I fall back to using the region from the metadata (the region where the controller server is running).

Contributor Author: Good catch. Let me include those checks here as well.

Contributor Author: Updated it!

// attemptCreateLinodeVolume creates a Linode volume while ensuring idempotency.
// It checks for existing volumes with the same label and either returns the existing
// volume or creates a new one, optionally cloning from a source volume.
func (cs *ControllerServer) attemptCreateLinodeVolume(ctx context.Context, label string, sizeGB int, tags string, sourceVolume *linodevolumes.LinodeVolumeKey) (*linodego.Volume, error) {
func (cs *ControllerServer) attemptCreateLinodeVolume(ctx context.Context, label string, sizeGB int, tags string, sourceVolume *linodevolumes.LinodeVolumeKey, accessibilityRequirements *csi.TopologyRequirement) (*linodego.Volume, error) {
log := logger.GetLogger(ctx)
log.V(4).Info("Attempting to create Linode volume", "label", label, "sizeGB", sizeGB, "tags", tags)

@@ -209,18 +217,43 @@ func (cs *ControllerServer) attemptCreateLinodeVolume(ctx context.Context, label
return cs.cloneLinodeVolume(ctx, label, sourceVolume.VolumeID)
}

return cs.createLinodeVolume(ctx, label, sizeGB, tags)
return cs.createLinodeVolume(ctx, label, sizeGB, tags, accessibilityRequirements)
}

// Helper function to extract region from topology
func getRegionFromTopology(requirements *csi.TopologyRequirement) string {
topologies := requirements.GetPreferred()
if len(topologies) == 0 {
topologies = requirements.GetRequisite()
}

if len(topologies) > 0 {
if value, ok := topologies[0].GetSegments()[VolumeTopologyRegion]; ok {
return value
}
}

return ""
}
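
As a rough illustration of the fallback order above (preferred topologies first, then requisite, then an empty string), a table-driven test for this helper might look like the sketch below. It assumes the same package (here called `driver`) and the `VolumeTopologyRegion` constant used in this file; it is not the test file added in this PR.

```go
package driver

import (
	"testing"

	"github.com/container-storage-interface/spec/lib/go/csi"
)

func TestGetRegionFromTopology_Sketch(t *testing.T) {
	tests := []struct {
		name string
		req  *csi.TopologyRequirement
		want string
	}{
		{
			name: "preferred topology wins over requisite",
			req: &csi.TopologyRequirement{
				Preferred: []*csi.Topology{{Segments: map[string]string{VolumeTopologyRegion: "us-ord"}}},
				Requisite: []*csi.Topology{{Segments: map[string]string{VolumeTopologyRegion: "us-east"}}},
			},
			want: "us-ord",
		},
		{
			name: "falls back to requisite when preferred is empty",
			req: &csi.TopologyRequirement{
				Requisite: []*csi.Topology{{Segments: map[string]string{VolumeTopologyRegion: "us-east"}}},
			},
			want: "us-east",
		},
		{
			name: "empty requirement yields empty region",
			req:  &csi.TopologyRequirement{},
			want: "",
		},
	}

	for _, tt := range tests {
		t.Run(tt.name, func(t *testing.T) {
			if got := getRegionFromTopology(tt.req); got != tt.want {
				t.Errorf("getRegionFromTopology() = %q, want %q", got, tt.want)
			}
		})
	}
}
```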

// createLinodeVolume creates a new Linode volume with the specified label, size, and tags.
// It returns the created volume or an error if the creation fails.
func (cs *ControllerServer) createLinodeVolume(ctx context.Context, label string, sizeGB int, tags string) (*linodego.Volume, error) {
func (cs *ControllerServer) createLinodeVolume(ctx context.Context, label string, sizeGB int, tags string, accessibilityRequirements *csi.TopologyRequirement) (*linodego.Volume, error) {
log := logger.GetLogger(ctx)
log.V(4).Info("Creating Linode volume", "label", label, "sizeGB", sizeGB, "tags", tags)

// Get the region from req.AccessibilityRequirements if it exists. Fall back to the controller's metadata region if not specified.
region := cs.metadata.Region
if accessibilityRequirements != nil {
if topologyRegion := getRegionFromTopology(accessibilityRequirements); topologyRegion != "" {
log.V(4).Info("Using region from topology", "region", topologyRegion)
region = topologyRegion
}
}

// Prepare the volume creation request with region, label, and size.
volumeReq := linodego.VolumeCreateOptions{
Region: cs.metadata.Region,
Region: region,
Label: label,
Size: sizeGB,
}
@@ -394,7 +427,7 @@ func (cs *ControllerServer) prepareVolumeParams(ctx context.Context, req *csi.Cr

// createVolumeContext creates a context map for the volume based on the request parameters.
// If the volume is encrypted, it adds relevant encryption attributes to the context.
func (cs *ControllerServer) createVolumeContext(ctx context.Context, req *csi.CreateVolumeRequest) map[string]string {
func (cs *ControllerServer) createVolumeContext(ctx context.Context, req *csi.CreateVolumeRequest, vol *linodego.Volume) map[string]string {
log := logger.GetLogger(ctx)
log.V(4).Info("Entering createVolumeContext()", "req", req)
defer log.V(4).Info("Exiting createVolumeContext()")
@@ -408,18 +441,20 @@ func (cs *ControllerServer) createVolumeContext(ctx context.Context, req *csi.Cr
volumeContext[LuksKeySizeAttribute] = req.GetParameters()[LuksKeySizeAttribute]
}

volumeContext[VolumeTopologyRegion] = vol.Region

log.V(4).Info("Volume context created", "volumeContext", volumeContext)
return volumeContext
}

// createAndWaitForVolume attempts to create a new volume and waits for it to become active.
// It logs the process and handles any errors that occur during creation or waiting.
func (cs *ControllerServer) createAndWaitForVolume(ctx context.Context, name string, sizeGB int, tags string, sourceInfo *linodevolumes.LinodeVolumeKey) (*linodego.Volume, error) {
func (cs *ControllerServer) createAndWaitForVolume(ctx context.Context, name string, sizeGB int, tags string, sourceInfo *linodevolumes.LinodeVolumeKey, accessibilityRequirements *csi.TopologyRequirement) (*linodego.Volume, error) {
log := logger.GetLogger(ctx)
log.V(4).Info("Entering createAndWaitForVolume()", "name", name, "sizeGB", sizeGB, "tags", tags)
defer log.V(4).Info("Exiting createAndWaitForVolume()")

vol, err := cs.attemptCreateLinodeVolume(ctx, name, sizeGB, tags, sourceInfo)
vol, err := cs.attemptCreateLinodeVolume(ctx, name, sizeGB, tags, sourceInfo, accessibilityRequirements)
if err != nil {
return nil, err
}
@@ -518,14 +553,20 @@ func (cs *ControllerServer) validateControllerPublishVolumeRequest(ctx context.C
return linodeID, volumeID, nil
}

// getAndValidateVolume retrieves the volume by its ID and checks if it is
// attached to the specified Linode instance. If the volume is found and
// already attached to the instance, it returns its device path.
// If the volume is not found or attached to a different instance, it
// returns an appropriate error.
func (cs *ControllerServer) getAndValidateVolume(ctx context.Context, volumeID, linodeID int) (string, error) {
// getAndValidateVolume retrieves the volume by its ID and runs checks.
//
// It performs the following checks:
// 1. If the volume is found and already attached to the specified Linode instance,
// it returns the device path of the volume.
// 2. If the volume is not found, it returns an error indicating that the volume does not exist.
// 3. If the volume is attached to a different instance, it returns an error indicating
// that the volume is already attached elsewhere.
//
// Additionally, it checks if the volume and instance are in the same region based on
// the provided volume context. If they are not in the same region, it returns an internal error.
func (cs *ControllerServer) getAndValidateVolume(ctx context.Context, volumeID int, instance *linodego.Instance, volContext map[string]string) (string, error) {
log := logger.GetLogger(ctx)
log.V(4).Info("Entering getAndValidateVolume()", "volumeID", volumeID, "linodeID", linodeID)
log.V(4).Info("Entering getAndValidateVolume()", "volumeID", volumeID, "linodeID", instance.ID)
defer log.V(4).Info("Exiting getAndValidateVolume()")

volume, err := cs.client.GetVolume(ctx, volumeID)
@@ -536,14 +577,19 @@ func (cs *ControllerServer) getAndValidateVolume(ctx context.Context, volumeID,
}

if volume.LinodeID != nil {
if *volume.LinodeID == linodeID {
if *volume.LinodeID == instance.ID {
log.V(4).Info("Volume already attached to instance", "volume_id", volume.ID, "node_id", *volume.LinodeID, "device_path", volume.FilesystemPath)
return volume.FilesystemPath, nil
}
return "", errVolumeAttached(volumeID, linodeID)
return "", errVolumeAttached(volumeID, instance.ID)
}

// check if the volume and instance are in the same region
if instance.Region != volContext[VolumeTopologyRegion] {
return "", errRegionMismatch(volContext[VolumeTopologyRegion], instance.Region)
}

log.V(4).Info("Volume validated and is not attached to instance", "volume_id", volume.ID, "node_id", linodeID)
log.V(4).Info("Volume validated and is not attached to instance", "volume_id", volume.ID, "node_id", instance.ID)
return "", nil
}
