
Commit 84661b5

Add support for Talos OS (#87)
* Add Talos OS support

  `/etc` on Talos OS is read-only, which means that new PVs will fail to create. This change makes the hard-coded `/etc/lvm` configurable via the `--hostwritepath` flag. NOTE that this also changes the current `/run/lock/lvm` to `/etc/lvm/lock`.

  This is a requirement for metal-stack/helm-charts#64

  Signed-off-by: Gerhard Lazu <gerhard@lazu.ch>

* Create loop devices as part of `make test`

  After this change, people who expect integration tests to be self-contained will not be disappointed. It took me a while to figure out why **some** integration tests were failing locally. I eventually found this requirement in this doc page: https://docs.metal-stack.io/stable/external/csi-driver-lvm/README/. The GitHub Actions workflow also helped. Even then, the mknod command was not mentioned anywhere, and my NixOS host did not have the special files /dev/loop100 & /dev/loop101 created. With this change, `make test` is self-contained and works the same on every Linux host, whether that is a local development workstation or GitHub Actions.

  Speaking of GitHub Actions, we do not want to run the build-platforms job if the DOCKER_REGISTRY_TOKEN secret is not set. Without this check, the job fails in repo forks, where these secrets are not available by default. FWIW, `${{ secrets. }}` is not available in `if` conditions; the secret value needs to be exposed as an env var for the `if` condition to work correctly. FTR: https://github.com/orgs/community/discussions/26726

  I also remembered to remove the loop devices as part of `make test-cleanup` and to double-check that each loop device has actually been removed. I have hit a situation where the file was deleted but /dev/loop100 was still left dangling and had to be removed with `sudo dmsetup remove`.

  Lastly, the Docker CLI is configured to ignore *.img files. These are created in the same directory and should not be sent to Docker when running `docker build`.

  Signed-off-by: Gerhard Lazu <gerhard@lazu.ch>

* Refactor tests

  Remove all hard-coded sleeps **except** the last one, when we delete the csi-lvm-controller; otherwise PVCs may not get deleted before the controller is deleted, and the loop devices will not be cleared correctly when running `make test-cleanup`.

  We also want to test one thing per test, otherwise we may not know why a test failed. We leverage `kubectl wait --for=jsonpath=` as much as possible (see the sketch after this message), so the tests do not need to check for specific strings; we let `--for=jsonpath=` do that. The best part of this approach is that we can use the `--timeout` flag. This brings the **entire** integration test suite duration down to 70 seconds; before this change, the sleeps alone (170s) took longer than that.

  To double-check for race conditions or flaky tests, I ran all tests locally 100 times with `RERUN=100 make test`. All 100 runs passed. This looks good to me!

  Separately, I have also tested this on Talos v1.4.0 running K8s 1.26.4. Everything works as expected now. See this PR comment for more details: #87 (comment)

  Signed-off-by: Gerhard Lazu <gerhard@lazu.ch>

---------

Signed-off-by: Gerhard Lazu <gerhard@lazu.ch>
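A minimal sketch of the `kubectl wait --for=jsonpath=` pattern described above; the resource names, namespace and timeouts are illustrative only, the real assertions live in bats/test.bats:

    # wait until the PVC reports Bound instead of sleeping and grepping output
    kubectl wait pvc/example-pvc --namespace default \
      --for=jsonpath='{.status.phase}'=Bound --timeout=30s

    # same idea for the consuming pod
    kubectl wait pod/example-pod --namespace default \
      --for=jsonpath='{.status.phase}'=Running --timeout=60s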
1 parent 999b3ee commit 84661b5

15 files changed (+131, -100 lines)

.dockerignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+*.img

.github/workflows/docker.yaml

Lines changed: 9 additions & 2 deletions
@@ -55,8 +55,6 @@ jobs:
 
       - name: Test
         run: |
-          for i in 100 101; do fallocate -l 1G loop${i}.img ; sudo losetup /dev/loop${i} loop${i}.img; done
-          sudo losetup -a
           make test
 
   build-platforms:
@@ -65,33 +63,41 @@ jobs:
     needs:
      - lint
      - test
+    env:
+      DOCKER_REGISTRY_TOKEN: ${{ secrets.DOCKER_REGISTRY_TOKEN }}
 
     steps:
      - name: Log in to the container registry
+       if: ${{ env.DOCKER_REGISTRY_TOKEN != '' }}
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ secrets.DOCKER_REGISTRY_USER }}
          password: ${{ secrets.DOCKER_REGISTRY_TOKEN }}
 
      - name: Checkout
+       if: ${{ env.DOCKER_REGISTRY_TOKEN != '' }}
        uses: actions/checkout@v3
 
      - name: Set up Go 1.19
+       if: ${{ env.DOCKER_REGISTRY_TOKEN != '' }}
        uses: actions/setup-go@v3
        with:
          go-version: 1.19
 
      - name: Set up Docker Buildx
+       if: ${{ env.DOCKER_REGISTRY_TOKEN != '' }}
        uses: docker/setup-buildx-action@v2
 
      - name: Make tag
+       if: ${{ env.DOCKER_REGISTRY_TOKEN != '' }}
        run: |
          [ "${GITHUB_EVENT_NAME}" == 'pull_request' ] && echo "tag=${GITHUB_HEAD_REF##*/}" >> $GITHUB_ENV || true
          [ "${GITHUB_EVENT_NAME}" == 'release' ] && echo "tag=${GITHUB_REF##*/}" >> $GITHUB_ENV || true
          [ "${GITHUB_EVENT_NAME}" == 'push' ] && echo "tag=latest" >> $GITHUB_ENV || true
 
      - name: Build and push image
+       if: ${{ env.DOCKER_REGISTRY_TOKEN != '' }}
        uses: docker/build-push-action@v3
        with:
          context: .
@@ -100,6 +106,7 @@ jobs:
          platforms: linux/amd64,linux/arm64,linux/arm/v7
 
      - name: Build and push provisioner image
+       if: ${{ env.DOCKER_REGISTRY_TOKEN != '' }}
        uses: docker/build-push-action@v3
        with:
          context: .

Makefile

Lines changed: 31 additions & 5 deletions
@@ -30,8 +30,25 @@ build-plugin:
 build-provisioner:
         docker build -t csi-driver-lvm-provisioner . -f cmd/provisioner/Dockerfile
 
-.PHONY: test
-test: build-plugin build-provisioner
+/dev/loop%:
+        @fallocate --length 1G loop$*.img
+ifndef GITHUB_ACTIONS
+        @sudo mknod $@ b 7 $*
+endif
+        @sudo losetup $@ loop$*.img
+        @sudo losetup $@
+
+rm-loop%:
+        @sudo losetup -d /dev/loop$* || true
+        @! losetup /dev/loop$*
+        @sudo rm -f /dev/loop$*
+        @rm loop$*.img
+# If removing this loop device fails, you may need to:
+# sudo dmsetup info
+# sudo dmsetup remove <DEVICE_NAME>
+
+.PHONY: kind
+kind:
         @if ! which kind > /dev/null; then echo "kind needs to be installed"; exit 1; fi
         @if ! kind get clusters | grep csi-driver-lvm > /dev/null; then \
                 kind create cluster \
@@ -40,15 +57,24 @@ test: build-plugin build-provisioner
                 --kubeconfig $(KUBECONFIG); fi
         @kind --name csi-driver-lvm load docker-image csi-driver-lvm
         @kind --name csi-driver-lvm load docker-image csi-driver-lvm-provisioner
+
+.PHONY: rm-kind
+rm-kind:
+        @kind delete cluster --name csi-driver-lvm
+
+RERUN ?= 1
+.PHONY: test
+test: build-plugin build-provisioner /dev/loop100 /dev/loop101 kind
         @cd tests && docker build -t csi-bats . && cd -
+        @for i in {1..$(RERUN)}; do \
         docker run -i$(DOCKER_TTY_ARG) \
                 -e HELM_REPO=$(HELM_REPO) \
                 -v "$(KUBECONFIG):/root/.kube/config" \
                 -v "$(PWD)/tests:/code" \
                 --network host \
                 csi-bats \
-                --verbose-run --trace --timing bats/test.bats
+                --verbose-run --trace --timing bats/test.bats ; \
+        done
 
 .PHONY: test-cleanup
-test-cleanup:
-        @kind delete cluster --name csi-driver-lvm
+test-cleanup: rm-loop100 rm-loop101 rm-kind
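For local runs, the new targets can be driven straight from the shell; a short usage sketch based on the targets above (loop devices 100 and 101 are the ones the test target depends on):

    # build images, create the loop100/loop101 backing files and devices, start kind, run the bats suite once
    make test

    # re-run the whole suite many times to flush out flaky tests
    RERUN=100 make test

    # detach the loop devices, remove the *.img files and delete the kind cluster
    make test-cleanup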

cmd/lvmplugin/main.go

Lines changed: 2 additions & 1 deletion
@@ -35,6 +35,7 @@ func init() {
 
 var (
 	endpoint = flag.String("endpoint", "unix://tmp/csi.sock", "CSI endpoint")
+	hostWritePath = flag.String("hostwritepath", "/etc/lvm", "host path where config, cache & backups will be written to")
 	driverName = flag.String("drivername", "lvm.csi.metal-stack.io", "name of the driver")
 	nodeID = flag.String("nodeid", "", "node id")
 	ephemeral = flag.Bool("ephemeral", false, "publish volumes in ephemeral mode even if kubelet did not ask for it (only needed for Kubernetes 1.15)")
@@ -68,7 +69,7 @@ func main() {
 }
 
 func handle() {
-	driver, err := lvm.NewLvmDriver(*driverName, *nodeID, *endpoint, *ephemeral, *maxVolumesPerNode, version, *devicesPattern, *vgName, *namespace, *provisionerImage, *pullPolicy)
+	driver, err := lvm.NewLvmDriver(*driverName, *nodeID, *endpoint, *hostWritePath, *ephemeral, *maxVolumesPerNode, version, *devicesPattern, *vgName, *namespace, *provisionerImage, *pullPolicy)
 	if err != nil {
 		fmt.Printf("Failed to initialize driver: %s\n", err.Error())
 		os.Exit(1)
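A hedged example of how the new flag might be used on Talos, where /etc is read-only; the binary name and the writable path below are illustrative only, the real values come from the deployment (e.g. the metal-stack helm chart referenced in the commit message):

    # default behaviour, unchanged: LVM config, cache & backups under /etc/lvm
    lvmplugin --nodeid="${NODE_NAME}"

    # Talos-style deployment: redirect LVM state to a writable host path (path is an example, not the chart's value)
    lvmplugin --nodeid="${NODE_NAME}" --hostwritepath=/var/etc/lvm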

pkg/lvm/controllerserver.go

Lines changed: 5 additions & 1 deletion
@@ -36,14 +36,15 @@ type controllerServer struct {
 	nodeID string
 	devicesPattern string
 	vgName string
+	hostWritePath string
 	kubeClient kubernetes.Clientset
 	provisionerImage string
 	pullPolicy v1.PullPolicy
 	namespace string
 }
 
 // NewControllerServer
-func newControllerServer(ephemeral bool, nodeID string, devicesPattern string, vgName string, namespace string, provisionerImage string, pullPolicy v1.PullPolicy) (*controllerServer, error) {
+func newControllerServer(ephemeral bool, nodeID string, devicesPattern string, vgName string, hostWritePath string, namespace string, provisionerImage string, pullPolicy v1.PullPolicy) (*controllerServer, error) {
 	if ephemeral {
 		return &controllerServer{caps: getControllerServiceCapabilities(nil), nodeID: nodeID}, nil
 	}
@@ -69,6 +70,7 @@ func newControllerServer(ephemeral bool, nodeID string, devicesPattern string, v
 		}),
 		nodeID: nodeID,
 		devicesPattern: devicesPattern,
+		hostWritePath: hostWritePath,
 		vgName: vgName,
 		kubeClient: *kubeClient,
 		namespace: namespace,
@@ -137,6 +139,7 @@ func (cs *controllerServer) CreateVolume(ctx context.Context, req *csi.CreateVol
 		kubeClient: cs.kubeClient,
 		namespace: cs.namespace,
 		vgName: cs.vgName,
+		hostWritePath: cs.hostWritePath,
 	}
 	if err := createProvisionerPod(ctx, va); err != nil {
 		klog.Errorf("error creating provisioner pod :%v", err)
@@ -197,6 +200,7 @@ func (cs *controllerServer) DeleteVolume(ctx context.Context, req *csi.DeleteVol
 		kubeClient: cs.kubeClient,
 		namespace: cs.namespace,
 		vgName: cs.vgName,
+		hostWritePath: cs.hostWritePath,
 	}
 	if err := createProvisionerPod(ctx, va); err != nil {
 		klog.Errorf("error creating provisioner pod :%v", err)

pkg/lvm/lvm.go

Lines changed: 8 additions & 5 deletions
@@ -45,6 +45,7 @@ type Lvm struct {
 	nodeID string
 	version string
 	endpoint string
+	hostWritePath string
 	ephemeral bool
 	maxVolumesPerNode int64
 	devicesPattern string
@@ -76,6 +77,7 @@ type volumeAction struct {
 	kubeClient kubernetes.Clientset
 	namespace string
 	vgName string
+	hostWritePath string
 }
 
 const (
@@ -93,7 +95,7 @@ var (
 )
 
 // NewLvmDriver creates the driver
-func NewLvmDriver(driverName, nodeID, endpoint string, ephemeral bool, maxVolumesPerNode int64, version string, devicesPattern string, vgName string, namespace string, provisionerImage string, pullPolicy string) (*Lvm, error) {
+func NewLvmDriver(driverName, nodeID, endpoint string, hostWritePath string, ephemeral bool, maxVolumesPerNode int64, version string, devicesPattern string, vgName string, namespace string, provisionerImage string, pullPolicy string) (*Lvm, error) {
 	if driverName == "" {
 		return nil, fmt.Errorf("no driver name provided")
 	}
@@ -123,6 +125,7 @@ func NewLvmDriver(driverName, nodeID, endpoint string, ephemeral bool, maxVolume
 		version: vendorVersion,
 		nodeID: nodeID,
 		endpoint: endpoint,
+		hostWritePath: hostWritePath,
 		ephemeral: ephemeral,
 		maxVolumesPerNode: maxVolumesPerNode,
 		devicesPattern: devicesPattern,
@@ -139,7 +142,7 @@ func (lvm *Lvm) Run() error {
 	// Create GRPC servers
 	lvm.ids = newIdentityServer(lvm.name, lvm.version)
 	lvm.ns = newNodeServer(lvm.nodeID, lvm.ephemeral, lvm.maxVolumesPerNode, lvm.devicesPattern, lvm.vgName)
-	lvm.cs, err = newControllerServer(lvm.ephemeral, lvm.nodeID, lvm.devicesPattern, lvm.vgName, lvm.namespace, lvm.provisionerImage, lvm.pullPolicy)
+	lvm.cs, err = newControllerServer(lvm.ephemeral, lvm.nodeID, lvm.devicesPattern, lvm.vgName, lvm.hostWritePath, lvm.namespace, lvm.provisionerImage, lvm.pullPolicy)
 	if err != nil {
 		return err
 	}
@@ -360,7 +363,7 @@ func createProvisionerPod(ctx context.Context, va volumeAction) (err error) {
 			Name: "lvmbackup",
 			VolumeSource: v1.VolumeSource{
 				HostPath: &v1.HostPathVolumeSource{
-					Path: "/etc/lvm/backup",
+					Path: filepath.Join(va.hostWritePath, "backup"),
 					Type: &hostPathType,
 				},
 			},
@@ -369,7 +372,7 @@ func createProvisionerPod(ctx context.Context, va volumeAction) (err error) {
 			Name: "lvmcache",
 			VolumeSource: v1.VolumeSource{
 				HostPath: &v1.HostPathVolumeSource{
-					Path: "/etc/lvm/cache",
+					Path: filepath.Join(va.hostWritePath, "cache"),
 					Type: &hostPathType,
 				},
 			},
@@ -378,7 +381,7 @@ func createProvisionerPod(ctx context.Context, va volumeAction) (err error) {
 			Name: "lvmlock",
 			VolumeSource: v1.VolumeSource{
 				HostPath: &v1.HostPathVolumeSource{
-					Path: "/run/lock/lvm",
+					Path: filepath.Join(va.hostWritePath, "lock"),
 					Type: &hostPathType,
 				},
 			},