Upgrade CAPO version to v0.12.2 #152

Closed
wants to merge 10 commits into from

noonedeadpunk
Contributor

@noonedeadpunk noonedeadpunk commented Apr 7, 2025

In CAPO version v0.11.2 there is a severe bug that allows any tenant
to cause a Denial of Service.

When a tenant manually removes a VM that is managed by CAPO, a pod
ends up in a crash loop. This has been fixed with [1] and is part
of the 0.12.2 release.

[1] kubernetes-sigs/cluster-api-provider-openstack#2477

@noonedeadpunk
Contributor Author

recheck

@noonedeadpunk
Contributor Author

The linters failure (`OSError: [Errno 24] Too many open files`) seems unrelated.

@noonedeadpunk noonedeadpunk marked this pull request as draft April 7, 2025 16:55
@noonedeadpunk
Contributor Author

So, in this CAPO version `kind: Image` is gone. I'd guess that it also needs a more recent CAPI or something....

@noonedeadpunk
Contributor Author

Yeah, ok, it's not the CAPI version; what's missing is ORC, which was split out into a separate project. Doh.
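
For anyone automating this check, here is a minimal sketch of how a deployment script might decide whether ORC has to be installed next to CAPO. It assumes 0.12.0 is the cut-over, as stated in the CAPO v0.12.0 release notes referenced in this PR. The function names `parse_version` and `needs_separate_orc` are hypothetical and not part of CAPO or of this repository, and pre-release tags such as `-rc.1` are not handled.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn 'v0.12.2' or '0.11.2' into a comparable tuple such as (0, 12, 2)."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def needs_separate_orc(capo_version: str) -> bool:
    """Return True when ORC must be installed in addition to CAPO.

    Assumption: ORC stopped being bundled starting with CAPO 0.12.0.
    """
    return parse_version(capo_version) >= (0, 12, 0)


if __name__ == "__main__":
    for version in ("v0.11.2", "v0.12.2"):
        print(version, needs_separate_orc(version))
```

Tuple comparison keeps the check free of third-party dependencies; a real playbook would more likely express the same condition with its own version test.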

@mnaser
Member

mnaser commented Apr 7, 2025

@noonedeadpunk could we get away with bumping to latest 0.11.x which might have the fix?

@noonedeadpunk
Contributor Author

@mnaser this is the first thing I checked, and unfortunately it's not there as of today. We could probably attempt backporting to 0.11, but I'm not too confident about the stable policy there :(

@mnaser
Member

mnaser commented Apr 7, 2025

Ah, the team is pretty flexible about backporting things, especially if it's a crash. One moment.

@noonedeadpunk
Contributor Author

Oops, just realized I never added a link to the fix; here it is: kubernetes-sigs/cluster-api-provider-openstack#2477

I'm also looking at what it would take to install ORC, as I'd guess sooner or later this needs to be done anyway.

@mnaser
Member

mnaser commented Apr 7, 2025

I pushed kubernetes-sigs/cluster-api-provider-openstack#2507

I'll ping folks for a review and hopefully we can get that landed, though it would still need a release :(

@mnaser
Member

mnaser commented Apr 7, 2025

> I'm also looking at what it would take to install ORC, as I'd guess sooner or later this needs to be done anyway.

I think the best way to go about this is to go over the install instructions on a normal Kind cluster and then see how to "replicate" this into the playbook.

@noonedeadpunk
Contributor Author

FWIW, regarding the Rocky failures in molecule here: we've seen the same failures, caused by AppArmor blocking PAM inside Docker containers running EL when the host is running Ubuntu 24.04. This affects become and/or SSH.

For SSH the workaround was to comment out UsePAM, but for become we just dropped become from the role....
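
To illustrate the SSH half of that workaround, here is a rough sketch that comments out active `UsePAM` directives in the text of an `sshd_config` file. It only transforms a string; where the file lives in the container image and how the result gets written back are left out on purpose. The helper name `comment_out_usepam` is hypothetical.

```python
import re


def comment_out_usepam(sshd_config: str) -> str:
    """Comment out every active UsePAM directive in sshd_config text.

    Lines that are already commented are left untouched, and keyword
    matching is case-insensitive, as sshd keywords are.
    """
    return re.sub(r"(?im)^([ \t]*)(UsePAM\b.*)$", r"\1# \2", sshd_config)


if __name__ == "__main__":
    sample = "Port 22\nUsePAM yes\n#UsePAM no\n"
    print(comment_out_usepam(sample))
```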

@noonedeadpunk
Contributor Author

Ok, so I was able to spawn a healthy cluster with this PR in:

~# openstack coe cluster show 1458e73e-2440-4aff-a57e-37d7acb46c2f -c created_at -c status -c health_status -c labels -c coe_version -c labels_added -c health_status_reason
+----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field                | Value                                                                                                                                                                                                             |
+----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| status               | CREATE_COMPLETE                                                                                                                                                                                                   |
| health_status        | HEALTHY                                                                                                                                                                                                           |
| created_at           | 2025-04-07T19:46:37+00:00                                                                                                                                                                                         |
| coe_version          | v1.31.1                                                                                                                                                                                                           |
| labels               | {'cloud_provider_enabled': 'True', 'kube_tag': 'v1.31.1', 'calico_tag': 'v3.29.0', 'octavia_provider': 'amphorav2', 'octavia_lb_algorithm': 'SOURCE_IP_PORT', 'availability_zone': 'az1', 'auto_scaling_enabled': |
|                      | 'False', 'auto_healing_enabled': 'False', 'master_lb_floating_ip_enabled': 'True', 'kube_dashboard_enabled': 'True', 'ingress_controller': 'octavia'}                                                             |
| labels_added         | {'availability_zone': 'az1', 'auto_scaling_enabled': 'False', 'auto_healing_enabled': 'False', 'master_lb_floating_ip_enabled': 'True', 'kube_dashboard_enabled': 'True', 'ingress_controller': 'octavia'}        |
| health_status_reason | {'kube-pldql-default-worker-t4hhs-24zr6-59h7k.Ready': 'True', 'kube-pldql-default-worker-t4hhs-24zr6-7wqc2.Ready': 'True', 'kube-pldql-gx976-jgs24.Ready': 'True'}                                                |
+----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Though it adds a new required variable: `cluster_api_openstack_controller_version: 2.0.3`

@mnaser With that I was wondering: how is CHANGELOG.md managed? Manually, or generated from some fragments?
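
As a side note on reading the `health_status_reason` field from the output above: a small sketch that lists nodes whose `Ready` condition is not `'True'`. It assumes the value is passed in as the single-line, Python-literal style string shown in the table; the function name `not_ready_nodes` is hypothetical.

```python
import ast


def not_ready_nodes(health_status_reason: str) -> list[str]:
    """Return the sorted names of nodes whose Ready condition is not 'True'."""
    reasons = ast.literal_eval(health_status_reason)
    return sorted(
        key.removesuffix(".Ready")
        for key, value in reasons.items()
        if key.endswith(".Ready") and value != "True"
    )


if __name__ == "__main__":
    reason = (
        "{'kube-pldql-default-worker-t4hhs-24zr6-59h7k.Ready': 'True', "
        "'kube-pldql-default-worker-t4hhs-24zr6-7wqc2.Ready': 'True', "
        "'kube-pldql-gx976-jgs24.Ready': 'True'}"
    )
    print(not_ready_nodes(reason))
```

`ast.literal_eval` is used instead of `eval` so that only literal values are parsed.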

@noonedeadpunk
Contributor Author

The upgrade job seems to be legitimately broken :(

https://zuul.atmosphere.vexxhost.dev/build/5d0e70976ea648d9ab3dd9d548995378

@noonedeadpunk
Contributor Author

Regarding ansible-test: the CI installs requirements for /opt/hostedtoolcache/Python/3.10.16/x64, but then `ansible-test units` tries to execute through /usr/bin/python3.12
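
A tiny sketch of the mismatch described here, in case it helps when debugging the job: it checks whether the interpreter that runs the tests lives under the prefix where the requirements were installed. The helper name `runs_outside_prefix` is hypothetical, and comparing paths is only a heuristic (it does not follow symlinks or inspect the interpreter itself).

```python
from pathlib import PurePosixPath


def runs_outside_prefix(install_prefix: str, interpreter: str) -> bool:
    """Return True when the interpreter path is not located under install_prefix."""
    return not PurePosixPath(interpreter).is_relative_to(install_prefix)


if __name__ == "__main__":
    prefix = "/opt/hostedtoolcache/Python/3.10.16/x64"
    print(runs_outside_prefix(prefix, "/usr/bin/python3.12"))
```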

@noonedeadpunk noonedeadpunk marked this pull request as ready for review April 14, 2025 09:51
@noonedeadpunk noonedeadpunk requested a review from mnaser April 16, 2025 15:58
@yaguangtang
Member

@noonedeadpunk I have fixed the CI issue

@noonedeadpunk
Contributor Author

Would be really nice to get some reviews/progress on this one...

@noonedeadpunk
Contributor Author

Any updates?

noonedeadpunk and others added 4 commits July 10, 2025 12:02
In CAPO version v0.11.2 there is a severe bug that allows any tenant
to cause a Denial of Service.

When a tenant manually removes a VM that is managed by CAPO, a pod
ends up in a crash loop. This has been fixed with [1] and is part
of the 0.12.2 release.

Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
More recent CAPO also requires a corresponding CAPI, otherwise
VM creation fails with:
`no matches for kind "Image" in version "openstack.k-orc.cloud/v1alpha1"`

Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
CAPO 0.12.0 has removed ORC [1], so it now needs to be installed
separately.

[1] https://github.com/kubernetes-sigs/cluster-api-provider-openstack/releases/tag/v0.12.0

Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
mnaser and others added 5 commits July 10, 2025 12:02
Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
* feat: allow setting capo instance creation timeout

Signed-off-by: Tadas Sutkaitis <tadasas@gmail.com>

* fix: license and rename variable

Signed-off-by: Tadas Sutkaitis <tadasas@gmail.com>

* fix: patch using native kubernetes module

Signed-off-by: Tadas Sutkaitis <tadas.sutkaitis@vexxhost.com>

---------

Signed-off-by: Tadas Sutkaitis <tadasas@gmail.com>
Signed-off-by: Tadas Sutkaitis <tadas.sutkaitis@vexxhost.com>
Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
Signed-off-by: Dong Ma <dong.ma@vexxhost.com>
Signed-off-by: Dmitriy Rabotyagov <noonedeadpunk@gmail.com>
@noonedeadpunk
Contributor Author

omfg... Adding DCO seems to have pulled in quite a few unrelated things with the rebase... I have no idea how to resolve that in GitHub at this point, tbh...

@noonedeadpunk
Contributor Author

recheck - Error: etcdserver: request timed out

@noonedeadpunk
Contributor Author

recheck

@noonedeadpunk
Contributor Author

noonedeadpunk commented Jul 10, 2025

Closing in favor of #165 due to the DCO mess-up.

5 participants