Add server metadata, soft-anti-affinity and local ssd flavors for gx-scs #742

Merged 7 commits on Jun 26, 2024
3 changes: 3 additions & 0 deletions doc/configuration.md
@@ -67,13 +67,16 @@ Parameters controlling the cluster creation:
| `` | `CONTROL_PLANE_ROOT_DISKSIZE` | SCS | `20` | *If* diskless flavors are used for control plane nodes, this is the allocated root volume disk size (in GB) |
| `` | `WORKER_ROOT_DISKSIZE` | SCS | `20` | *If* diskless flavors are used for worker nodes, this is the allocated root volume disk size (in GB) |
| `anti_affinity` | `OPENSTACK_ANTI_AFFINITY` | SCS | `true` | Use anti-affinity server groups to prevent k8s nodes on same host (soft for workers, hard for controllers) |
| `soft_anti_affinity_controller` | `OPENSTACK_SOFT_ANTI_AFFINITY_CONTROLLER` | SCS | `false` | Allow the use of soft-anti-affinity for the controllers (if `anti_affinity` is `true`) |
| `` | `OPENSTACK_SRVGRP_CONTROLLER` | SCS | `nonono` | Autogenerated if `anti_affinity` is `true`, eliminated otherwise |
| `` | `OPENSTACK_SRVGRP_WORKER` | SCS | `nonono` | Autogenerated if `anti_affinity` is `true`, eliminated otherwise |
| `deploy_occm` | `DEPLOY_OCCM` | SCS | `true` | Deploy the given version of OCCM into the cluster. `true` (default) chooses the latest version matching the k8s version. You can specify `master` to choose the upstream master branch. Don't disable this. |
| `deploy_cindercsi` | `DEPLOY_CINDERCSI` | SCS | `true` | Deploy the given version of cinder CSI (for the default `true` value, the latest version matching the k8s version). |
| `etcd_unsafe_fs` | `ETCD_UNSAFE_FS` | SCS | `false` | Use `barrier=0` for filesystem on control nodes to avoid storage latency. Use for multi-controller clusters on slow/networked storage, otherwise not recommended. |
| `testcluster_name` | (cmd line) | SCS | `testcluster` | Allows setting the default cluster name, created at bootstrap (if `controller_count` is larger than 0) |
| `restrict_kubeapi` | `RESTRICT_KUBEAPI` | SCS | `[ ]` | Allows restricting access to kubernetes API by list of CIDRs. Empty list (default) means public, `[ "none" ]` means internal access only. |
| `controller_metadata` | `OPENSTACK_CONTROL_PLANE_MACHINE_METADATA` | SCS | `{ }` | Adds additional metadata for instances running the k8s management nodes |
| `worker_metadata` | `OPENSTACK_NODE_MACHINE_METADATA` | SCS | `{ }` | Adds additional metadata for instances running the k8s worker nodes |
| `` | `OPENSTACK_CLUSTER_GEN` | SCS | `geno01` | Generation counter for the OpenStackClusterTemplate resource. Increase, when changing restrict_kubeapi or other OC settings |
| `capo_instance_create_timeout` | `CLUSTER_API_OPENSTACK_INSTANCE_CREATE_TIMEOUT` | capo | `5` | Time to wait for an OpenStack machine to be created (in minutes) |
| `containerd_registry_files` | | SCS | `{"hosts":["./files/containerd/docker.io"], "certs":[]}` | Containerd registry hosts config files, see related [docs](./usage/containter-registry-configuration.md) for details. |
2 changes: 1 addition & 1 deletion playbooks/tasks/scs_compliance.yaml
@@ -31,7 +31,7 @@
ansible.builtin.shell:
cmd:
". {{ python_venv_dir }}/bin/activate &&
- python3 {{ check_dir }}/Tests/scs-compliance-check.py {{ check_dir }}/Tests/scs-compatible-kaas.yaml -v -s KaaS_V1 -a kubeconfig={{ kubeconfig_path }}"
+ python3 {{ check_dir }}/Tests/scs-compliance-check.py {{ check_dir }}/Tests/scs-compatible-kaas.yaml -v -s KaaS_V1 -V v2 -a kubeconfig={{ kubeconfig_path }}"
changed_when: false
register: scs_compliance_results
always:
9 changes: 8 additions & 1 deletion playbooks/templates/environment.tfvars.j2
@@ -6,9 +6,16 @@ availability_zone = "nova"
external = "ext01"
dns_nameservers = ["62.138.222.111", "62.138.222.222"]
kind_flavor = "SCS-2V:4"
- controller_flavor = "SCS-2V:4:20"
+ controller_flavor = "SCS-2V-4-20s"
worker_flavor = "SCS-2V:4:20"

controller_metadata = {
ps_restart_after_maint = "true"
}

# FIXME: Remove when CI runs on the gx-scs2 environment (3+ physical machines for local SSD flavors)
soft_anti_affinity_controller = true

controller_count = 3
worker_count = 3

3 changes: 3 additions & 0 deletions terraform/environments/environment-default.tfvars
@@ -27,6 +27,7 @@ node_cidr = "<CIDR>" # defaults to "10.8.0.0/20"
service_cidr = "<CIDR>" # defaults to "10.96.0.0/12"
pod_cidr = "<CIDR>" # defaults to "192.168.0.0/16"
anti_affinity = "<boolean>" # defaults to "true"
soft_anti_affinity_controller = "<boolean>" # defaults to "false"
use_cilium = "version/true/false" # defaults to "true", can also be set to "vx.y.z", also see cilium_binaries
use_ovn_lb_provider = "auto/true/false" # use OVN LB if available (auto) or force (true) or never (false)
deploy_nginx_ingress = "version/true/false" # defaults to "true", you can also set vX.Y.Z if you want
@@ -39,6 +40,8 @@ deploy_cindercsi = "<version>" # defaults to "true", ditto
etcd_unsafe_fs = "<boolean>" # defaults to "false", dangerous
testcluster_name = "NAME" # defaults to "testcluster"
restrict_kubeapi = [ "IP/20", "IP/22" ] # defaults to empty (fully open), use [ "none" ] for exclusive internal access
controller_metadata = { metadata_key = "metadata_value" } # defaults to empty dict (no additional metadata)
worker_metadata = { metadata_key = "metadata_value" } # defaults to empty dict (no additional metadata)
containerd_registry_files = {"hosts":["<list of registry host config files>"], "certs":["<list of custom cert files>"]} # defaults to '{"hosts":["./files/containerd/docker.io"], "certs":[]}'
deploy_harbor = "<boolean>" # defaults to "false", "true" deploys Harbor and forces deployment of flux and potentially other services (`cert_manager`, `nginx_ingress` and `cindercsi`), see `doc/usage/harbor.md`
harbor_config = {"domain_name":"<name>", "issuer_email":"<email>", "persistence":"<boolean>", "database_size":"size", "redis_size":"size", "trivy_size":"size"} # for defaults see ../variables.tf
5 changes: 4 additions & 1 deletion terraform/environments/environment-gx-scs-staging.tfvars
@@ -5,7 +5,10 @@ cloud_provider = "gx-scs-staging"
availability_zone = "nova"
external = "ext01"
kind_flavor = "SCS-2V:4"
- controller_flavor = "SCS-8V:16:100"
+ controller_flavor = "SCS-4V-16-100s"
worker_flavor = "SCS-8V:16:100"
#image = "Ubuntu 22.04"
#ssh_username = "ubuntu"
controller_metadata = {
ps_restart_after_maint = "true"
}
5 changes: 4 additions & 1 deletion terraform/environments/environment-gx-scs.tfvars
@@ -4,10 +4,13 @@ cloud_provider = "gx-scs"
availability_zone = "nova"
external = "ext01"
kind_flavor = "SCS-2V:4"
- controller_flavor = "SCS-2V:4:20"
+ controller_flavor = "SCS-2V-4-20s"
worker_flavor = "SCS-2V:4:20"
#image = "Ubuntu 22.04"
#ssh_username = "ubuntu"
#kube_image_raw = "true"
dns_nameservers = ["62.138.222.111", "62.138.222.222"]
#controller_count = 0
controller_metadata = {
ps_restart_after_maint = "true"
}
5 changes: 4 additions & 1 deletion terraform/files/bin/create_cluster.sh
@@ -91,7 +91,10 @@ if test "$CONTROL_PLANE_MACHINE_COUNT" -gt 0 && grep '^ *OPENSTACK_ANTI_AFFINITY
SRVGRP_CONTROLLER=$(echo "$SRVGRP" | grep "${PREFIX}-${CLUSTER_NAME}-controller" | sed 's/^\([0-9a-f\-]*\) .*$/\1/')
SRVGRP_WORKER=$(echo "$SRVGRP" | grep "${PREFIX}-${CLUSTER_NAME}-worker" | sed 's/^\([0-9a-f\-]*\) .*$/\1/')
if test -z "$SRVGRP_CONTROLLER"; then
- SRVGRP_CONTROLLER=$(openstack --os-compute-api-version 2.15 server group create --policy anti-affinity -f value -c id ${PREFIX}-${CLUSTER_NAME}-controller)
+ ANTI_AFFINITY_POLICY_CONTROLLER=anti-affinity
+ SOFT_ANTI_AFFINITY_CONTROLLER=$(yq eval '.OPENSTACK_SOFT_ANTI_AFFINITY_CONTROLLER' $CCCFG)
+ if test "$SOFT_ANTI_AFFINITY_CONTROLLER" = "true"; then ANTI_AFFINITY_POLICY_CONTROLLER=soft-anti-affinity; fi
+ SRVGRP_CONTROLLER=$(openstack --os-compute-api-version 2.15 server group create --policy ${ANTI_AFFINITY_POLICY_CONTROLLER} -f value -c id ${PREFIX}-${CLUSTER_NAME}-controller)
SRVGRP_WORKER=$(openstack --os-compute-api-version 2.15 server group create --policy soft-anti-affinity -f value -c id ${PREFIX}-${CLUSTER_NAME}-worker)
fi
echo "Adding server groups $SRVGRP_CONTROLLER and $SRVGRP_WORKER to $CCCFG"
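The controller policy selection in this hunk can be sketched in isolation as follows. This is a minimal sketch, not the script itself: the function name `controller_policy` is hypothetical, and the value that `create_cluster.sh` reads with `yq` from `$CCCFG` is passed in as an argument instead. Any value other than the literal string `true` falls back to the hard `anti-affinity` policy.

```shell
#!/bin/sh
# Hypothetical helper (not part of create_cluster.sh): map the value of
# OPENSTACK_SOFT_ANTI_AFFINITY_CONTROLLER to a server group policy name.
controller_policy() {
    if test "$1" = "true"; then
        echo soft-anti-affinity
    else
        # Default: hard anti-affinity, as before this change
        echo anti-affinity
    fi
}

controller_policy true     # prints: soft-anti-affinity
controller_policy false    # prints: anti-affinity
```

Keeping `anti-affinity` as the fallback means that existing configurations without the new setting behave as they did before.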
3 changes: 3 additions & 0 deletions terraform/files/bin/deploy_cluster_api.sh
@@ -31,6 +31,9 @@ clusterctl version --output yaml
#MTU=`yq eval '.MTU_VALUE' ~/cluster-defaults/clusterctl.yaml`
# Fix up nameserver list (trailing comma -- cosmetic)
sed '/OPENSTACK_DNS_NAMESERVERS:/s@, \]"@ ]"@' -i ~/cluster-defaults/clusterctl.yaml
# Fix metadata dicts (trailing comma -- cosmetic)
sed '/OPENSTACK_CONTROL_PLANE_MACHINE_METADATA:/s@, }"@ }"@' -i ~/cluster-defaults/clusterctl.yaml
sed '/OPENSTACK_NODE_MACHINE_METADATA:/s@, }"@ }"@' -i ~/cluster-defaults/clusterctl.yaml

# cp clusterctl.yaml to the right place
if test "$(dotversion "$(clusterctl version -o short)")" -ge 10500; then
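The effect of the metadata fix-up `sed` expressions in this hunk can be tried on a single line. The sketch below assumes that the rendered `clusterctl.yaml` line looks like the sample shown (one key followed by a trailing `, `); the function name `strip_trailing_comma` is hypothetical, and it reads from standard input instead of editing the file in place with `-i`.

```shell
#!/bin/sh
# Hypothetical wrapper around the sed expression from deploy_cluster_api.sh:
# on lines containing "<KEY>:", turn a trailing `, }"` into ` }"`.
strip_trailing_comma() {
    sed "/$1:/s@, }\"@ }\"@"
}

# Assumed sample of a rendered line before the fix-up
sample="OPENSTACK_CONTROL_PLANE_MACHINE_METADATA: \"{ ps_restart_after_maint: 'true', }\""

printf '%s\n' "$sample" | strip_trailing_comma OPENSTACK_CONTROL_PLANE_MACHINE_METADATA
# prints: OPENSTACK_CONTROL_PLANE_MACHINE_METADATA: "{ ps_restart_after_maint: 'true' }"
```

A line with an empty dict (`"{ }"`) contains no `, }"` sequence, so it passes through unchanged.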
2 changes: 1 addition & 1 deletion terraform/files/bin/openstack-kube-versions.inc
@@ -3,7 +3,7 @@
# (c) Kurt Garloff <kurt@garloff.de>, 3/2022
# SPDX-License-Identifier: Apache-2.0
# Images from https://swift.services.a.regiocloud.tech/swift/v1/AUTH_b182637428444b9aa302bb8d5a5a418c/openstack-k8s-capi-images
- k8s_versions=("v1.21.14" "v1.22.17" "v1.23.16" "v1.24.15" "v1.25.15" "v1.26.14" "v1.27.12" "v1.28.10" "v1.29.3")
+ k8s_versions=("v1.21.14" "v1.22.17" "v1.23.16" "v1.24.15" "v1.25.15" "v1.26.14" "v1.27.12" "v1.28.11" "v1.29.3")
# OCCM, CCM-RBAC, Cinder CSI, Cinder-Snapshot (TODO: Manila CSI)
occm_versions=("v1.21.1" "v1.22.2" "v1.23.4" "v1.24.6" "v1.25.6" "v1.26.4" "v1.27.3" "v1.28.2" "v1.29.0")
#ccmr_versions=("" "v1.22.2" "v1.23.4" "v1.24.6" "v1.25.6" "v1.26.4" "v1.27.3" "v1.28.2" "v1.29.0")
2 changes: 2 additions & 0 deletions terraform/files/template/cluster-template.yaml
@@ -320,6 +320,7 @@ spec:
template:
spec:
flavor: ${OPENSTACK_CONTROL_PLANE_MACHINE_FLAVOR}
serverMetadata: ${OPENSTACK_CONTROL_PLANE_MACHINE_METADATA}
serverGroupID: ${OPENSTACK_SRVGRP_CONTROLLER}
image: ${OPENSTACK_IMAGE_NAME}
sshKeyName: ${OPENSTACK_SSH_KEY_NAME}
@@ -345,6 +346,7 @@ spec:
name: ${CLUSTER_NAME}-cloud-config
kind: Secret
flavor: ${OPENSTACK_NODE_MACHINE_FLAVOR}
serverMetadata: ${OPENSTACK_NODE_MACHINE_METADATA}
serverGroupID: ${OPENSTACK_SRVGRP_WORKER}
image: ${OPENSTACK_IMAGE_NAME}
sshKeyName: ${OPENSTACK_SSH_KEY_NAME}
5 changes: 5 additions & 0 deletions terraform/files/template/clusterctl.yaml.tmpl
@@ -37,6 +37,10 @@ DEPLOY_FLUX: ${deploy_flux}
# deploy metrics service
DEPLOY_METRICS: ${deploy_metrics}

# OpenStack instance additional metadata
OPENSTACK_CONTROL_PLANE_MACHINE_METADATA: "{ %{ for metadata_key, metadata_value in controller_metadata ~} ${metadata_key}: '${metadata_value}', %{ endfor ~} }"
OPENSTACK_NODE_MACHINE_METADATA: "{ %{ for metadata_key, metadata_value in worker_metadata ~} ${metadata_key}: '${metadata_value}', %{ endfor ~} }"

# OpenStack flavors and machine count
OPENSTACK_CONTROL_PLANE_MACHINE_FLAVOR: ${controller_flavor}
CONTROL_PLANE_MACHINE_COUNT: ${controller_count}
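To illustrate the string that the metadata template loop is expected to produce, here is a shell emulation. It is only a sketch under the assumption that the `~}` strip markers remove the whitespace following each directive; it does not run Terraform, and the function name `render_metadata` and its `key=value` argument format are hypothetical.

```shell
#!/bin/sh
# Hypothetical emulation of the template loop: each key=value argument
# becomes "key: 'value', " inside "{ ... }".
render_metadata() {
    out="{ "
    for pair in "$@"; do
        key=${pair%%=*}      # text before the first '='
        value=${pair#*=}     # text after the first '='
        out="${out}${key}: '${value}', "
    done
    printf '%s}\n' "$out"
}

render_metadata ps_restart_after_maint=true
# prints: { ps_restart_after_maint: 'true', }
render_metadata
# prints: { }
```

Under this assumption every non-empty map ends in `, }`, which is a cosmetic trailing comma in the generated `clusterctl.yaml`.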
@@ -80,6 +84,7 @@ OPENSTACK_SSH_KEY_NAME: ${prefix}-keypair

# Use anti-affinity server groups
OPENSTACK_ANTI_AFFINITY: ${anti_affinity}
OPENSTACK_SOFT_ANTI_AFFINITY_CONTROLLER: ${soft_anti_affinity_controller}
OPENSTACK_SRVGRP_CONTROLLER: nonono
OPENSTACK_SRVGRP_WORKER: nonono

5 changes: 4 additions & 1 deletion terraform/mgmtcluster.tf
@@ -313,11 +313,13 @@ resource "terraform_data" "mgmtcluster_bootstrap_files" {
provisioner "file" {
content = templatefile("files/template/clusterctl.yaml.tmpl", {
anti_affinity = var.anti_affinity,
soft_anti_affinity_controller = var.soft_anti_affinity_controller,
availability_zone = var.availability_zone,
capo_instance_create_timeout = var.capo_instance_create_timeout,
cloud_provider = var.cloud_provider,
controller_count = var.controller_count,
controller_flavor = var.controller_flavor,
controller_metadata = var.controller_metadata,
deploy_cert_manager = var.deploy_cert_manager,
deploy_cindercsi = var.deploy_cindercsi,
deploy_flux = var.deploy_flux,
@@ -340,7 +342,8 @@ resource "terraform_data" "mgmtcluster_bootstrap_files" {
calico_version = var.calico_version,
use_ovn_lb_provider = var.use_ovn_lb_provider,
worker_count = var.worker_count,
- worker_flavor = var.worker_flavor
+ worker_flavor = var.worker_flavor,
+ worker_metadata = var.worker_metadata
})
destination = "/home/${var.ssh_username}/cluster-defaults/clusterctl.yaml"
}
18 changes: 18 additions & 0 deletions terraform/variables.tf
@@ -33,6 +33,18 @@ variable "worker_flavor" {
default = "SCS-2V-4-20s"
}

variable "controller_metadata" {
description = "additional metadata for instances running the k8s management nodes"
type = map(string)
default = {}
}

variable "worker_metadata" {
description = "additional metadata for instances running the k8s worker nodes"
type = map(string)
default = {}
}

variable "availability_zone" {
description = "availability zone for openstack resources"
type = string
@@ -191,6 +203,12 @@ variable "anti_affinity" {
default = true
}

variable "soft_anti_affinity_controller" {
description = "allow the use of soft-anti-affinity for the control plane"
type = bool
default = false
}

variable "dns_nameservers" {
description = "array of nameservers to be set for subnets, prefer local DNS servers if available"
type = list(string)