Commit 0d14457

Author: Steven Smith
Commit message: Adds live migration procedures for sdn to ovnk
Parent: d00d29b

10 files changed (+478 / -23 lines)
Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
// Module included in the following assemblies:
//
// * networking/ovn_kubernetes_network_provider/migrate-from-openshift-sdn.adoc

ifeval::["{context}" == "migrate-to-openshift-sdn"]
:sdn: OpenShift SDN
:previous-sdn: OVN-Kubernetes
:type: OpenShiftSDN
endif::[]
ifeval::["{context}" == "migrate-from-openshift-sdn"]
:sdn: OVN-Kubernetes
:previous-sdn: OpenShift SDN
:type: OVNKubernetes
endif::[]

[id="how-the-live-migration-process-works_{context}"]
= How the live migration process works

The following table summarizes the live migration process by distinguishing between the steps that you initiate and the actions that the migration script performs in response.

.Live migration to OVN-Kubernetes from OpenShift SDN
[cols="1,1a",options="header"]
|===
|User-initiated steps|Migration activity
ifdef::openshift-rosa,openshift-dedicated[]
| Add the `unsupported-red-hat-internal-testing` annotation to the cluster-level network configuration.
| The Cluster Network Operator (CNO) acknowledges the unsupported testing environment.
endif::[]

| Patch the cluster-level networking configuration by changing the `networkType` from `OpenShiftSDN` to `OVNKubernetes`.
|
Cluster Network Operator (CNO)::
+
--
* Sets migration-related fields in the `network.operator` custom resource (CR) and waits for routable MTUs to be applied to all nodes.
* Patches the `network.operator` CR to set the migration mode to `Live` for OVN-Kubernetes and deploys the OpenShift SDN network plugin in migration mode.
* Deploys OVN-Kubernetes with hybrid overlay enabled, ensuring that no race conditions occur.
* Waits for the OVN-Kubernetes deployment and updates the conditions in the status of the `network.config` CR.
* Triggers the Machine Config Operator (MCO) to apply the new machine config to each machine config pool, which includes node cordoning, draining, and rebooting.
* OVN-Kubernetes adds nodes to the appropriate zones and recreates pods using OVN-Kubernetes as the default CNI plugin.
* Removes migration-related fields from the `network.operator` CR and performs cleanup actions, such as deleting OpenShift SDN resources and redeploying OVN-Kubernetes in normal mode with the necessary configurations.
* Waits for the OVN-Kubernetes redeployment and updates the status conditions in the `network.config` CR to indicate migration completion. If your migration is blocked, see "Checking live migration metrics" for information about troubleshooting the issue.
--
|===
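The last CNO action in the table, waiting until the status conditions in the `network.config` CR indicate completion, can also be watched from a script. The following is a minimal sketch, assuming a POSIX-compatible shell. The `wait_until` and `migration_finished` function names are hypothetical, and the assumption that the `NetworkTypeMigrationInProgress` condition reports `False` after the migration completes is not confirmed by this procedure.

```bash
# Hypothetical helper: rerun a command until it succeeds or attempts run out.
# Usage: wait_until <max_attempts> <interval_seconds> <command> [args...]
wait_until() {
  local max_attempts="$1"
  local interval="$2"
  shift 2
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Return immediately once the command exits with status 0.
    if "$@"; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Hypothetical check. Assumption: the NetworkTypeMigrationInProgress condition
# in the status of the network.config CR reads "False" once the migration is done.
migration_finished() {
  [ "$(oc get network.config.openshift.io cluster \
    -o jsonpath='{.status.conditions[?(@.type=="NetworkTypeMigrationInProgress")].status}')" = "False" ]
}
```

For example, `wait_until 120 30 migration_finished` checks the condition every 30 seconds for roughly one hour and returns a non-zero exit status if the condition never reports `False`. Keeping the polling logic generic lets you reuse it with any other command that signals success through its exit status.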

////
ifeval::["{context}" == "migrate-from-openshift-sdn"]
If a rollback to OpenShift SDN is required, the following table describes the process.

[IMPORTANT]
====
You must wait until the migration process from OpenShift SDN to the OVN-Kubernetes network plugin is successful before initiating a rollback.
====

.Performing a rollback to OpenShift SDN
[cols="1,1a",options="header"]
|===

|User-initiated steps|Migration activity

|Suspend the MCO to ensure that it does not interrupt the migration.
|The MCO stops.

|
Set the `migration` field of the `Network.operator.openshift.io` custom resource (CR) named `cluster` to `OpenShiftSDN`. Make sure the `migration` field is `null` before setting it to a value.
|
CNO:: Updates the status of the `Network.config.openshift.io` CR named `cluster` accordingly.

|Update the `networkType` field.
|
CNO:: Performs the following actions:
+
--
* Destroys the OVN-Kubernetes control plane pods.
* Deploys the OpenShift SDN control plane pods.
* Updates the Multus objects to reflect the new network plugin.
--

|
Reboot each node in the cluster.
|
Cluster:: As nodes reboot, the cluster assigns IP addresses to pods on the OpenShift SDN network.

|
Enable the MCO after all nodes in the cluster reboot.
|
MCO:: Rolls out an update to the systemd configuration necessary for OpenShift SDN. By default, the MCO updates a single machine per pool at a time, so the total time the migration takes increases with the size of the cluster.

|===
endif::[]

////

ifdef::sdn[]
:!sdn:
endif::[]
ifdef::previous-sdn[]
:!previous-sdn:
endif::[]
ifdef::type[]
:!type:
endif::[]
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
// Module included in the following assemblies:
//
// * networking/ovn_kubernetes_network_provider/migrate-from-openshift-sdn.adoc

:_mod-docs-content-type: REFERENCE
[id="live-migration-metrics-information"]
= Information about live migration metrics

The following table shows the available metrics and the label values populated from the `openshift_network_operator_live_migration_procedure` expression. Use this information to monitor the progress of the migration or to troubleshoot it.

.Live migration metrics
[cols="1a,1a",options="header"]
|===
| Metric | Label values
|
*`openshift_network_operator_live_migration_blocked`*::
+
--
A Prometheus gauge vector metric that contains a constant `1` value, labeled with the reason that the CNI live migration might not have started. This metric is available when the CNI live migration has been started by annotating the `Network` custom resource (CR). +
This metric is published only when the live migration is blocked.
--
|
The list of label values includes the following::
+
--
* `UnsupportedCNI`: The target CNI is unsupported. When migrating from OpenShift SDN, the only valid target CNI is `OVNKubernetes`.
* `UnsupportedHyperShiftCluster`: Live migration is unsupported on a hosted control planes (HCP) cluster.
* `UnsupportedSDNNetworkIsolationMode`: OpenShift SDN is configured with the unsupported network isolation mode `Multitenant`. Migrate to a supported network isolation mode before performing the live migration.
* `UnsupportedMACVLANInterface`: Remove the egress router or any pods that contain the pod annotation `pod.network.openshift.io/assign-macvlan`.
To find the namespace and name of each offending pod, run the following command:
+
`oc get pods -Ao=jsonpath='{range .items[?(@.metadata.annotations.pod\.network\.openshift\.io/assign-macvlan=="")]}{@.metadata.namespace}{"\t"}{@.metadata.name}{"\n"}'`
--

|
*`openshift_network_operator_live_migration_condition`*::
+
--
A metric that represents the status of each condition type for the CNI live migration. The set of status condition types is defined for `network.config` to support observability of the CNI live migration. +
A `1` value represents condition status `true`, a `0` value represents `false`, and `-1` represents unknown. +
This metric is published only when the CNI live migration has been triggered by adding the relevant annotation to the `Network` CR named `cluster`. If the following condition types are not present in the `cluster` `Network` CR, the metric and its labels are cleared.
--
|
The list of label values includes the following::
+
--
* `NetworkTypeMigrationInProgress`
* `NetworkTypeMigrationTargetCNIAvailable`
* `NetworkTypeMigrationTargetCNIInUse`
* `NetworkTypeMigrationOriginalCNIPurged`
* `NetworkTypeMigrationMTUReady`
--
|===
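The value encoding of the `openshift_network_operator_live_migration_condition` metric described in the table can be decoded in a script. The following is a minimal sketch, assuming a POSIX-compatible shell. The `condition_status` and `summarize_conditions` function names are hypothetical, and the input format of one `<condition_type> <value>` pair per line is an assumption chosen for illustration.

```bash
# Hypothetical helper: map a sample value of the
# openshift_network_operator_live_migration_condition metric to a status,
# following the table: 1 = true, 0 = false, -1 = unknown.
condition_status() {
  case "$1" in
    1) echo "True" ;;
    0) echo "False" ;;
    -1) echo "Unknown" ;;
    *) echo "unexpected metric value: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: read "<condition_type> <value>" pairs from stdin,
# one per line, and print "<condition_type>=<status>" for each pair.
summarize_conditions() {
  while read -r cond_type value; do
    printf '%s=%s\n' "$cond_type" "$(condition_status "$value")"
  done
}
```

For example, `+printf 'NetworkTypeMigrationMTUReady 1\n' | summarize_conditions+` prints `NetworkTypeMigrationMTUReady=True`. Assuming the JSON response format shown in "Checking live migration metrics", a filter such as `+jq -r '.data.result[] | "\(.metric.type) \(.value[1])"'+` could produce these pairs.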

modules/nw-network-plugin-migration-process.adoc

Lines changed: 5 additions & 3 deletions
@@ -15,11 +15,11 @@ ifeval::["{context}" == "migrate-from-openshift-sdn"]
 endif::[]
 
 [id="how-the-migration-process-works_{context}"]
-= How the migration process works
+= How the offline migration process works
 
 The following table summarizes the migration process by segmenting between the user-initiated steps in the process and the actions that the migration performs in response.
 
-.Migrating to {sdn} from {previous-sdn}
+.Offline migration to {sdn} from {previous-sdn}
 [cols="1,1a",options="header"]
 |===
 
@@ -38,7 +38,7 @@ CNO:: Performs the following actions:
 --
 * Destroys the {previous-sdn} control plane pods.
 * Deploys the {sdn} control plane pods.
-* Updates the Multus objects to reflect the new network plugin.
+* Updates the Multus daemon sets and config map objects to reflect the new network plugin.
 --
 
 |
@@ -48,6 +48,7 @@ Cluster:: As nodes reboot, the cluster assigns IP addresses to pods on the {sdn}
 
 |===
 
+////
 ifeval::["{context}" == "migrate-from-openshift-sdn"]
 If a rollback to OpenShift SDN is required, the following table describes the process.
 
@@ -92,6 +93,7 @@ MCO:: Rolls out an update to the systemd configuration necessary for OpenShift S
 
 |===
 endif::[]
+////
 
 ifdef::sdn[]
 :!sdn:
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
// Module included in the following assemblies:
//
// * networking/ovn_kubernetes_network_provider/migrate-from-openshift-sdn.adoc

:_mod-docs-content-type: PROCEDURE
[id="checking-live-migration-metrics"]
= Checking live migration metrics

You can use metrics to monitor the progress of the live migration. You can view the metrics in the {product-title} web console or by using the `oc` CLI.

.Prerequisites

* You have initiated a live migration to OVN-Kubernetes.

.Procedure

. To view live migration metrics in the {product-title} web console:

.. Click *Observe* -> *Metrics*.

.. In the *Expression* box, type *openshift_network* and click the *openshift_network_operator_live_migration_procedure* option.

. To view metrics by using the `oc` CLI:

.. Enter the following command to generate a token for the `prometheus-k8s` service account in the `openshift-monitoring` namespace:
+
[source,terminal]
----
$ oc create token prometheus-k8s -n openshift-monitoring
----
+
.Example output
+
[source,terminal]
----
eyJhbGciOiJSUzI1NiIsImtpZCI6IlZiSUtwclcwbEJ2VW9We...
----

.. Enter the following command to request information about the `openshift_network_operator_live_migration_condition` metric, where `<token>` is the token that you generated in the previous step:
+
[source,terminal]
----
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -k -H "Authorization: Bearer <token>" "https://<openshift_API_endpoint>" --data-urlencode "query=openshift_network_operator_live_migration_condition" | jq
----
+
.Example output
+
[source,terminal]
----
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "__name__": "openshift_network_operator_live_migration_condition",
          "container": "network-operator",
          "endpoint": "metrics",
          "instance": "10.0.83.62:9104",
          "job": "metrics",
          "namespace": "openshift-network-operator",
          "pod": "network-operator-6c87754bc6-c8qld",
          "prometheus": "openshift-monitoring/k8s",
          "service": "metrics",
          "type": "NetworkTypeMigrationInProgress"
        },
        "value": [
          1717653579.587,
          "1"
        ]
      },
...
----

For the available metrics and the label values populated from the `openshift_network_operator_live_migration_procedure` expression, see the table in "Information about live migration metrics". Use this information to monitor the progress of the migration or to troubleshoot it.
Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
// Module included in the following assemblies:
//
// * networking/ovn_kubernetes_network_provider/migrate-from-openshift-sdn.adoc

[id="nw-ovn-kubernetes-live-migration-about_{context}"]
= Live migration to the OVN-Kubernetes network plugin overview

The live migration method is the process in which the OpenShift SDN network plugin and its network configurations, connections, and associated resources are migrated to the OVN-Kubernetes network plugin without service interruption. It is available for {product-title}, {product-dedicated}, {product-rosa}, and Azure Red Hat OpenShift deployment types. It is not available for HyperShift deployment types. This migration method is valuable for deployment types that require constant service availability, and it offers the following benefits:

* Continuous service availability
* Minimized downtime
* Automatic node rebooting
* Seamless transition from the OpenShift SDN network plugin to the OVN-Kubernetes network plugin

Although a rollback procedure is provided, the live migration is intended to be a one-way process.

include::snippets/sdn-deprecation-statement.adoc[]

The following sections provide more information about the live migration method.

[id="supported-platforms-live-migrating-ovn-kubernetes"]
== Supported platforms when using the live migration method

The following table provides information about the supported platforms for the live migration method.

.Supported platforms for the live migration method
[cols="1,1", options="header"]
|===
| Platform | Live migration

| Bare metal hardware (IPI and UPI) |&#10003;
| Amazon Web Services (AWS) (IPI and UPI) |&#10003;
| Google Cloud Platform (GCP) (IPI and UPI) |&#10003;
| {ibm-cloud-name} (IPI and UPI) |&#10003;
| Microsoft Azure (IPI and UPI) |&#10003;
| {rh-openstack-first} (IPI and UPI) |&#10003;
| VMware vSphere (IPI and UPI) |&#10003;
| AliCloud (IPI and UPI) |&#10003;
| Nutanix (IPI and UPI) |&#10003;
|===

[id="considerations-live-migrating-ovn-kubernetes-network-provider_{context}"]
== Considerations for live migration to the OVN-Kubernetes network plugin

Before using the live migration method to migrate to the OVN-Kubernetes network plugin, cluster administrators should consider the following information:

* The live migration procedure is unsupported for clusters with OpenShift SDN multitenant mode enabled.

* Egress router pods block the live migration process. You must remove them before beginning the live migration.

* During the live migration, multicast, egress IP addresses, and egress firewalls are temporarily disabled. You can migrate them from OpenShift SDN to OVN-Kubernetes after the live migration process has finished.

* The migration is intended to be a one-way process. However, if you want to roll back to OpenShift SDN, the migration from OpenShift SDN to OVN-Kubernetes must have succeeded first. You can then follow the same procedure to migrate to the OpenShift SDN network plugin from the OVN-Kubernetes network plugin.

* The live migration is not supported on HyperShift clusters.

* OpenShift SDN does not support IPsec. After the migration, cluster administrators can enable IPsec.

* OpenShift SDN does not support IPv6. After the migration, cluster administrators can enable dual-stack networking.

* The cluster MTU is the MTU value for pod interfaces. It is always less than your hardware MTU to account for the cluster network overlay overhead. The overhead is 100 bytes for OVN-Kubernetes and 50 bytes for OpenShift SDN.
+
During the live migration, both OVN-Kubernetes and OpenShift SDN run in parallel. OVN-Kubernetes manages the cluster network of some nodes, while OpenShift SDN manages the cluster network of others. To ensure that cross-CNI traffic remains functional, the Cluster Network Operator updates the routable MTU so that both CNIs share the same overlay MTU. As a result, after the migration has completed, the cluster MTU is 50 bytes less than it was before the migration.

* Some parameters of OVN-Kubernetes cannot be changed after installation. The following parameters can be set only before starting the live migration:

** `internalTransitSwitchSubnet`
** `internalJoinSubnet`

* Unless otherwise configured, OVN-Kubernetes uses the following IP address ranges:
** `100.64.0.0/16`. This IP address range is used for the `internalJoinSubnet` parameter of OVN-Kubernetes by default. If this IP address range is already in use, enter the following command to update it to `100.63.0.0/16`:
+
[source,terminal]
----
$ oc patch network.operator.openshift.io cluster --type='merge' -p='{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipv4":{"internalJoinSubnet": "100.63.0.0/16"}}}}}'
----
** `100.88.0.0/16`. This IP address range is used for the `internalTransitSwitchSubnet` parameter of OVN-Kubernetes by default. If this IP address range is already in use by another network, enter the following command to update it to `100.99.0.0/16`:
+
[source,terminal]
----
$ oc patch network.operator.openshift.io cluster --type='merge' -p='{"spec":{"defaultNetwork":{"ovnKubernetesConfig":{"ipv4":{"internalTransitSwitchSubnet": "100.99.0.0/16"}}}}}'
----

* In most cases, the live migration is independent of the secondary interfaces of pods created by the Multus CNI plugin. However, if these secondary interfaces were set up on the default network interface controller (NIC) of the host, for example, by using MACVLAN, IPVLAN, SR-IOV, or bridge interfaces with the default NIC as the parent interface, OVN-Kubernetes might malfunction. Remove such configurations before proceeding with the live migration.

* If the host has multiple NICs and the default route is not on the interface that has the Kubernetes node IP address, you must use the offline migration method instead.

* All `DaemonSet` objects in the `openshift-sdn` namespace that are not managed by the Cluster Network Operator (CNO) must be removed before initiating the live migration. If they are not removed, these unmanaged daemon sets can cause the migration status to remain incomplete.
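The MTU arithmetic in the considerations above can be checked with a short calculation. The following is a minimal sketch, assuming a POSIX-compatible shell and assuming that the cluster MTU equals the hardware MTU minus the overlay overhead stated above (50 bytes for OpenShift SDN, 100 bytes for OVN-Kubernetes). The `mtu_report` function name is hypothetical, and a cluster with a manually configured MTU might not follow this formula.

```bash
# Hypothetical helper: show the cluster MTU under each network plugin for a
# given hardware MTU, assuming cluster MTU = hardware MTU - overlay overhead.
mtu_report() {
  local hardware_mtu="$1"
  local sdn_overhead=50    # OpenShift SDN overlay overhead, in bytes
  local ovn_overhead=100   # OVN-Kubernetes overlay overhead, in bytes
  local sdn_mtu=$((hardware_mtu - sdn_overhead))
  local ovn_mtu=$((hardware_mtu - ovn_overhead))
  echo "hardware=${hardware_mtu} sdn=${sdn_mtu} ovn=${ovn_mtu} reduction=$((sdn_mtu - ovn_mtu))"
}
```

For a 1500-byte hardware MTU, `mtu_report 1500` prints `hardware=1500 sdn=1450 ovn=1400 reduction=50`, which matches the 50-byte reduction in the cluster MTU after the live migration.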
