Commit 70a1027

Merge pull request #70208 from JoeAldinger/OSDOCS-9015
/lgtm OSDOCS-9015:osp local disk deployment day 2
2 parents 84c157d + 71a4a0c

File tree

3 files changed: +315 -0 lines changed

_topic_maps/_topic_map.yml

Lines changed: 2 additions & 0 deletions
@@ -463,6 +463,8 @@ Topics:
   File: installing-openstack-installer-restricted
 - Name: OpenStack Cloud Controller Manager reference guide
   File: installing-openstack-cloud-config-reference
+- Name: Deploying on OpenStack with rootVolume and etcd on local disk
+  File: deploying-openstack-with-rootVolume-etcd-on-local-disk
 # - Name: Load balancing deployments on OpenStack
 #   File: installing-openstack-load-balancing
 - Name: Uninstalling a cluster on OpenStack
installing_openstack/deploying-openstack-with-rootVolume-etcd-on-local-disk.adoc

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
:_mod-docs-content-type: ASSEMBLY
[id="deploying-openstack-on-local-disk"]
= Deploying on OpenStack with rootVolume and etcd on local disk
include::_attributes/common-attributes.adoc[]
:context: deploying-openstack-on-local-disk

toc::[]

:FeatureName: Deploying on {rh-openstack-first} with rootVolume and etcd on local disk
include::snippets/technology-preview.adoc[]

As a day 2 operation, you can resolve and prevent performance issues of your {rh-openstack-first} installation by moving etcd from a root volume (provided by OpenStack Cinder) to a dedicated ephemeral local disk.

include::modules/installation-osp-local-disk-deployment.adoc[leveloffset=+1]

[role="_additional-resources"]
.Additional resources
* xref:../../scalability_and_performance/recommended-performance-scale-practices/recommended-etcd-practices.adoc#recommended-etcd-practices[Recommended etcd practices]
modules/installation-osp-local-disk-deployment.adoc

Lines changed: 295 additions & 0 deletions
@@ -0,0 +1,295 @@
// Module included in the following assemblies:
//
// * installing_openstack/deploying-openstack-with-rootVolume-etcd-on-local-disk.adoc

:_mod-docs-content-type: PROCEDURE
[id="installation-osp-local-disk-deployment_{context}"]
= Deploying {rh-openstack} on local disk

.Prerequisites

* You have an OpenStack cloud with a working Cinder.

* Your OpenStack cloud has at least 75 GB of available storage to accommodate 3 root volumes for the OpenShift control plane.

* The OpenStack cloud is deployed with Nova ephemeral storage that uses a local storage backend and not `rbd`.

.Procedure

[WARNING]
====
This procedure is for testing etcd on a local disk only and should not be used on production clusters. In certain cases, complete loss of the control plane can occur. For more information, see "Overview of backup and restore operation" under "Backup and restore".
====

. Create a Nova flavor for the control plane with at least 10 GB of ephemeral disk by running the following command, replacing the values for `--ram`, `--disk`, and `<flavor_name>` based on your environment:
+
[source,terminal]
----
$ openstack flavor create --ram 16384 --disk 0 --ephemeral 10 --vcpus 4 <flavor_name>
----
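+
If you want to confirm the flavor before you continue, you can display its RAM, disk, and ephemeral disk sizes. This check is a suggested addition, not part of the original procedure:
+
[source,terminal]
----
$ openstack flavor show <flavor_name> -c ram -c disk -c "OS-FLV-EXT-DATA:ephemeral"
----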

. Deploy a cluster with root volumes for the control plane; for example:
+
.Example YAML file
[source,yaml]
----
# ...
controlPlane:
  name: master
  platform:
    openstack:
      type: ${CONTROL_PLANE_FLAVOR}
      rootVolume:
        size: 25
        types:
        - ${CINDER_TYPE}
  replicas: 3
# ...
----
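+
The `${CONTROL_PLANE_FLAVOR}` and `${CINDER_TYPE}` values are environment-specific. If you are unsure which Cinder volume types your cloud offers, you can list them first; this lookup is a suggested addition, not part of the original procedure:
+
[source,terminal]
----
$ openstack volume type list
----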

. Deploy the cluster you created by running the following command:
+
[source,terminal]
----
$ openshift-install create cluster --dir <installation_directory> <1>
----
<1> For `<installation_directory>`, specify the location of the customized `./install-config.yaml` file that you previously created.

. Verify that the cluster you deployed is healthy before proceeding to the next step by running the following command:
+
[source,terminal]
----
$ oc wait clusteroperators --all --for=condition=Progressing=false <1>
----
<1> Ensures that the cluster Operators are finished progressing and that the cluster is not deploying or updating.
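+
If the wait command times out, you can inspect which cluster Operators are still progressing. This follow-up query is a suggested addition, not part of the original procedure:
+
[source,terminal]
----
$ oc get clusteroperators
----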

. Edit the `ControlPlaneMachineSet` (CPMS) to add the additional block ephemeral device that is used by etcd by running the following command:
+
[%collapsible]
====
[source,terminal]
----
$ oc patch ControlPlaneMachineSet/cluster -n openshift-machine-api --type json -p ' <1>
[
    {
      "op": "add",
      "path": "/spec/template/machines_v1beta1_machine_openshift_io/spec/providerSpec/value/additionalBlockDevices", <2>
      "value": [
        {
          "name": "etcd",
          "sizeGiB": 10,
          "storage": {
            "type": "Local" <3>
          }
        }
      ]
    }
]
'
----
<1> Applies the JSON patch to the `ControlPlaneMachineSet` custom resource (CR).
<2> Specifies the path where the `additionalBlockDevices` entry is added.
<3> Adds an etcd device with at least 10 GB of local storage to the cluster. You can specify values greater than 10 GB as long as the etcd device fits the Nova flavor. For example, if the Nova flavor has 15 GB, you can create the etcd device with 12 GB.
====
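+
To confirm that the patch was applied, you can read the field back with a `jsonpath` query. This check is a suggested addition, not part of the original procedure:
+
[source,terminal]
----
$ oc get controlplanemachineset.machine.openshift.io cluster -n openshift-machine-api -o jsonpath='{.spec.template.machines_v1beta1_machine_openshift_io.spec.providerSpec.value.additionalBlockDevices}'
----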

. Verify that the control plane machines are healthy by using the following steps:

.. Wait for the control plane machine set update to finish by running the following command:
+
[source,terminal]
----
$ oc wait --timeout=90m --for=condition=Progressing=false controlplanemachineset.machine.openshift.io -n openshift-machine-api cluster
----

.. Verify that the 3 control plane machines are updated by running the following command:
+
[source,terminal]
----
$ oc wait --timeout=90m --for=jsonpath='{.status.updatedReplicas}'=3 controlplanemachineset.machine.openshift.io -n openshift-machine-api cluster
----

.. Verify that the 3 control plane machines are healthy by running the following command:
+
[source,terminal]
----
$ oc wait --timeout=90m --for=jsonpath='{.status.replicas}'=3 controlplanemachineset.machine.openshift.io -n openshift-machine-api cluster
----

.. Verify that the `ClusterOperators` are not progressing in the cluster by running the following command:
+
[source,terminal]
----
$ oc wait clusteroperators --timeout=30m --all --for=condition=Progressing=false
----

.. Verify that each of the 3 control plane machines has the additional block device you previously created by running the following script:
+
[%collapsible]
====
[source,bash]
----
$ cp_machines=$(oc get machines -n openshift-machine-api --selector='machine.openshift.io/cluster-api-machine-role=master' --no-headers -o custom-columns=NAME:.metadata.name) #<1>

if [[ $(echo "${cp_machines}" | wc -l) -ne 3 ]]; then
  exit 1
fi #<2>

for machine in ${cp_machines}; do
  if ! oc get machine -n openshift-machine-api "${machine}" -o jsonpath='{.spec.providerSpec.value.additionalBlockDevices}' | grep -q 'etcd'; then
    exit 1
  fi #<3>
done
----
<1> Retrieves the names of the control plane machines running in the cluster.
<2> Exits if the cluster does not have exactly 3 control plane machines.
<3> Exits if a control plane machine does not have an `additionalBlockDevices` entry named `etcd`.
====
156+
. Create a file named `98-var-lib-etcd.yaml` by using the following YAML file:
157+
158+
[WARNING]
159+
====
160+
This procedure is for testing etcd on a local disk and should not be used on a production cluster. In certain cases, complete loss of the control plane can occur. For more information, see "Overview of backup and restore operation" under "Backup and restore".
161+
====
162+
163+
164+
[%collapsible]
165+
====
166+
[source,yaml]
167+
----
168+
apiVersion: machineconfiguration.openshift.io/v1
169+
kind: MachineConfig
170+
metadata:
171+
labels:
172+
machineconfiguration.openshift.io/role: master
173+
name: 98-var-lib-etcd
174+
spec:
175+
config:
176+
ignition:
177+
version: 3.4.0
178+
systemd:
179+
units:
180+
- contents: |
181+
[Unit]
182+
Description=Mount local-etcd to /var/lib/etcd
183+
184+
[Mount]
185+
What=/dev/disk/by-label/local-etcd #<1>
186+
Where=/var/lib/etcd
187+
Type=xfs
188+
Options=defaults,prjquota
189+
190+
[Install]
191+
WantedBy=local-fs.target
192+
enabled: true
193+
name: var-lib-etcd.mount
194+
- contents: |
195+
[Unit]
196+
Description=Create local-etcd filesystem
197+
DefaultDependencies=no
198+
After=local-fs-pre.target
199+
ConditionPathIsSymbolicLink=!/dev/disk/by-label/local-etcd #<2>
200+
201+
[Service]
202+
Type=oneshot
203+
RemainAfterExit=yes
204+
ExecStart=/bin/bash -c "[ -L /dev/disk/by-label/ephemeral0 ] || ( >&2 echo Ephemeral disk does not exist; /usr/bin/false )"
205+
ExecStart=/usr/sbin/mkfs.xfs -f -L local-etcd /dev/disk/by-label/ephemeral0 #<3>
206+
207+
[Install]
208+
RequiredBy=dev-disk-by\x2dlabel-local\x2detcd.device
209+
enabled: true
210+
name: create-local-etcd.service
211+
- contents: |
212+
[Unit]
213+
Description=Migrate existing data to local etcd
214+
After=var-lib-etcd.mount
215+
Before=crio.service #<4>
216+
217+
Requisite=var-lib-etcd.mount
218+
ConditionPathExists=!/var/lib/etcd/member
219+
ConditionPathIsDirectory=/sysroot/ostree/deploy/rhcos/var/lib/etcd/member #<5>
220+
221+
[Service]
222+
Type=oneshot
223+
RemainAfterExit=yes
224+
225+
ExecStart=/bin/bash -c "if [ -d /var/lib/etcd/member.migrate ]; then rm -rf /var/lib/etcd/member.migrate; fi" #<6>
226+
227+
ExecStart=/usr/bin/cp -aZ /sysroot/ostree/deploy/rhcos/var/lib/etcd/member/ /var/lib/etcd/member.migrate
228+
ExecStart=/usr/bin/mv /var/lib/etcd/member.migrate /var/lib/etcd/member #<7>
229+
230+
[Install]
231+
RequiredBy=var-lib-etcd.mount
232+
enabled: true
233+
name: migrate-to-local-etcd.service
234+
- contents: |
235+
[Unit]
236+
Description=Relabel /var/lib/etcd
237+
238+
After=migrate-to-local-etcd.service
239+
Before=crio.service
240+
241+
[Service]
242+
Type=oneshot
243+
RemainAfterExit=yes
244+
245+
ExecCondition=/bin/bash -c "[ -n \"$(restorecon -nv /var/lib/etcd)\" ]" #<8>
246+
247+
ExecStart=/usr/sbin/restorecon -R /var/lib/etcd
248+
249+
[Install]
250+
RequiredBy=var-lib-etcd.mount
251+
enabled: true
252+
name: relabel-var-lib-etcd.service
253+
----
254+
<1> The etcd database must be mounted by the device, not a label, to ensure that `systemd` generates the device dependency used in this config to trigger filesystem creation.
255+
<2> Do not run if the file system `dev/disk/by-label/local-etcd` already exists.
256+
<3> Fails with an alert message if `/dev/disk/by-label/ephemeral0` doesn't exist.
257+
<4> Migrates existing data to local etcd database. This config does so after `/var/lib/etcd` is mounted, but before CRI-O starts so etcd is not running yet.
258+
<5> Requires that etcd is mounted and does not contain a member directory, but the ostree does.
259+
<6> Cleans up any previous migration state.
260+
<7> Copies and moves in separate steps to ensure atomic creation of a complete member directory.
261+
<8> Performs a quick check of the mount point directory before performing a full recursive relabel. If restorecon in the file path `/var/lib/etcd` cannot rename the directory, the recursive rename is not performed.
262+
====
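+
Before you apply the file, you can confirm that the ephemeral disk is visible on a control plane node under the `/dev/disk/by-label/ephemeral0` path that `create-local-etcd.service` expects. This spot check is a suggested addition, not part of the original procedure; replace `<control_plane_node>` with the name of one of your control plane nodes:
+
[source,terminal]
----
$ oc debug node/<control_plane_node> -- chroot /host ls -l /dev/disk/by-label/
----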

. Create the new `MachineConfig` object by running the following command:
+
[source,terminal]
----
$ oc create -f 98-var-lib-etcd.yaml
----
+
[NOTE]
====
Moving the etcd database onto the local disk of each control plane machine takes time.
====
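+
You can follow the rollout while you wait. This watch command is a suggested addition, not part of the original procedure:
+
[source,terminal]
----
$ oc get machineconfigpool master -w
----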

. Verify that the etcd database has been transferred to the local disk of each control plane machine by running the following commands:

.. Wait for the `master` machine config pool to finish updating by running the following command:
+
[source,terminal]
----
$ oc wait --timeout=45m --for=condition=Updating=false machineconfigpool/master
----

.. Verify that the control plane nodes are ready by running the following command:
+
[source,terminal]
----
$ oc wait node --selector='node-role.kubernetes.io/master' --for condition=Ready --timeout=30s
----

.. Verify that the cluster Operators are running in the cluster by running the following command:
+
[source,terminal]
----
$ oc wait clusteroperators --timeout=30m --all --for=condition=Progressing=false
----
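+
Finally, you can confirm on a control plane node that `/var/lib/etcd` is now mounted from the local disk. This check is a suggested addition, not part of the original procedure; replace `<control_plane_node>` with the name of one of your control plane nodes:
+
[source,terminal]
----
$ oc debug node/<control_plane_node> -- chroot /host findmnt /var/lib/etcd
----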
