
Commit 1042dd1

fhennig and razvan authored
docs: improve install section, improve style, add pod override info (#587)
* ~
* Update rack awareness page
* some more wording and formatting
* some more wording and formatting
* Update docs/modules/hdfs/pages/getting_started/installation.adoc
* Update docs/modules/hdfs/pages/usage-guide/operations/graceful-shutdown.adoc
* Update docs/modules/hdfs/pages/usage-guide/operations/pod-disruptions.adoc
* Update docs/modules/hdfs/pages/usage-guide/resources.adoc

Co-authored-by: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com>
1 parent e9fde01 · commit 1042dd1

18 files changed (+112, -98 lines)

docs/modules/hdfs/pages/getting_started/first_steps.adoc

Lines changed: 4 additions & 4 deletions
@@ -1,7 +1,7 @@
 = First steps
 :description: Deploy and verify an HDFS cluster with Stackable by setting up Zookeeper and HDFS components, then test file operations using WebHDFS API.

-Once you have followed the steps in the xref:getting_started/installation.adoc[] section to install the operator and its dependencies, you will now deploy an HDFS cluster and its dependencies.
+Once you have followed the steps in the xref:getting_started/installation.adoc[] section to install the operator and its dependencies, now deploy an HDFS cluster and its dependencies.
 Afterward, you can <<_verify_that_it_works, verify that it works>> by creating, verifying and deleting a test file in HDFS.

 == Setup
@@ -13,7 +13,7 @@ To deploy a Zookeeper cluster create one file called `zk.yaml`:
 [source,yaml]
 include::example$getting_started/zk.yaml[]

-We also need to define a ZNode that will be used by the HDFS cluster to reference Zookeeper.
+Define a ZNode that is used by the HDFS cluster to reference Zookeeper.
 Create another file called `znode.yaml`:

 [source,yaml]
@@ -94,7 +94,7 @@ Then use `curl` to issue a `PUT` command:
 [source]
 include::example$getting_started/getting_started.sh[tag=create-file]

-This will return a location that will look something like this:
+This returns a location that looks similar to this:

 [source]
 http://simple-hdfs-datanode-default-0.simple-hdfs-datanode-default.default.svc.cluster.local:9864/webhdfs/v1/testdata.txt?op=CREATE&user.name=stackable&namenoderpcaddress=simple-hdfs&createflag=&createparent=true&overwrite=false
@@ -109,7 +109,7 @@ Rechecking the status again with:
 [source]
 include::example$getting_started/getting_started.sh[tag=file-status]

-will now display some metadata about the file that was created in the HDFS cluster:
+now displays some metadata about the file that was created in the HDFS cluster:

 [source,json]
 {
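
The `znode.yaml` referenced above is pulled in via an include and is not visible in this diff. A minimal sketch of such a ZNode manifest for the Stackable ZooKeeper operator could look like the block below; the resource names (`simple-hdfs-znode`, `simple-zk`) are assumptions, not taken from this commit.

[source,yaml]
----
# Hypothetical znode.yaml: a ZNode the HDFS cluster can use to reference ZooKeeper.
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperZnode
metadata:
  name: simple-hdfs-znode   # assumed name
spec:
  clusterRef:
    name: simple-zk         # assumed name of the ZookeeperCluster
----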

docs/modules/hdfs/pages/getting_started/index.adoc

Lines changed: 4 additions & 4 deletions
@@ -1,18 +1,18 @@
 = Getting started
 :description: Start with HDFS using the Stackable Operator. Install the Operator, set up your HDFS cluster, and verify its operation with this guide.

-This guide will get you started with HDFS using the Stackable Operator.
-It will guide you through the installation of the Operator and its dependencies, setting up your first HDFS cluster and verifying its operation.
+This guide gets you started with HDFS using the Stackable operator.
+It guides you through the installation of the operator and its dependencies, setting up your first HDFS cluster and verifying its operation.

 == Prerequisites

-You will need:
+You need:

 * a Kubernetes cluster
 * kubectl
 * optional: Helm

-Resource sizing depends on cluster type(s), usage and scope, but as a starting point we recommend a minimum of the following resources for this operator:
+Resource sizing depends on cluster type(s), usage and scope, but as a starting point the following resources are recommended as a minimum requirement for this operator:

 * 0.2 cores (e.g. i5 or similar)
 * 256MB RAM

docs/modules/hdfs/pages/getting_started/installation.adoc

Lines changed: 21 additions & 18 deletions
@@ -1,39 +1,41 @@
 = Installation
 :description: Install the Stackable HDFS operator and dependencies using stackablectl or Helm. Follow steps for setup and verification in Kubernetes.
+:kind: https://kind.sigs.k8s.io/

-On this page you will install the Stackable HDFS operator and its dependency, the Zookeeper operator, as well as the
+Install the Stackable HDFS operator and its dependency, the Zookeeper operator, as well as the
 commons, secret and listener operators which are required by all Stackable operators.

-== Stackable Operators
-
-There are 2 ways to run Stackable Operators
-
-. Using xref:management:stackablectl:index.adoc[]
-. Using Helm
-
-=== stackablectl
+There are multiple ways to install the Stackable operators.
+xref:management:stackablectl:index.adoc[] is the preferred way, but Helm is also supported.
+OpenShift users may prefer installing the operator from the RedHat Certified Operator catalog using the OpenShift web console.

+[tabs]
+====
+stackablectl::
++
+--
 `stackablectl` is the command line tool to interact with Stackable operators and our recommended way to install
 operators. Follow the xref:management:stackablectl:installation.adoc[installation steps] for your platform.

-After you have installed `stackablectl`, run the following command to install all operators necessary for the HDFS
-cluster:
+After you have installed `stackablectl`, run the following command to install all operators necessary for the HDFS cluster:

 [source,bash]
 ----
 include::example$getting_started/getting_started.sh[tag=stackablectl-install-operators]
 ----

-The tool will show
+The tool prints

 [source]
 include::example$getting_started/install_output.txt[]

-TIP: Consult the xref:management:stackablectl:quickstart.adoc[] to learn more about how to use `stackablectl`. For
-example, you can use the `--cluster kind` flag to create a Kubernetes cluster with link:https://kind.sigs.k8s.io/[kind].
-
-=== Helm
+TIP: Consult the xref:management:stackablectl:quickstart.adoc[] to learn more about how to use `stackablectl`.
+For example, you can use the `--cluster kind` flag to create a Kubernetes cluster with {kind}[kind].
+--

+Helm::
++
+--
 You can also use Helm to install the operators. Add the Stackable Helm repository:
 [source,bash]
 ----
@@ -46,8 +48,9 @@ Then install the Stackable Operators:
 include::example$getting_started/getting_started.sh[tag=helm-install-operators]
 ----

-Helm will deploy the operators in a Kubernetes Deployment and apply the CRDs for the HDFS cluster (as well as the CRDs
-for the required operators). You are now ready to deploy HDFS in Kubernetes.
+Helm deploys the operators in a Kubernetes Deployment and applies the CRDs for the HDFS cluster (as well as the CRDs for the required operators).
+--
+====

 == What's next

docs/modules/hdfs/pages/index.adoc

Lines changed: 1 addition & 3 deletions
@@ -18,9 +18,7 @@ The operator depends on the xref:zookeeper:index.adoc[] to operate a ZooKeeper c

 == Getting started

-Follow the xref:getting_started/index.adoc[Getting started guide] which will guide you through installing the Stackable
-HDFS and ZooKeeper operators, setting up ZooKeeper and HDFS and writing a file to HDFS to verify that everything is set
-up correctly.
+Follow the xref:getting_started/index.adoc[Getting started guide] which guides you through installing the Stackable HDFS and ZooKeeper operators, setting up ZooKeeper and HDFS and writing a file to HDFS to verify that everything is set up correctly.

 Afterwards you can consult the xref:usage-guide/index.adoc[] to learn more about tailoring your HDFS configuration to
 your needs, or have a look at the <<demos, demos>> for some example setups.

docs/modules/hdfs/pages/reference/commandline-parameters.adoc

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ stackable-hdfs-operator run --product-config /foo/bar/properties.yaml

 *Multiple values:* false

-The operator will **only** watch for resources in the provided namespace `test`:
+The operator **only** watches for resources in the provided namespace `test`:

 [source]
 ----

docs/modules/hdfs/pages/reference/environment-variables.adoc

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ docker run \

 *Multiple values:* false

-The operator will **only** watch for resources in the provided namespace `test`:
+The operator **only** watches for resources in the provided namespace `test`:

 [source]
 ----
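
The source block that follows this sentence in the page is not included in the diff. Purely as an illustration of the idea (scoping the operator to one namespace via an environment variable), a fragment could look like the sketch below; the variable name `WATCH_NAMESPACE` and the surrounding container spec are assumptions, not taken from this commit.

[source,yaml]
----
# Illustrative only: setting a namespace-scoping environment variable on the
# operator container. The variable name WATCH_NAMESPACE is an assumption.
containers:
  - name: hdfs-operator
    image: stackable-hdfs-operator   # placeholder image reference
    env:
      - name: WATCH_NAMESPACE
        value: test   # the operator then only watches resources in `test`
----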

docs/modules/hdfs/pages/usage-guide/configuration-environment-overrides.adoc

Lines changed: 10 additions & 3 deletions
@@ -50,9 +50,10 @@ nameNodes:
 replicas: 2
 ----

-All override property values must be strings. The properties will be formatted and escaped correctly into the XML file.
+All override property values must be strings.
+The properties are formatted and escaped correctly into the XML file.

-For a full list of configuration options we refer to the Apache Hdfs documentation for https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml[hdfs-site.xml] and https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml[core-site.xml].
+For a full list of configuration options refer to the Apache HDFS documentation for https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml[hdfs-site.xml] and https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml[core-site.xml].

 === The security.properties file

@@ -117,4 +118,10 @@ nameNodes:
 replicas: 1
 ----

-IMPORTANT: Some environment variables will be overriden by the operator and cannot be set manually by the user. These are `HADOOP_HOME`, `HADOOP_CONF_DIR`, `POD_NAME` and `ZOOKEEPER`.
+IMPORTANT: Some environment variables are overridden by the operator and cannot be set manually by the user.
+These are `HADOOP_HOME`, `HADOOP_CONF_DIR`, `POD_NAME` and `ZOOKEEPER`.
+
+== Pod overrides
+
+The HDFS operator also supports Pod overrides, allowing you to override any property that you can set on a Kubernetes Pod.
+Read the xref:concepts:overrides.adoc#pod-overrides[Pod overrides documentation] to learn more about this feature.
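
As a rough illustration of the Pod overrides feature mentioned in the added section (not part of this commit), a `podOverrides` fragment on a role could look like the sketch below; the label, the environment variable and the container name are assumptions.

[source,yaml]
----
# Illustrative sketch: add a label and an extra environment variable to all
# DataNode Pods via podOverrides. Values and the container name are assumptions.
dataNodes:
  podOverrides:
    metadata:
      labels:
        team: data-platform
    spec:
      containers:
        - name: hdfs          # assumed main container name
          env:
            - name: MY_CUSTOM_VAR
              value: "some-value"
  roleGroups:
    default:
      replicas: 2
----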

docs/modules/hdfs/pages/usage-guide/fuse.adoc

Lines changed: 11 additions & 12 deletions
@@ -1,15 +1,13 @@
 = FUSE
 :description: Use HDFS FUSE driver to mount HDFS filesystems into Linux environments via a Kubernetes Pod with necessary privileges and configurations.

-Our images of Apache Hadoop do contain the necessary binaries and libraries to use the HDFS FUSE driver.
-
-FUSE is short for _Filesystem in Userspace_ and allows a user to export a filesystem into the Linux kernel, which can then be mounted.
-HDFS contains a native FUSE driver/application, which means that an existing HDFS filesystem can be mounted into a Linux environment.
+HDFS contains a native FUSE driver/application, enabling you to mount an existing HDFS filesystem into a Linux environment.

+Stackable images of Apache Hadoop contain the necessary binaries and libraries to use the HDFS FUSE driver.
 To use the FUSE driver you can either copy the required files out of the image and run it on a host outside of Kubernetes or you can run it in a Pod.
-This Pod, however, will need some extra capabilities.
+This Pod, however, needs some extra capabilities.

-This is an example Pod that will work _as long as the host system that is running the kubelet does support FUSE_:
+This is an example Pod that works _as long as the host system that is running the kubelet does support FUSE_:

 [source,yaml]
 ----
@@ -39,8 +37,9 @@ spec:
 configMap:
 name: <your hdfs here> <2>
 ----
-<1> Ideally use the same version your HDFS is using. FUSE is baked in to our images as of SDP 23.11.
-<2> This needs to be a reference to a discovery ConfigMap as written by our HDFS operator.
+<1> Ideally use the same version your HDFS is using.
+Stackable HDFS images contain the FUSE driver since 23.11.
+<2> This needs to be a reference to a discovery ConfigMap as written by the HDFS operator.

 [TIP]
 .Privileged Pods
@@ -57,7 +56,7 @@ securityContext:
 ----

 Unfortunately, there is no way around some extra privileges.
-In Kubernetes the Pods usually share the Kernel with the host running the Kubelet, which means a Pod wanting to use FUSE will need access to the underlying Kernel modules.
+In Kubernetes the Pods usually share the Kernel with the host running the Kubelet, which means a Pod wanting to use FUSE needs access to the underlying Kernel modules.
 ====

 Inside this Pod you can get a shell (e.g. using `kubectl exec --stdin --tty hdfs-fuse -- /bin/bash`) to get access to a script called `fuse_dfs_wrapper` (it is in the `PATH` of our Hadoop images).
@@ -70,14 +69,14 @@ To mount HDFS call the script like this:
 ----
 fuse_dfs_wrapper dfs://<your hdfs> <target> <1> <2>

-# This will run in debug mode and stay in the foreground
+# This runs in debug mode and stays in the foreground
 fuse_dfs_wrapper -odebug dfs://<your hdfs> <target>

 # Example:
 mkdir simple-hdfs
 fuse_dfs_wrapper dfs://simple-hdfs simple-hdfs
 cd simple-hdfs
-# Any operations in this directory will now happen in HDFS
+# Any operations in this directory now happen in HDFS
 ----
 <1> Again, use the name of the HDFS service as above
-<2> `target` is the directory in which HDFS will be mounted, it must exist otherwise this command will fail
+<2> `target` is the directory in which HDFS is mounted; it must exist, otherwise this command fails
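
The full example Pod and the securityContext snippet behind the "Privileged Pods" tip are mostly elided from this diff. As a rough sketch of the kind of extra privileges a FUSE Pod typically needs (a general assumption, not the exact content of the page), the container's security context might grant something like:

[source,yaml]
----
# Illustrative only: FUSE inside a container usually needs elevated privileges,
# for example a privileged security context, or the SYS_ADMIN capability plus
# access to /dev/fuse.
securityContext:
  privileged: true
  # alternatively, a narrower variant (still an assumption):
  # capabilities:
  #   add: ["SYS_ADMIN"]
----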

docs/modules/hdfs/pages/usage-guide/index.adoc

Lines changed: 1 addition & 1 deletion
@@ -2,6 +2,6 @@
 :description: Learn to configure and use the Stackable Operator for Apache HDFS. Ensure basic setup knowledge from the Getting Started guide before proceeding.
 :page-aliases: ROOT:usage.adoc

-This Section will help you to use and configure the Stackable Operator for Apache HDFS in various ways.
+This section helps you to use and configure the Stackable operator for Apache HDFS in various ways.
 You should already be familiar with how to set up a basic instance.
 Follow the xref:getting_started/index.adoc[] guide to learn how to set up a basic instance with all the required dependencies (for example ZooKeeper).

docs/modules/hdfs/pages/usage-guide/listenerclass.adoc

Lines changed: 1 addition & 1 deletion
@@ -19,4 +19,4 @@ spec:
 listenerClass: external-stable # <2>
 ----
 <1> DataNode listeners should prioritize having a direct connection, to minimize network transfer overhead.
-<2> NameNode listeners should prioritize having a stable address, since they will be baked into the client configuration.
+<2> NameNode listeners should prioritize having a stable address, since they are baked into the client configuration.
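
Only a single line of the listener class example is visible in this diff. Purely as an illustration of the idea described by the two callouts (DataNodes favouring a direct connection, NameNodes favouring a stable address), a spec fragment could look like the sketch below; the exact field nesting and the `cluster-internal` value are assumptions, not taken from this commit.

[source,yaml]
----
# Illustrative sketch only; field paths and values are assumptions.
spec:
  dataNodes:
    config:
      listenerClass: cluster-internal   # <1> direct connection preferred
  nameNodes:
    config:
      listenerClass: external-stable    # <2> stable address preferred
----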

docs/modules/hdfs/pages/usage-guide/logging-log-aggregation.adoc

Lines changed: 1 addition & 2 deletions
@@ -22,5 +22,4 @@ spec:
 enableVectorAgent: true
 ----

-Further information on how to configure logging, can be found in
-xref:concepts:logging.adoc[].
+Further information on how to configure logging can be found in xref:concepts:logging.adoc[].
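
Only the tail end of the logging example is visible here. As a rough sketch of where the `enableVectorAgent: true` flag shown above typically sits in a role configuration (the surrounding field nesting is an assumption, not taken from this commit), the fragment might look like:

[source,yaml]
----
# Illustrative only: enabling the Vector log agent for one role group.
# The nesting under config.logging is an assumption.
dataNodes:
  roleGroups:
    default:
      config:
        logging:
          enableVectorAgent: true
----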

docs/modules/hdfs/pages/usage-guide/operations/graceful-shutdown.adoc

Lines changed: 3 additions & 3 deletions
@@ -6,9 +6,9 @@ You can configure the graceful shutdown as described in xref:concepts:operations

 As a default, JournalNodes have `15 minutes` to shut down gracefully.

-The JournalNode process will receive a `SIGTERM` signal when Kubernetes wants to terminate the Pod.
-It will log the received signal as shown in the log below and initiate a graceful shutdown.
-After the graceful shutdown timeout runs out, and the process still didn't exit, Kubernetes will issue a `SIGKILL` signal.
+The JournalNode process receives a `SIGTERM` signal when Kubernetes wants to terminate the Pod.
+It logs the received signal as shown in the log below and initiates a graceful shutdown.
+If the process is still running after the graceful shutdown timeout runs out, Kubernetes issues a `SIGKILL` signal.

 https://github.com/apache/hadoop/blob/a585a73c3e02ac62350c136643a5e7f6095a3dbb/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNode.java#L272[This] is the relevant code that gets executed in the JournalNodes as of HDFS version `3.3.4`.
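
The page defers configuration details to the generic graceful shutdown concept. As a rough sketch of how the default `15 minutes` for JournalNodes could be overridden (the `gracefulShutdownTimeout` field name and its nesting are assumptions, not taken from this commit), a spec fragment might look like:

[source,yaml]
----
# Illustrative only: giving JournalNodes more time to shut down gracefully.
# Field name and nesting are assumptions.
spec:
  journalNodes:
    config:
      gracefulShutdownTimeout: 30m
----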

docs/modules/hdfs/pages/usage-guide/operations/pod-disruptions.adoc

Lines changed: 13 additions & 11 deletions
@@ -3,24 +3,24 @@

 You can configure the permitted Pod disruptions for HDFS nodes as described in xref:concepts:operations/pod_disruptions.adoc[].

-Unless you configure something else or disable our PodDisruptionBudgets (PDBs), we write the following PDBs:
+Unless you configure something else or disable the default PodDisruptionBudgets (PDBs), the operator writes the following PDBs:

 == JournalNodes
-We only allow a single JournalNode to be offline at any given time, regardless of the number of replicas or `roleGroups`.
+Only a single JournalNode is allowed to be offline at any given time, regardless of the number of replicas or `roleGroups`.

 == NameNodes
-We only allow a single NameNode to be offline at any given time, regardless of the number of replicas or `roleGroups`.
+Only a single NameNode is allowed to be offline at any given time, regardless of the number of replicas or `roleGroups`.

 == DataNodes
 For DataNodes the question of how many instances can be unavailable at the same time is a bit harder:
 HDFS stores your blocks on the DataNodes.
 Every block can be replicated multiple times (to multiple DataNodes) to ensure maximum availability.
 The default replication factor is `3` - which can be configured using `spec.clusterConfig.dfsReplication`. However, it is also possible to change the replication factor for a specific file or directory to something other than the cluster default.

-When you have a replication of `3`, you can safely take down 2 DataNodes, as there will always be a third DataNode holding a copy of each block currently assigned to one of the unavailable DataNodes.
-However, you need to be aware that you are now down to a single point of failure - the last of three replicas!
+With a replication of `3`, at most 2 DataNodes may be down, as the third one is holding a copy of each block currently assigned to the unavailable nodes.
+However, the last DataNode running is now a single point of failure -- the last of three replicas!

-Taking this into consideration, our operator uses the following algorithm to determine the maximum number of DataNodes allowed to be unavailable at the same time:
+Taking this into consideration, the operator uses the following algorithm to determine the maximum number of DataNodes allowed to be unavailable at the same time:

 `num_datanodes` is the number of DataNodes in the HDFS cluster, summed over all `roleGroups`.

@@ -93,13 +93,15 @@ This results e.g. in the following numbers:
 |===

 == Reduce rolling redeployment durations
-The default PDBs we write out are pessimistic and will cause the rolling redeployment to take a considerable amount of time.
-As an example, when you have 100 DataNodes and a replication factor of `3`, we can safely only take a single DataNode down at a time. Assuming a DataNode takes 1 minute to properly restart, the whole re-deployment would take 100 minutes.
+The default PDBs written out are pessimistic and cause the rolling redeployment to take a considerable amount of time.
+As an example, when you have 100 DataNodes and a replication factor of `3`, only a single DataNode can be taken offline at a time.
+Assuming a DataNode takes 1 minute to properly restart, the whole re-deployment would take 100 minutes.

 You can use the following measures to speed this up:

-1. Increase the replication factor, e.g. from `3` to `5`. In this case the number of allowed disruptions triples from `1` to `3` (assuming >= 5 DataNodes), reducing the time it takes by 66%.
-2. Increase `maxUnavailable` using the `spec.dataNodes.roleConfig.podDisruptionBudget.maxUnavailable` field as described in xref:concepts:operations/pod_disruptions.adoc[].
-3. Write your own PDBs as described in xref:concepts:operations/pod_disruptions.adoc#_using_you_own_custom_pdbs[Using you own custom PDBs].
+* Increase the replication factor, e.g. from `3` to `5`.
+In this case the number of allowed disruptions triples from `1` to `3` (assuming >= 5 DataNodes), reducing the time it takes by 66%.
+* Increase `maxUnavailable` using the `spec.dataNodes.roleConfig.podDisruptionBudget.maxUnavailable` field as described in xref:concepts:operations/pod_disruptions.adoc[].
+* Write your own PDBs as described in xref:concepts:operations/pod_disruptions.adoc#_using_you_own_custom_pdbs[Using your own custom PDBs].

 WARNING: In cases you modify or disable the default PDBs, it's your responsibility to either make sure there are enough DataNodes available or accept the risk of blocks not being available!
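
To make the second measure in the list above concrete, a minimal sketch of raising `maxUnavailable` via the `spec.dataNodes.roleConfig.podDisruptionBudget.maxUnavailable` field mentioned there (the value `3` is just an example, not part of this commit):

[source,yaml]
----
# Example value only: allow up to 3 DataNodes to be unavailable at the same time.
spec:
  dataNodes:
    roleConfig:
      podDisruptionBudget:
        maxUnavailable: 3
----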
