Commit ee010b6

Author: Felix Hennig
1 parent e9fde01 commit ee010b6

16 files changed: +88 -77 lines changed

docs/modules/hdfs/pages/getting_started/first_steps.adoc

Lines changed: 4 additions & 4 deletions
@@ -1,7 +1,7 @@
 = First steps
 :description: Deploy and verify an HDFS cluster with Stackable by setting up Zookeeper and HDFS components, then test file operations using WebHDFS API.

-Once you have followed the steps in the xref:getting_started/installation.adoc[] section to install the operator and its dependencies, you will now deploy an HDFS cluster and its dependencies.
+Once you have followed the steps in the xref:getting_started/installation.adoc[] section to install the operator and its dependencies, now deploy an HDFS cluster and its dependencies.
 Afterward, you can <<_verify_that_it_works, verify that it works>> by creating, verifying and deleting a test file in HDFS.

 == Setup
@@ -13,7 +13,7 @@ To deploy a Zookeeper cluster create one file called `zk.yaml`:
 [source,yaml]
 include::example$getting_started/zk.yaml[]

-We also need to define a ZNode that will be used by the HDFS cluster to reference Zookeeper.
+Define a ZNode that is used by the HDFS cluster to reference Zookeeper.
 Create another file called `znode.yaml`:

 [source,yaml]
@@ -94,7 +94,7 @@ Then use `curl` to issue a `PUT` command:
 [source]
 include::example$getting_started/getting_started.sh[tag=create-file]

-This will return a location that will look something like this:
+This returns a location that looks similar to this:

 [source]
 http://simple-hdfs-datanode-default-0.simple-hdfs-datanode-default.default.svc.cluster.local:9864/webhdfs/v1/testdata.txt?op=CREATE&user.name=stackable&namenoderpcaddress=simple-hdfs&createflag=&createparent=true&overwrite=false
@@ -109,7 +109,7 @@ Rechecking the status again with:
 [source]
 include::example$getting_started/getting_started.sh[tag=file-status]

-will now display some metadata about the file that was created in the HDFS cluster:
+now displays some metadata about the file that was created in the HDFS cluster:

 [source,json]
 {

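The `zk.yaml` and `znode.yaml` contents referenced above are pulled in via `include::` directives rather than shown in the diff. For orientation only, a ZookeeperZnode manifest of that kind typically looks roughly like the following sketch; it assumes the Stackable ZooKeeper operator's `zookeeper.stackable.tech/v1alpha1` API, and the names `simple-hdfs-znode` and `simple-zk` are placeholders rather than values from this commit.

[source,yaml]
----
# Illustrative sketch only; the authoritative manifest is the included znode.yaml.
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperZnode
metadata:
  name: simple-hdfs-znode   # placeholder name
spec:
  clusterRef:
    name: simple-zk         # placeholder reference to the ZookeeperCluster
----

Such a manifest is applied like any other resource, for example with `kubectl apply -f znode.yaml`.
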
docs/modules/hdfs/pages/getting_started/index.adoc

Lines changed: 4 additions & 4 deletions
@@ -1,18 +1,18 @@
 = Getting started
 :description: Start with HDFS using the Stackable Operator. Install the Operator, set up your HDFS cluster, and verify its operation with this guide.

-This guide will get you started with HDFS using the Stackable Operator.
-It will guide you through the installation of the Operator and its dependencies, setting up your first HDFS cluster and verifying its operation.
+This guide gets you started with HDFS using the Stackable operator.
+It guides you through the installation of the operator and its dependencies, setting up your first HDFS cluster and verifying its operation.

 == Prerequisites

-You will need:
+You need:

 * a Kubernetes cluster
 * kubectl
 * optional: Helm

-Resource sizing depends on cluster type(s), usage and scope, but as a starting point we recommend a minimum of the following resources for this operator:
+Resource sizing depends on cluster type(s), usage and scope, but as a starting point the following resources are recommended as a minimum requirement for this operator:

 * 0.2 cores (e.g. i5 or similar)
 * 256MB RAM

docs/modules/hdfs/pages/getting_started/installation.adoc

Lines changed: 21 additions & 18 deletions
@@ -1,39 +1,41 @@
11
= Installation
22
:description: Install the Stackable HDFS operator and dependencies using stackablectl or Helm. Follow steps for setup and verification in Kubernetes.
3+
:kind: https://kind.sigs.k8s.io/
34

4-
On this page you will install the Stackable HDFS operator and its dependency, the Zookeeper operator, as well as the
5+
Install the Stackable HDFS operator and its dependency, the Zookeeper operator, as well as the
56
commons, secret and listener operators which are required by all Stackable operators.
67

7-
== Stackable Operators
8-
9-
There are 2 ways to run Stackable Operators
10-
11-
. Using xref:management:stackablectl:index.adoc[]
12-
. Using Helm
13-
14-
=== stackablectl
8+
There are multiple ways to install the Stackable operators.
9+
xref:management:stackablectl:index.adoc[] is the preferred way but Helm is also supported.
10+
OpenShift users may prefer installing the operator from the RedHat Certified Operator catalog using the OpenShift web console.
1511

12+
[tabs]
13+
====
14+
stackablectl::
15+
+
16+
--
1617
`stackablectl` is the command line tool to interact with Stackable operators and our recommended way to install
1718
operators. Follow the xref:management:stackablectl:installation.adoc[installation steps] for your platform.
1819
19-
After you have installed `stackablectl`, run the following command to install all operators necessary for the HDFS
20-
cluster:
20+
After you have installed `stackablectl`, run the following command to install all operators necessary for the HDFS cluster:
2121
2222
[source,bash]
2323
----
2424
include::example$getting_started/getting_started.sh[tag=stackablectl-install-operators]
2525
----
2626
27-
The tool will show
27+
The tool prints
2828
2929
[source]
3030
include::example$getting_started/install_output.txt[]
3131
32-
TIP: Consult the xref:management:stackablectl:quickstart.adoc[] to learn more about how to use `stackablectl`. For
33-
example, you can use the `--cluster kind` flag to create a Kubernetes cluster with link:https://kind.sigs.k8s.io/[kind].
34-
35-
=== Helm
32+
TIP: Consult the xref:management:stackablectl:quickstart.adoc[] to learn more about how to use `stackablectl`.
33+
For example, you can use the `--cluster kind` flag to create a Kubernetes cluster with {kind}[kind].
34+
--
3635
36+
Helm::
37+
+
38+
--
3739
You can also use Helm to install the operators. Add the Stackable Helm repository:
3840
[source,bash]
3941
----
@@ -46,8 +48,9 @@ Then install the Stackable Operators:
4648
include::example$getting_started/getting_started.sh[tag=helm-install-operators]
4749
----
4850
49-
Helm will deploy the operators in a Kubernetes Deployment and apply the CRDs for the HDFS cluster (as well as the CRDs
50-
for the required operators). You are now ready to deploy HDFS in Kubernetes.
51+
Helm deploys the operators in a Kubernetes Deployment and applies the CRDs for the HDFS cluster (as well as the CRDs for the required operators).
52+
--
53+
====
5154

5255
== What's next
5356

docs/modules/hdfs/pages/index.adoc

Lines changed: 1 addition & 3 deletions
@@ -18,9 +18,7 @@ The operator depends on the xref:zookeeper:index.adoc[] to operate a ZooKeeper c

 == Getting started

-Follow the xref:getting_started/index.adoc[Getting started guide] which will guide you through installing the Stackable
-HDFS and ZooKeeper operators, setting up ZooKeeper and HDFS and writing a file to HDFS to verify that everything is set
-up correctly.
+Follow the xref:getting_started/index.adoc[Getting started guide] which guides you through installing the Stackable HDFS and ZooKeeper operators, setting up ZooKeeper and HDFS and writing a file to HDFS to verify that everything is set up correctly.

 Afterwards you can consult the xref:usage-guide/index.adoc[] to learn more about tailoring your HDFS configuration to
 your needs, or have a look at the <<demos, demos>> for some example setups.

docs/modules/hdfs/pages/reference/commandline-parameters.adoc

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ stackable-hdfs-operator run --product-config /foo/bar/properties.yaml

 *Multiple values:* false

-The operator will **only** watch for resources in the provided namespace `test`:
+The operator **only** watches for resources in the provided namespace `test`:

 [source]
 ----

docs/modules/hdfs/pages/reference/environment-variables.adoc

Lines changed: 1 addition & 1 deletion
@@ -36,7 +36,7 @@ docker run \

 *Multiple values:* false

-The operator will **only** watch for resources in the provided namespace `test`:
+The operator **only** watches for resources in the provided namespace `test`:

 [source]
 ----

docs/modules/hdfs/pages/usage-guide/configuration-environment-overrides.adoc

Lines changed: 9 additions & 2 deletions
@@ -50,7 +50,8 @@ nameNodes:
 replicas: 2
 ----

-All override property values must be strings. The properties will be formatted and escaped correctly into the XML file.
+All override property values must be strings.
+The properties are formatted and escaped correctly into the XML file.

 For a full list of configuration options we refer to the Apache Hdfs documentation for https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml[hdfs-site.xml] and https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml[core-site.xml].

@@ -117,4 +118,10 @@ nameNodes:
 replicas: 1
 ----

-IMPORTANT: Some environment variables will be overriden by the operator and cannot be set manually by the user. These are `HADOOP_HOME`, `HADOOP_CONF_DIR`, `POD_NAME` and `ZOOKEEPER`.
+IMPORTANT: Some environment variables are overridden by the operator and cannot be set manually by the user.
+These are `HADOOP_HOME`, `HADOOP_CONF_DIR`, `POD_NAME` and `ZOOKEEPER`.
+
+== Pod overrides
+
+The HDFS operator also supports Pod overrides, allowing you to override any property that you can set on a Kubernetes Pod.
+Read the xref:concepts:overrides.adoc#pod-overrides[Pod overrides documentation] to learn more about this feature.

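As a rough illustration of the override styles covered on that page, a role group can combine config and Pod overrides along these lines. This is a minimal sketch assuming the usual Stackable `roleGroups` layout; the property `dfs.namenode.handler.count`, the label and all values are made-up examples, not content from this commit.

[source,yaml]
----
nameNodes:
  roleGroups:
    default:
      replicas: 2
      # configOverrides values must be strings; they are rendered into the named XML file.
      configOverrides:
        hdfs-site.xml:
          dfs.namenode.handler.count: "100"   # illustrative property and value
      # podOverrides takes a PodTemplateSpec fragment that is merged into the generated Pods.
      podOverrides:
        metadata:
          labels:
            team: data-platform                # illustrative label
----
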
docs/modules/hdfs/pages/usage-guide/fuse.adoc

Lines changed: 6 additions & 6 deletions
@@ -7,9 +7,9 @@ FUSE is short for _Filesystem in Userspace_ and allows a user to export a filesy
 HDFS contains a native FUSE driver/application, which means that an existing HDFS filesystem can be mounted into a Linux environment.

 To use the FUSE driver you can either copy the required files out of the image and run it on a host outside of Kubernetes or you can run it in a Pod.
-This Pod, however, will need some extra capabilities.
+This Pod, however, needs some extra capabilities.

-This is an example Pod that will work _as long as the host system that is running the kubelet does support FUSE_:
+This is an example Pod that works _as long as the host system running the kubelet supports FUSE_:

 [source,yaml]
 ----
@@ -57,7 +57,7 @@ securityContext:
 ----

 Unfortunately, there is no way around some extra privileges.
-In Kubernetes the Pods usually share the Kernel with the host running the Kubelet, which means a Pod wanting to use FUSE will need access to the underlying Kernel modules.
+In Kubernetes the Pods usually share the Kernel with the host running the Kubelet, which means a Pod wanting to use FUSE needs access to the underlying Kernel modules.
 ====

 Inside this Pod you can get a shell (e.g. using `kubectl exec --stdin --tty hdfs-fuse -- /bin/bash`) to get access to a script called `fuse_dfs_wrapper` (it is in the `PATH` of our Hadoop images).
@@ -70,14 +70,14 @@ To mount HDFS call the script like this:
 ----
 fuse_dfs_wrapper dfs://<your hdfs> <target> <1> <2>

-# This will run in debug mode and stay in the foreground
+# This runs in debug mode and stays in the foreground
 fuse_dfs_wrapper -odebug dfs://<your hdfs> <target>

 # Example:
 mkdir simple-hdfs
 fuse_dfs_wrapper dfs://simple-hdfs simple-hdfs
 cd simple-hdfs
-# Any operations in this directory will now happen in HDFS
+# Any operations in this directory now happen in HDFS
 ----
 <1> Again, use the name of the HDFS service as above
-<2> `target` is the directory in which HDFS will be mounted, it must exist otherwise this command will fail
+<2> `target` is the directory in which HDFS is mounted; it must exist, otherwise this command fails

docs/modules/hdfs/pages/usage-guide/index.adoc

Lines changed: 1 addition & 1 deletion
@@ -2,6 +2,6 @@
 :description: Learn to configure and use the Stackable Operator for Apache HDFS. Ensure basic setup knowledge from the Getting Started guide before proceeding.
 :page-aliases: ROOT:usage.adoc

-This Section will help you to use and configure the Stackable Operator for Apache HDFS in various ways.
+This section helps you to use and configure the Stackable operator for Apache HDFS in various ways.
 You should already be familiar with how to set up a basic instance.
 Follow the xref:getting_started/index.adoc[] guide to learn how to set up a basic instance with all the required dependencies (for example ZooKeeper).

docs/modules/hdfs/pages/usage-guide/listenerclass.adoc

Lines changed: 1 addition & 1 deletion
@@ -19,4 +19,4 @@ spec:
 listenerClass: external-stable # <2>
 ----
 <1> DataNode listeners should prioritize having a direct connection, to minimize network transfer overhead.
-<2> NameNode listeners should prioritize having a stable address, since they will be baked into the client configuration.
+<2> NameNode listeners should prioritize having a stable address, since they are baked into the client configuration.

docs/modules/hdfs/pages/usage-guide/operations/graceful-shutdown.adoc

Lines changed: 3 additions & 3 deletions
@@ -6,9 +6,9 @@ You can configure the graceful shutdown as described in xref:concepts:operations

 As a default, JournalNodes have `15 minutes` to shut down gracefully.

-The JournalNode process will receive a `SIGTERM` signal when Kubernetes wants to terminate the Pod.
-It will log the received signal as shown in the log below and initiate a graceful shutdown.
-After the graceful shutdown timeout runs out, and the process still didn't exit, Kubernetes will issue a `SIGKILL` signal.
+The JournalNode process receives a `SIGTERM` signal when Kubernetes wants to terminate the Pod.
+It logs the received signal as shown in the log below and initiates a graceful shutdown.
+After the graceful shutdown timeout runs out and the process still didn't exit, Kubernetes issues a `SIGKILL` signal.

 https://github.com/apache/hadoop/blob/a585a73c3e02ac62350c136643a5e7f6095a3dbb/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNode.java#L272[This] is the relevant code that gets executed in the JournalNodes as of HDFS version `3.3.4`.

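The timeout itself is set through the role configuration described in the linked operations page. The sketch below assumes the field is named `gracefulShutdownTimeout`, as in other Stackable operators; that field name is an assumption, not taken from this commit.

[source,yaml]
----
# Assumed field name; check the graceful shutdown concepts page referenced above.
spec:
  journalNodes:
    config:
      gracefulShutdownTimeout: 30m   # illustrative value, default is 15 minutes
----
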
docs/modules/hdfs/pages/usage-guide/operations/pod-disruptions.adoc

Lines changed: 12 additions & 10 deletions
@@ -3,22 +3,22 @@

 You can configure the permitted Pod disruptions for HDFS nodes as described in xref:concepts:operations/pod_disruptions.adoc[].

-Unless you configure something else or disable our PodDisruptionBudgets (PDBs), we write the following PDBs:
+Unless you configure something else or disable our PodDisruptionBudgets (PDBs), the operator writes the following PDBs:

 == JournalNodes
-We only allow a single JournalNode to be offline at any given time, regardless of the number of replicas or `roleGroups`.
+Only a single JournalNode is allowed to be offline at any given time, regardless of the number of replicas or `roleGroups`.

 == NameNodes
-We only allow a single NameNode to be offline at any given time, regardless of the number of replicas or `roleGroups`.
+Only a single NameNode is allowed to be offline at any given time, regardless of the number of replicas or `roleGroups`.

 == DataNodes
 For DataNodes the question of how many instances can be unavailable at the same time is a bit harder:
 HDFS stores your blocks on the DataNodes.
 Every block can be replicated multiple times (to multiple DataNodes) to ensure maximum availability.
 The default replication factor is `3` - which can be configured using `spec.clusterConfig.dfsReplication`. However, it is also possible to change the replication factor for a specific file or directory to something other than the cluster default.

-When you have a replication of `3`, you can safely take down 2 DataNodes, as there will always be a third DataNode holding a copy of each block currently assigned to one of the unavailable DataNodes.
-However, you need to be aware that you are now down to a single point of failure - the last of three replicas!
+When you have a replication of `3`, you can safely take down 2 DataNodes, as there is always a third DataNode holding a copy of each block currently assigned to one of the unavailable DataNodes.
+However, you need to be aware that you are now down to a single point of failure -- the last of three replicas!

 Taking this into consideration, our operator uses the following algorithm to determine the maximum number of DataNodes allowed to be unavailable at the same time:

@@ -93,13 +93,15 @@ This results e.g. in the following numbers:
 |===

 == Reduce rolling redeployment durations
-The default PDBs we write out are pessimistic and will cause the rolling redeployment to take a considerable amount of time.
-As an example, when you have 100 DataNodes and a replication factor of `3`, we can safely only take a single DataNode down at a time. Assuming a DataNode takes 1 minute to properly restart, the whole re-deployment would take 100 minutes.
+The default PDBs written out are pessimistic and cause the rolling redeployment to take a considerable amount of time.
+As an example, when you have 100 DataNodes and a replication factor of `3`, only a single DataNode can be taken offline at a time.
+Assuming a DataNode takes 1 minute to properly restart, the whole re-deployment would take 100 minutes.

 You can use the following measures to speed this up:

-1. Increase the replication factor, e.g. from `3` to `5`. In this case the number of allowed disruptions triples from `1` to `3` (assuming >= 5 DataNodes), reducing the time it takes by 66%.
-2. Increase `maxUnavailable` using the `spec.dataNodes.roleConfig.podDisruptionBudget.maxUnavailable` field as described in xref:concepts:operations/pod_disruptions.adoc[].
-3. Write your own PDBs as described in xref:concepts:operations/pod_disruptions.adoc#_using_you_own_custom_pdbs[Using you own custom PDBs].
+* Increase the replication factor, e.g. from `3` to `5`.
+In this case the number of allowed disruptions triples from `1` to `3` (assuming >= 5 DataNodes), reducing the time it takes by 66%.
+* Increase `maxUnavailable` using the `spec.dataNodes.roleConfig.podDisruptionBudget.maxUnavailable` field as described in xref:concepts:operations/pod_disruptions.adoc[].
+* Write your own PDBs as described in xref:concepts:operations/pod_disruptions.adoc#_using_you_own_custom_pdbs[Using your own custom PDBs].

 WARNING: In cases you modify or disable the default PDBs, it's your responsibility to either make sure there are enough DataNodes available or accept the risk of blocks not being available!

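Both DataNode-related settings quoted on that page sit on the HdfsCluster resource. The following minimal sketch only illustrates the two field paths mentioned above (`spec.clusterConfig.dfsReplication` and `spec.dataNodes.roleConfig.podDisruptionBudget.maxUnavailable`); the values are illustrative only.

[source,yaml]
----
spec:
  clusterConfig:
    # Cluster-wide default replication factor, as discussed above.
    dfsReplication: 5
  dataNodes:
    roleConfig:
      podDisruptionBudget:
        # Allow more DataNodes to be unavailable at once, shortening rolling redeployments.
        maxUnavailable: 3
----
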
docs/modules/hdfs/pages/usage-guide/operations/rack-awareness.adoc

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 = HDFS Rack Awareness

 Apache Hadoop supports a feature called Rack Awareness, which allows users to define a topology for the nodes making up a cluster.
-Hadoop will then use that topology to spread out replicas of blocks in a fashion that maximizes fault tolerance.
+Hadoop then uses that topology to spread out replicas of blocks in a fashion that maximizes fault tolerance.

 The default write path, for example, is to put replicas of a newly created block first on a different node, but within the same rack, and the second copy on a node in a remote rack.
 In order for this to work properly, Hadoop needs to have access to the information about the underlying infrastructure it runs on. In a Kubernetes environment, this means obtaining information from the pods or nodes of the cluster.
@@ -29,4 +29,4 @@ spec:
 ...
 ----

-Internally this will be used to create a topology label consisting of the value of the node label `topology.kubernetes.io/zone` and the pod label `app.kubernetes.io/role-group`, e.g. `/eu-central-1/rg1`.
+Internally this is used to create a topology label consisting of the value of the node label `topology.kubernetes.io/zone` and the pod label `app.kubernetes.io/role-group`, e.g. `/eu-central-1/rg1`.

docs/modules/hdfs/pages/usage-guide/resources.adoc

Lines changed: 5 additions & 5 deletions
@@ -5,7 +5,7 @@

 You can mount volumes where data is stored by specifying https://kubernetes.io/docs/concepts/storage/persistent-volumes[PersistentVolumeClaims] for each individual role group.

-In case nothing is configured in the custom resource for a certain role group, each Pod will have one volume mount with `10Gi` capacity and storage type `Disk`:
+In case nothing is configured in the custom resource for a certain role group, each Pod has one volume mount with `10Gi` capacity and storage type `Disk`:

 [source,yaml]
 ----
@@ -35,7 +35,7 @@ dataNodes:
 capacity: 128Gi
 ----

-In the above example, all DataNodes in the default group will store data (the location of `dfs.datanode.name.dir`) on a `128Gi` volume.
+In the above example, all DataNodes in the default group store data (the location of `dfs.datanode.name.dir`) on a `128Gi` volume.

 === Multiple storage volumes

@@ -61,13 +61,13 @@ dataNodes:
 capacity: 5Ti
 storageClass: premium-ssd
 hdfsStorageType: SSD
-# The default "data" PVC will still be created.
+# The default "data" PVC is still created.
 # If this is not desired then the count must be set to 0.
 data:
 count: 0
 ----

-This will create the following PVCs:
+This creates the following PVCs:

 1. `my-disks-hdfs-datanode-default-0` (12Ti)
 2. `my-disks-1-hdfs-datanode-default-0` (12Ti)
@@ -81,7 +81,7 @@ By configuring and using a dedicated https://kubernetes.io/docs/concepts/storage
 ====
 You might need to re-create the StatefulSet to apply the new PVC configuration because of https://github.com/kubernetes/kubernetes/issues/68737[this Kubernetes issue].
 You can delete the StatefulSet using `kubectl delete statefulsets --cascade=orphan <statefulset>`.
-The hdfs-operator will re-create the StatefulSet automatically.
+The hdfs-operator recreates the StatefulSet automatically.
 ====

 == Resource Requests
