From ee010b6eace3428c8254c962209f6e921ed5398f Mon Sep 17 00:00:00 2001 From: Felix Hennig Date: Tue, 24 Sep 2024 14:15:55 +0200 Subject: [PATCH 1/8] ~ --- .../pages/getting_started/first_steps.adoc | 8 ++-- .../hdfs/pages/getting_started/index.adoc | 8 ++-- .../pages/getting_started/installation.adoc | 39 ++++++++++--------- docs/modules/hdfs/pages/index.adoc | 4 +- .../reference/commandline-parameters.adoc | 2 +- .../reference/environment-variables.adoc | 2 +- .../configuration-environment-overrides.adoc | 11 +++++- docs/modules/hdfs/pages/usage-guide/fuse.adoc | 12 +++--- .../modules/hdfs/pages/usage-guide/index.adoc | 2 +- .../hdfs/pages/usage-guide/listenerclass.adoc | 2 +- .../operations/graceful-shutdown.adoc | 6 +-- .../operations/pod-disruptions.adoc | 22 ++++++----- .../operations/rack-awareness.adoc | 4 +- .../hdfs/pages/usage-guide/resources.adoc | 10 ++--- .../hdfs/pages/usage-guide/security.adoc | 18 ++++----- .../hdfs/pages/usage-guide/upgrading.adoc | 15 +++---- 16 files changed, 88 insertions(+), 77 deletions(-) diff --git a/docs/modules/hdfs/pages/getting_started/first_steps.adoc b/docs/modules/hdfs/pages/getting_started/first_steps.adoc index be74bdf1..529c93ca 100644 --- a/docs/modules/hdfs/pages/getting_started/first_steps.adoc +++ b/docs/modules/hdfs/pages/getting_started/first_steps.adoc @@ -1,7 +1,7 @@ = First steps :description: Deploy and verify an HDFS cluster with Stackable by setting up Zookeeper and HDFS components, then test file operations using WebHDFS API. -Once you have followed the steps in the xref:getting_started/installation.adoc[] section to install the operator and its dependencies, you will now deploy an HDFS cluster and its dependencies. +Once you have followed the steps in the xref:getting_started/installation.adoc[] section to install the operator and its dependencies, now deploy an HDFS cluster and its dependencies. Afterward, you can <<_verify_that_it_works, verify that it works>> by creating, verifying and deleting a test file in HDFS. == Setup @@ -13,7 +13,7 @@ To deploy a Zookeeper cluster create one file called `zk.yaml`: [source,yaml] include::example$getting_started/zk.yaml[] -We also need to define a ZNode that will be used by the HDFS cluster to reference Zookeeper. +Define a ZNode that is used by the HDFS cluster to reference Zookeeper. Create another file called `znode.yaml`: [source,yaml] @@ -94,7 +94,7 @@ Then use `curl` to issue a `PUT` command: [source] include::example$getting_started/getting_started.sh[tag=create-file] -This will return a location that will look something like this: +This returns a location that looks similar to this: [source] http://simple-hdfs-datanode-default-0.simple-hdfs-datanode-default.default.svc.cluster.local:9864/webhdfs/v1/testdata.txt?op=CREATE&user.name=stackable&namenoderpcaddress=simple-hdfs&createflag=&createparent=true&overwrite=false @@ -109,7 +109,7 @@ Rechecking the status again with: [source] include::example$getting_started/getting_started.sh[tag=file-status] -will now display some metadata about the file that was created in the HDFS cluster: +now displays some metadata about the file that was created in the HDFS cluster: [source,json] { diff --git a/docs/modules/hdfs/pages/getting_started/index.adoc b/docs/modules/hdfs/pages/getting_started/index.adoc index 536ea247..d591ab9f 100644 --- a/docs/modules/hdfs/pages/getting_started/index.adoc +++ b/docs/modules/hdfs/pages/getting_started/index.adoc @@ -1,18 +1,18 @@ = Getting started :description: Start with HDFS using the Stackable Operator. 
Install the Operator, set up your HDFS cluster, and verify its operation with this guide.
-This guide will get you started with HDFS using the Stackable Operator.
-It will guide you through the installation of the Operator and its dependencies, setting up your first HDFS cluster and verifying its operation.
+This guide gets you started with HDFS using the Stackable operator.
+It guides you through the installation of the operator and its dependencies, setting up your first HDFS cluster and verifying its operation.
== Prerequisites
-You will need:
+You need:
* a Kubernetes cluster
* kubectl
* optional: Helm
-Resource sizing depends on cluster type(s), usage and scope, but as a starting point we recommend a minimum of the following resources for this operator:
+Resource sizing depends on cluster type(s), usage and scope, but as a starting point the following resources are recommended as a minimum requirement for this operator:
* 0.2 cores (e.g. i5 or similar)
* 256MB RAM
diff --git a/docs/modules/hdfs/pages/getting_started/installation.adoc b/docs/modules/hdfs/pages/getting_started/installation.adoc
index 0f3ce6cb..d62421f2 100644
--- a/docs/modules/hdfs/pages/getting_started/installation.adoc
+++ b/docs/modules/hdfs/pages/getting_started/installation.adoc
@@ -1,39 +1,41 @@
= Installation
:description: Install the Stackable HDFS operator and dependencies using stackablectl or Helm. Follow steps for setup and verification in Kubernetes.
+:kind: https://kind.sigs.k8s.io/
-On this page you will install the Stackable HDFS operator and its dependency, the Zookeeper operator, as well as the
+Install the Stackable HDFS operator and its dependency, the Zookeeper operator, as well as the
commons, secret and listener operators which are required by all Stackable operators.
-== Stackable Operators
-
-There are 2 ways to run Stackable Operators
-
-. Using xref:management:stackablectl:index.adoc[]
-. Using Helm
-
-=== stackablectl
+There are multiple ways to install the Stackable operators.
+xref:management:stackablectl:index.adoc[] is the preferred way but Helm is also supported.
+OpenShift users may prefer installing the operator from the RedHat Certified Operator catalog using the OpenShift web console.
+[tabs]
+====
+stackablectl::
++
+--
`stackablectl` is the command line tool to interact with Stackable operators and our recommended way to install operators.
Follow the xref:management:stackablectl:installation.adoc[installation steps] for your platform.
-After you have installed `stackablectl`, run the following command to install all operators necessary for the HDFS
-cluster:
+After you have installed `stackablectl`, run the following command to install all operators necessary for the HDFS cluster:
[source,bash]
----
include::example$getting_started/getting_started.sh[tag=stackablectl-install-operators]
----
-The tool will show
+The tool prints
[source]
include::example$getting_started/install_output.txt[]
-TIP: Consult the xref:management:stackablectl:quickstart.adoc[] to learn more about how to use `stackablectl`. For
-example, you can use the `--cluster kind` flag to create a Kubernetes cluster with link:https://kind.sigs.k8s.io/[kind].
-
-=== Helm
+TIP: Consult the xref:management:stackablectl:quickstart.adoc[] to learn more about how to use `stackablectl`.
+For example, you can use the `--cluster kind` flag to create a Kubernetes cluster with {kind}[kind].
+--
+Helm::
++
+--
You can also use Helm to install the operators.
Add the Stackable Helm repository:
[source,bash]
----
@@ -46,8 +48,9 @@ Then install the Stackable Operators:
include::example$getting_started/getting_started.sh[tag=helm-install-operators]
----
-Helm will deploy the operators in a Kubernetes Deployment and apply the CRDs for the HDFS cluster (as well as the CRDs
-for the required operators). You are now ready to deploy HDFS in Kubernetes.
+Helm deploys the operators in a Kubernetes Deployment and applies the CRDs for the HDFS cluster (as well as the CRDs for the required operators).
--
====
== What's next
diff --git a/docs/modules/hdfs/pages/index.adoc b/docs/modules/hdfs/pages/index.adoc
index be2a4b9d..4a43420d 100644
--- a/docs/modules/hdfs/pages/index.adoc
+++ b/docs/modules/hdfs/pages/index.adoc
@@ -18,9 +18,7 @@ The operator depends on the xref:zookeeper:index.adoc[] to operate a ZooKeeper c
== Getting started
-Follow the xref:getting_started/index.adoc[Getting started guide] which will guide you through installing the Stackable
-HDFS and ZooKeeper operators, setting up ZooKeeper and HDFS and writing a file to HDFS to verify that everything is set
-up correctly.
+Follow the xref:getting_started/index.adoc[Getting started guide] which guides you through installing the Stackable HDFS and ZooKeeper operators, setting up ZooKeeper and HDFS and writing a file to HDFS to verify that everything is set up correctly.
Afterwards you can consult the xref:usage-guide/index.adoc[] to learn more about tailoring your HDFS configuration to your needs, or have a look at the <> for some example setups.
diff --git a/docs/modules/hdfs/pages/reference/commandline-parameters.adoc b/docs/modules/hdfs/pages/reference/commandline-parameters.adoc
index 01d95016..c3f62f6d 100644
--- a/docs/modules/hdfs/pages/reference/commandline-parameters.adoc
+++ b/docs/modules/hdfs/pages/reference/commandline-parameters.adoc
@@ -23,7 +23,7 @@ stackable-hdfs-operator run --product-config /foo/bar/properties.yaml
*Multiple values:* false
-The operator will **only** watch for resources in the provided namespace `test`:
+The operator **only** watches for resources in the provided namespace `test`:
[source]
----
diff --git a/docs/modules/hdfs/pages/reference/environment-variables.adoc b/docs/modules/hdfs/pages/reference/environment-variables.adoc
index c2ce078b..ad333d8a 100644
--- a/docs/modules/hdfs/pages/reference/environment-variables.adoc
+++ b/docs/modules/hdfs/pages/reference/environment-variables.adoc
@@ -36,7 +36,7 @@ docker run \
*Multiple values:* false
-The operator will **only** watch for resources in the provided namespace `test`:
+The operator **only** watches for resources in the provided namespace `test`:
[source]
----
diff --git a/docs/modules/hdfs/pages/usage-guide/configuration-environment-overrides.adoc b/docs/modules/hdfs/pages/usage-guide/configuration-environment-overrides.adoc
index 1aa542f6..e3d8fca6 100644
--- a/docs/modules/hdfs/pages/usage-guide/configuration-environment-overrides.adoc
+++ b/docs/modules/hdfs/pages/usage-guide/configuration-environment-overrides.adoc
@@ -50,7 +50,8 @@ nameNodes:
replicas: 2
----
-All override property values must be strings. The properties will be formatted and escaped correctly into the XML file.
+All override property values must be strings.
+The properties are formatted and escaped correctly into the XML file.
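As an illustration, an override is always given as a quoted string in the custom resource and is then rendered as a regular property element in the generated Hadoop configuration file. The snippet below is only a sketch; the property name and the role-level placement are chosen for illustration:

[source,yaml]
----
nameNodes:
  configOverrides:
    core-site.xml:
      fs.trash.interval: "5"  # numeric value, but still written as a string
----

The rendered entry in the resulting `core-site.xml` would then look roughly like this:

[source,xml]
----
<property>
  <name>fs.trash.interval</name>
  <value>5</value>
</property>
----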
For a full list of configuration options we refer to the Apache Hdfs documentation for https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml[hdfs-site.xml] and https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml[core-site.xml]. @@ -117,4 +118,10 @@ nameNodes: replicas: 1 ---- -IMPORTANT: Some environment variables will be overriden by the operator and cannot be set manually by the user. These are `HADOOP_HOME`, `HADOOP_CONF_DIR`, `POD_NAME` and `ZOOKEEPER`. +IMPORTANT: Some environment variables are overridden by the operator and cannot be set manually by the user. +These are `HADOOP_HOME`, `HADOOP_CONF_DIR`, `POD_NAME` and `ZOOKEEPER`. + +== Pod overrides + +The HDFS operator also supports Pod overrides, allowing you to override any property that you can set on a Kubernetes Pod. +Read the xref:concepts:overrides.adoc#pod-overrides[Pod overrides documentation] to learn more about this feature. diff --git a/docs/modules/hdfs/pages/usage-guide/fuse.adoc b/docs/modules/hdfs/pages/usage-guide/fuse.adoc index b6e4186e..3cc426d4 100644 --- a/docs/modules/hdfs/pages/usage-guide/fuse.adoc +++ b/docs/modules/hdfs/pages/usage-guide/fuse.adoc @@ -7,9 +7,9 @@ FUSE is short for _Filesystem in Userspace_ and allows a user to export a filesy HDFS contains a native FUSE driver/application, which means that an existing HDFS filesystem can be mounted into a Linux environment. To use the FUSE driver you can either copy the required files out of the image and run it on a host outside of Kubernetes or you can run it in a Pod. -This Pod, however, will need some extra capabilities. +This Pod, however, needs some extra capabilities. -This is an example Pod that will work _as long as the host system that is running the kubelet does support FUSE_: +This is an example Pod that works _as long as the host system that is running the kubelet does support FUSE_: [source,yaml] ---- @@ -57,7 +57,7 @@ securityContext: ---- Unfortunately, there is no way around some extra privileges. -In Kubernetes the Pods usually share the Kernel with the host running the Kubelet, which means a Pod wanting to use FUSE will need access to the underlying Kernel modules. +In Kubernetes the Pods usually share the Kernel with the host running the Kubelet, which means a Pod wanting to use FUSE needs access to the underlying Kernel modules. ==== Inside this Pod you can get a shell (e.g. using `kubectl exec --stdin --tty hdfs-fuse -- /bin/bash`) to get access to a script called `fuse_dfs_wrapper` (it is in the `PATH` of our Hadoop images). 
@@ -70,14 +70,14 @@ To mount HDFS call the script like this:
----
fuse_dfs_wrapper dfs:// <1> <2>
-# This will run in debug mode and stay in the foreground
+# This runs in debug mode and stays in the foreground
fuse_dfs_wrapper -odebug dfs://
# Example:
mkdir simple-hdfs
fuse_dfs_wrapper dfs://simple-hdfs simple-hdfs
cd simple-hdfs
-# Any operations in this directory will now happen in HDFS
+# Any operations in this directory now happen in HDFS
----
<1> Again, use the name of the HDFS service as above
-<2> `target` is the directory in which HDFS will be mounted, it must exist otherwise this command will fail
+<2> `target` is the directory in which HDFS is mounted, it must exist, otherwise this command fails
diff --git a/docs/modules/hdfs/pages/usage-guide/index.adoc b/docs/modules/hdfs/pages/usage-guide/index.adoc
index f02554c7..f0d615a9 100644
--- a/docs/modules/hdfs/pages/usage-guide/index.adoc
+++ b/docs/modules/hdfs/pages/usage-guide/index.adoc
@@ -2,6 +2,6 @@
:description: Learn to configure and use the Stackable Operator for Apache HDFS. Ensure basic setup knowledge from the Getting Started guide before proceeding.
:page-aliases: ROOT:usage.adoc
-This Section will help you to use and configure the Stackable Operator for Apache HDFS in various ways.
+This section helps you use and configure the Stackable operator for Apache HDFS in various ways.
You should already be familiar with how to set up a basic instance.
Follow the xref:getting_started/index.adoc[] guide to learn how to set up a basic instance with all the required dependencies (for example ZooKeeper).
diff --git a/docs/modules/hdfs/pages/usage-guide/listenerclass.adoc b/docs/modules/hdfs/pages/usage-guide/listenerclass.adoc
index 11da505c..b618abee 100644
--- a/docs/modules/hdfs/pages/usage-guide/listenerclass.adoc
+++ b/docs/modules/hdfs/pages/usage-guide/listenerclass.adoc
@@ -19,4 +19,4 @@ spec:
listenerClass: external-stable # <2>
----
<1> DataNode listeners should prioritize having a direct connection, to minimize network transfer overhead.
-<2> NameNode listeners should prioritize having a stable address, since they will be baked into the client configuration.
+<2> NameNode listeners should prioritize having a stable address, since they are baked into the client configuration.
diff --git a/docs/modules/hdfs/pages/usage-guide/operations/graceful-shutdown.adoc b/docs/modules/hdfs/pages/usage-guide/operations/graceful-shutdown.adoc
index f29f3ee4..3f933b31 100644
--- a/docs/modules/hdfs/pages/usage-guide/operations/graceful-shutdown.adoc
+++ b/docs/modules/hdfs/pages/usage-guide/operations/graceful-shutdown.adoc
@@ -6,9 +6,9 @@ You can configure the graceful shutdown as described in xref:concepts:operations
As a default, JournalNodes have `15 minutes` to shut down gracefully.
-The JournalNode process will receive a `SIGTERM` signal when Kubernetes wants to terminate the Pod.
-It will log the received signal as shown in the log below and initiate a graceful shutdown.
-After the graceful shutdown timeout runs out, and the process still didn't exit, Kubernetes will issue a `SIGKILL` signal.
+The JournalNode process receives a `SIGTERM` signal when Kubernetes wants to terminate the Pod.
+It logs the received signal as shown in the log below and initiates a graceful shutdown.
+After the graceful shutdown timeout runs out, and the process still didn't exit, Kubernetes issues a `SIGKILL` signal.
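For orientation, the timeout can be tuned per role through the `gracefulShutdownTimeout` config property, following the pattern described in the linked concepts page. The fragment below is a sketch; the `30m` value is only an example:

[source,yaml]
----
spec:
  journalNodes:
    config:
      gracefulShutdownTimeout: 30m  # example: raise the 15 minute default for JournalNodes
----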
https://github.com/apache/hadoop/blob/a585a73c3e02ac62350c136643a5e7f6095a3dbb/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNode.java#L272[This] is the relevant code that gets executed in the JournalNodes as of HDFS version `3.3.4`. diff --git a/docs/modules/hdfs/pages/usage-guide/operations/pod-disruptions.adoc b/docs/modules/hdfs/pages/usage-guide/operations/pod-disruptions.adoc index 22925932..36e1859a 100644 --- a/docs/modules/hdfs/pages/usage-guide/operations/pod-disruptions.adoc +++ b/docs/modules/hdfs/pages/usage-guide/operations/pod-disruptions.adoc @@ -3,13 +3,13 @@ You can configure the permitted Pod disruptions for HDFS nodes as described in xref:concepts:operations/pod_disruptions.adoc[]. -Unless you configure something else or disable our PodDisruptionBudgets (PDBs), we write the following PDBs: +Unless you configure something else or disable our PodDisruptionBudgets (PDBs), the operator write the following PDBs: == JournalNodes -We only allow a single JournalNode to be offline at any given time, regardless of the number of replicas or `roleGroups`. +Only a single JournalNode is allowed to be offline at any given time, regardless of the number of replicas or `roleGroups`. == NameNodes -We only allow a single NameNode to be offline at any given time, regardless of the number of replicas or `roleGroups`. +Only a single NameNode is allowed to be offline at any given time, regardless of the number of replicas or `roleGroups`. == DataNodes For DataNodes the question of how many instances can be unavailable at the same time is a bit harder: @@ -17,8 +17,8 @@ HDFS stores your blocks on the DataNodes. Every block can be replicated multiple times (to multiple DataNodes) to ensure maximum availability. The default replication factor is `3` - which can be configured using `spec.clusterConfig.dfsReplication`. However, it is also possible to change the replication factor for a specific file or directory to something other than the cluster default. -When you have a replication of `3`, you can safely take down 2 DataNodes, as there will always be a third DataNode holding a copy of each block currently assigned to one of the unavailable DataNodes. -However, you need to be aware that you are now down to a single point of failure - the last of three replicas! +When you have a replication of `3`, you can safely take down 2 DataNodes, as there is always a third DataNode holding a copy of each block currently assigned to one of the unavailable DataNodes. +However, you need to be aware that you are now down to a single point of failure -- the last of three replicas! Taking this into consideration, our operator uses the following algorithm to determine the maximum number of DataNodes allowed to be unavailable at the same time: @@ -93,13 +93,15 @@ This results e.g. in the following numbers: |=== == Reduce rolling redeployment durations -The default PDBs we write out are pessimistic and will cause the rolling redeployment to take a considerable amount of time. -As an example, when you have 100 DataNodes and a replication factor of `3`, we can safely only take a single DataNode down at a time. Assuming a DataNode takes 1 minute to properly restart, the whole re-deployment would take 100 minutes. +The default PDBs written out are pessimistic and cause the rolling redeployment to take a considerable amount of time. +As an example, when you have 100 DataNodes and a replication factor of `3`, only a single DataNode can be taken offline at a time. 
+Assuming a DataNode takes 1 minute to properly restart, the whole re-deployment would take 100 minutes. You can use the following measures to speed this up: -1. Increase the replication factor, e.g. from `3` to `5`. In this case the number of allowed disruptions triples from `1` to `3` (assuming >= 5 DataNodes), reducing the time it takes by 66%. -2. Increase `maxUnavailable` using the `spec.dataNodes.roleConfig.podDisruptionBudget.maxUnavailable` field as described in xref:concepts:operations/pod_disruptions.adoc[]. -3. Write your own PDBs as described in xref:concepts:operations/pod_disruptions.adoc#_using_you_own_custom_pdbs[Using you own custom PDBs]. +* Increase the replication factor, e.g. from `3` to `5`. + In this case the number of allowed disruptions triples from `1` to `3` (assuming >= 5 DataNodes), reducing the time it takes by 66%. +* Increase `maxUnavailable` using the `spec.dataNodes.roleConfig.podDisruptionBudget.maxUnavailable` field as described in xref:concepts:operations/pod_disruptions.adoc[]. +* Write your own PDBs as described in xref:concepts:operations/pod_disruptions.adoc#_using_you_own_custom_pdbs[Using you own custom PDBs]. WARNING: In cases you modify or disable the default PDBs, it's your responsibility to either make sure there are enough DataNodes available or accept the risk of blocks not being available! diff --git a/docs/modules/hdfs/pages/usage-guide/operations/rack-awareness.adoc b/docs/modules/hdfs/pages/usage-guide/operations/rack-awareness.adoc index c48e9d03..d2fbeff5 100644 --- a/docs/modules/hdfs/pages/usage-guide/operations/rack-awareness.adoc +++ b/docs/modules/hdfs/pages/usage-guide/operations/rack-awareness.adoc @@ -1,7 +1,7 @@ = HDFS Rack Awareness Apache Hadoop supports a feature called Rack Awareness, which allows users to define a topology for the nodes making up a cluster. -Hadoop will then use that topology to spread out replicas of blocks in a fashion that maximizes fault tolerance. +Hadoop then uses that topology to spread out replicas of blocks in a fashion that maximizes fault tolerance. The default write path, for example, is to put replicas of a newly created block first on a different node, but within the same rack, and the second copy on a node in a remote rack. In order for this to work properly, Hadoop needs to have access to the information about the underlying infrastructure it runs on. In a Kubernetes environment, this means obtaining information from the pods or nodes of the cluster. @@ -29,4 +29,4 @@ spec: ... ---- -Internally this will be used to create a topology label consisting of the value of the node label `topology.kubernetes.io/zone` and the pod label `app.kubernetes.io/role-group`, e.g. `/eu-central-1/rg1`. +Internally this is used to create a topology label consisting of the value of the node label `topology.kubernetes.io/zone` and the pod label `app.kubernetes.io/role-group`, e.g. `/eu-central-1/rg1`. diff --git a/docs/modules/hdfs/pages/usage-guide/resources.adoc b/docs/modules/hdfs/pages/usage-guide/resources.adoc index c5681599..5cc64773 100644 --- a/docs/modules/hdfs/pages/usage-guide/resources.adoc +++ b/docs/modules/hdfs/pages/usage-guide/resources.adoc @@ -5,7 +5,7 @@ You can mount volumes where data is stored by specifying https://kubernetes.io/docs/concepts/storage/persistent-volumes[PersistentVolumeClaims] for each individual role group. 
-In case nothing is configured in the custom resource for a certain role group, each Pod will have one volume mount with `10Gi` capacity and storage type `Disk`: +In case nothing is configured in the custom resource for a certain role group, each Pod has one volume mount with `10Gi` capacity and storage type `Disk`: [source,yaml] ---- @@ -35,7 +35,7 @@ dataNodes: capacity: 128Gi ---- -In the above example, all DataNodes in the default group will store data (the location of `dfs.datanode.name.dir`) on a `128Gi` volume. +In the above example, all DataNodes in the default group store data (the location of `dfs.datanode.name.dir`) on a `128Gi` volume. === Multiple storage volumes @@ -61,13 +61,13 @@ dataNodes: capacity: 5Ti storageClass: premium-ssd hdfsStorageType: SSD - # The default "data" PVC will still be created. + # The default "data" PVC is still created. # If this is not desired then the count must be set to 0. data: count: 0 ---- -This will create the following PVCs: +This creates the following PVCs: 1. `my-disks-hdfs-datanode-default-0` (12Ti) 2. `my-disks-1-hdfs-datanode-default-0` (12Ti) @@ -81,7 +81,7 @@ By configuring and using a dedicated https://kubernetes.io/docs/concepts/storage ==== You might need to re-create the StatefulSet to apply the new PVC configuration because of https://github.com/kubernetes/kubernetes/issues/68737[this Kubernetes issue]. You can delete the StatefulSet using `kubectl delete statefulsets --cascade=orphan `. -The hdfs-operator will re-create the StatefulSet automatically. +The hdfs-operator recreates the StatefulSet automatically. ==== == Resource Requests diff --git a/docs/modules/hdfs/pages/usage-guide/security.adoc b/docs/modules/hdfs/pages/usage-guide/security.adoc index 25ba04c7..46470f47 100644 --- a/docs/modules/hdfs/pages/usage-guide/security.adoc +++ b/docs/modules/hdfs/pages/usage-guide/security.adoc @@ -52,7 +52,7 @@ You should get the error message `org.apache.hadoop.security.AccessControlExcept === 5. Access HDFS In case you want to access your HDFS it is recommended to start up a client Pod that connects to HDFS, rather than shelling into the namenode. -We have an https://github.com/stackabletech/hdfs-operator/blob/main/tests/templates/kuttl/kerberos/20-access-hdfs.yaml.j2[integration test] for this exact purpose, where you can see how to connect and get a valid keytab. +There is an https://github.com/stackabletech/hdfs-operator/blob/main/tests/templates/kuttl/kerberos/20-access-hdfs.yaml.j2[integration test] for this exact purpose, where you can see how to connect and get a valid keytab. == Authorization For authorization we developed https://github.com/stackabletech/hdfs-utils[hdfs-utils], which contains an OPA authorizer and group mapper. @@ -69,9 +69,9 @@ In addition to this you need a OpaCluster that serves the rego rules - this guid include::example$usage-guide/hdfs-regorules.yaml[] ---- -This rego rule is intended for demonstration purposes and allows every operation. -For a production setup you will probably need to have something much more granular. -We provide a more representative rego rule in our integration tests and in the aforementioned hdfs-utils repository. +This Rego rule is for demonstration purposes and allows all operations. +For production, you'll need a much more granular rule setup. +A more representative rego rule is available in our integration tests and in the aforementioned hdfs-utils repository. Details can be found below in the <> section. 
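For orientation, such a permissive demonstration policy boils down to a single default rule. The sketch below is illustrative only -- the package name is an assumption, and the bundled example file may differ:

[source,rego]
----
package hdfs

# Demonstration only: every request is allowed.
# Replace this with fine-granular rules before any production use.
default allow := true
----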
Reference the rego rule as follows in your HdfsCluster: @@ -87,19 +87,19 @@ spec: === How it works WARNING: Take all your knowledge about HDFS authorization and throw it in the bin. -The approach we are taking for our authorization paradigm departs significantly from traditional Hadoop patterns and POSIX-style permissions. +The approach taken for the authorization paradigm here departs significantly from traditional Hadoop patterns and POSIX-style permissions. In short, the current rego rules ignore the file ownership, permissions, ACLs and all other attributes files can have. All of this is state in HDFS and clashes with the infrastructure-as-code approach (IaC). -Instead, HDFS will send a request detailing who (e.g. `alice/test-hdfs-permissions.default.svc.cluster.local@CLUSTER.LOCAL`) is trying to do what (e.g. `open`, `create`, `delete` or `append`) on what file (e.g. `/foo/bar`). +Instead, HDFS sends a request detailing who (e.g. `alice/test-hdfs-permissions.default.svc.cluster.local@CLUSTER.LOCAL`) is trying to do what (e.g. `open`, `create`, `delete` or `append`) on what file (e.g. `/foo/bar`). OPA then makes a decision if this action is allowed or not. Instead of `chown`-ing a directory to a different user to assign write permissions, you should go to your IaC Git repository and add a rego rule entry specifying that the user is allowed to read and write to that directory. === Group memberships We encountered several challenges while implementing the group mapper, the most serious of which being that the `GroupMappingServiceProvider` interface only passes the `shortUsername` when https://github.com/apache/hadoop/blob/a897e745f598ef05fc0c253b2a776100e48688d2/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/GroupMappingServiceProvider.java#L45[asking for group memberships]. -This does not allow us to differentiate between e.g. `hbase/hbase-prod.prod-namespace.svc.cluster.local@CLUSTER.LOCAL` and `hbase/hbase-dev.dev-namespace.svc.cluster.local@CLUSTER.LOCAL`, as the GroupMapper will get asked for `hbase` group memberships in both cases. +This does not allow us to differentiate between e.g. `hbase/hbase-prod.prod-namespace.svc.cluster.local@CLUSTER.LOCAL` and `hbase/hbase-dev.dev-namespace.svc.cluster.local@CLUSTER.LOCAL`, as the GroupMapper gets asked for `hbase` group memberships in both cases. Users could work around this to assign unique shortNames by using https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SecureMode.html#Mapping_from_Kerberos_principals_to_OS_user_accounts[`hadoop.security.auth_to_local`]. This is however a potentially complex and error-prone process. @@ -109,7 +109,7 @@ However, this did not work, as HDFS only allows "simple usernames", https://gith Because of these issues we do not use a custom GroupMapper and only rely on the authorizer, which in turn receives a complete `UserGroupInformation` object, including the `shortUserName` and the precious full `userName`. This has the downside that the group memberships used in OPA for authorization are not known to HDFS. The implication is thus that you cannot add users to the `superuser` group, which is needed for certain administrative actions in HDFS. -We have decided that this is an acceptable approach as normal operations will not be affected. +We have decided that this is an acceptable approach as normal operations are not affected. 
In case you really need users to be part of the `superusers` group, you can use a configOverride on `hadoop.user.group.static.mapping.overrides` for that. [#fine-granular-rego-rules] @@ -233,7 +233,7 @@ matches the resource on [source] ---- -# Resource mentions a folder higher up the tree, which will will grant access recursively +# Resource mentions a folder higher up the tree, which grants access recursively matches_resource(file, resource) if { startswith(resource, "hdfs:dir:/") # directories need to have a trailing slash diff --git a/docs/modules/hdfs/pages/usage-guide/upgrading.adoc b/docs/modules/hdfs/pages/usage-guide/upgrading.adoc index ae10a6ad..0a369fd5 100644 --- a/docs/modules/hdfs/pages/usage-guide/upgrading.adoc +++ b/docs/modules/hdfs/pages/usage-guide/upgrading.adoc @@ -3,7 +3,8 @@ IMPORTANT: HDFS upgrades are experimental, and details may change at any time. -HDFS currently requires a manual process to upgrade. This guide will take you through an example case, upgrading an example cluster (from our xref:getting_started/index.adoc[Getting Started] guide) from HDFS 3.3.6 to 3.4.0. +HDFS currently requires a manual process to upgrade. +This guide takes you through an example case, upgrading an example cluster (from our xref:getting_started/index.adoc[Getting Started] guide) from HDFS 3.3.6 to 3.4.0. == Preparing for the worst @@ -13,7 +14,7 @@ Apache HDFS supports https://hadoop.apache.org/docs/r3.4.0/hadoop-project-dist/h Rollback:: Reverts all user data to the pre-upgrade state. Requires taking the cluster offline. Downgrade:: Downgrades the HDFS software but preserves all changes made by users. Can be performed as a rolling change, keeping the cluster online. -The Stackable Operator for HDFS supports downgrading but not rollbacks. +The Stackable operator for HDFS supports downgrading but not rollbacks. In order to downgrade, revert the `.spec.image.productVersion` field, and then proceed to xref:#finalize[finalizing] once the cluster is downgraded: @@ -74,9 +75,9 @@ hdfscluster.hdfs.stackable.tech/simple-hdfs patched Then wait until all pods have restarted, are in the Ready state, and running the new HDFS version. -NOTE: This will automatically enable the NameNodes' compatibility mode, allowing them to start despite the fsImage version mismatch. +NOTE: This automatically enables the NameNodes' compatibility mode, allowing them to start despite the fsImage version mismatch. -NOTE: Services will be upgraded in order: JournalNodes, then NameNodes, then DataNodes. +NOTE: Services are upgraded in order: JournalNodes, then NameNodes, then DataNodes. [#finalize] == Finalizing the upgrade @@ -96,7 +97,7 @@ Rolling upgrade is finalized. // We can't safely automate this, because finalize is asynchronous and doesn't tell us whether all NameNodes have even received the request to finalize. WARNING: Please ensure that all NameNodes are running and available before proceeding. -NameNodes that have not finalized yet will crash on launch when taken out of compatibility mode. +NameNodes that have not finalized yet crash on launch when taken out of compatibility mode. Finally, mark the cluster as upgraded: @@ -106,6 +107,6 @@ $ kubectl patch hdfs/simple-hdfs --subresource=status --patch '{"status": {"depl hdfscluster.hdfs.stackable.tech/simple-hdfs patched ---- -NOTE: `deployedProductVersion` is located in the _status_ subresource, which will not be modified by most graphical editors, and `kubectl` requires the `--subresource=status` flag. 
+NOTE: `deployedProductVersion` is located in the _status_ subresource, which is not modified by most graphical editors, and `kubectl` requires the `--subresource=status` flag. -The NameNodes will then be restarted a final time, taking them out of compatibility mode. +The NameNodes are then restarted a final time, taking them out of compatibility mode. From ab245703dbfd767054f6185e1e6745ed13923fca Mon Sep 17 00:00:00 2001 From: Felix Hennig Date: Tue, 24 Sep 2024 14:32:47 +0200 Subject: [PATCH 2/8] Update rack awareness page --- .../operations/rack-awareness.adoc | 27 +++++++++++-------- 1 file changed, 16 insertions(+), 11 deletions(-) diff --git a/docs/modules/hdfs/pages/usage-guide/operations/rack-awareness.adoc b/docs/modules/hdfs/pages/usage-guide/operations/rack-awareness.adoc index d2fbeff5..057032d6 100644 --- a/docs/modules/hdfs/pages/usage-guide/operations/rack-awareness.adoc +++ b/docs/modules/hdfs/pages/usage-guide/operations/rack-awareness.adoc @@ -1,18 +1,17 @@ = HDFS Rack Awareness +:rack-awareness-docs: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/RackAwareness.html +:hdfs-topology-provider: https://github.com/stackabletech/hdfs-topology-provider -Apache Hadoop supports a feature called Rack Awareness, which allows users to define a topology for the nodes making up a cluster. -Hadoop then uses that topology to spread out replicas of blocks in a fashion that maximizes fault tolerance. +{rack-awareness-docs}[Rack awareness] is a feature in Apache Hadoop that allows users to define a cluster's node topology. +Hadoop uses that topology to distribute block replicas in a way that maximizes fault tolerance. -The default write path, for example, is to put replicas of a newly created block first on a different node, but within the same rack, and the second copy on a node in a remote rack. -In order for this to work properly, Hadoop needs to have access to the information about the underlying infrastructure it runs on. In a Kubernetes environment, this means obtaining information from the pods or nodes of the cluster. +For example, when a new block is created, the default behavior is to place one replica on a different node within the same rack, and another on a node in a remote rack. +To do this effectively, Hadoop must access information about the underlying infrastructure. +In a Kubernetes environment, this involves retrieving data from Pods or Nodes in the cluster. -In order to enable gathering this information the Hadoop images contain https://github.com/stackabletech/hdfs-topology-provider on the classpath, which can be configured to read labels from Kubernetes objects. +== Configuring rack awareness -In the current version of the SDP this is now exposed as fully integrated functionality in the operator, and no longer needs to be configured via config overrides. - -NOTE: Prior to SDP release 24.3, it was necessary to manually deploy RBAC objects to allow the Hadoop pods access to the necessary Kubernetes objects. This ClusterRole allows the reading of pods and nodes and needs to be bound to the individual ServiceAccounts that are deployed per Hadoop cluster: this is now performed by the operator itself. - -Configuration of the tool is done by using the field `rackAwareness` under the cluster configuration: +To configure rack awareness, use the `rackAwareness` field in the cluster configuration: [source,yaml] ---- @@ -29,4 +28,10 @@ spec: ... 
----
-Internally this is used to create a topology label consisting of the value of the node label `topology.kubernetes.io/zone` and the pod label `app.kubernetes.io/role-group`, e.g. `/eu-central-1/rg1`.
+This creates an internal topology label by combining the values of the `topology.kubernetes.io/zone` Node label and the `app.kubernetes.io/role-group` Pod label (e.g. `/eu-central-1/rg1`).
+
+== How it works
+
+In order to enable gathering this information the Hadoop images contain the {hdfs-topology-provider}[hdfs-topology-provider] on the classpath, which can be configured to read labels from Kubernetes objects.
+
+The operator deploys ClusterRoles and ServiceAccounts with the relevant RBAC rules to allow the Hadoop Pod to access the necessary Kubernetes objects.
From 0399fad9c1a1c3b2e75aac9db1ad7a1bfc853a52 Mon Sep 17 00:00:00 2001
From: Felix Hennig
Date: Tue, 24 Sep 2024 14:42:45 +0200
Subject: [PATCH 3/8] some more wording and formatting
---
docs/modules/hdfs/pages/usage-guide/fuse.adoc | 11 +++++------
.../pages/usage-guide/logging-log-aggregation.adoc | 3 +--
docs/modules/hdfs/pages/usage-guide/scaling.adoc | 2 +-
3 files changed, 7 insertions(+), 9 deletions(-)
diff --git a/docs/modules/hdfs/pages/usage-guide/fuse.adoc b/docs/modules/hdfs/pages/usage-guide/fuse.adoc
index 3cc426d4..41a02ea5 100644
--- a/docs/modules/hdfs/pages/usage-guide/fuse.adoc
+++ b/docs/modules/hdfs/pages/usage-guide/fuse.adoc
@@ -1,11 +1,9 @@
= FUSE
:description: Use HDFS FUSE driver to mount HDFS filesystems into Linux environments via a Kubernetes Pod with necessary privileges and configurations.
-Our images of Apache Hadoop do contain the necessary binaries and libraries to use the HDFS FUSE driver.
-
-FUSE is short for _Filesystem in Userspace_ and allows a user to export a filesystem into the Linux kernel, which can then be mounted.
-HDFS contains a native FUSE driver/application, which means that an existing HDFS filesystem can be mounted into a Linux environment.
+HDFS contains a native FUSE driver/application, enabling you to mount an existing HDFS filesystem into a Linux environment.
+Stackable images of Apache Hadoop contain the necessary binaries and libraries to use the HDFS FUSE driver.
To use the FUSE driver you can either copy the required files out of the image and run it on a host outside of Kubernetes or you can run it in a Pod.
This Pod, however, needs some extra capabilities.
@@ -39,8 +37,9 @@ spec:
configMap:
name: <2>
----
-<1> Ideally use the same version your HDFS is using. FUSE is baked in to our images as of SDP 23.11.
-<2> This needs to be a reference to a discovery ConfigMap as written by our HDFS operator.
+<1> Ideally use the same version your HDFS is using.
+ Stackable HDFS images contain the FUSE driver since 23.11.
+<2> This needs to be a reference to a discovery ConfigMap as written by the HDFS operator.
[TIP]
.Privileged Pods
diff --git a/docs/modules/hdfs/pages/usage-guide/logging-log-aggregation.adoc b/docs/modules/hdfs/pages/usage-guide/logging-log-aggregation.adoc
index 9a7c44f3..c4c3c208 100644
--- a/docs/modules/hdfs/pages/usage-guide/logging-log-aggregation.adoc
+++ b/docs/modules/hdfs/pages/usage-guide/logging-log-aggregation.adoc
@@ -22,5 +22,4 @@ spec:
enableVectorAgent: true
----
-Further information on how to configure logging, can be found in
-xref:concepts:logging.adoc[].
+Further information on how to configure logging can be found in xref:concepts:logging.adoc[].
diff --git a/docs/modules/hdfs/pages/usage-guide/scaling.adoc b/docs/modules/hdfs/pages/usage-guide/scaling.adoc index df1d2318..752f1874 100644 --- a/docs/modules/hdfs/pages/usage-guide/scaling.adoc +++ b/docs/modules/hdfs/pages/usage-guide/scaling.adoc @@ -1,4 +1,4 @@ = Scaling :description: When scaling namenodes up, make sure to increase the replica count only by one and not more nodes at once. -When scaling namenodes up, make sure to increase the replica count only by one and not more nodes at once. +When scaling name nodes up, make sure to increase the replica count only by one and not more nodes at once. From ba8f0d0eb1ebc6c21132e2d5903ee6d6523e01d9 Mon Sep 17 00:00:00 2001 From: Felix Hennig Date: Tue, 24 Sep 2024 15:52:24 +0200 Subject: [PATCH 4/8] some more wording and formatting --- .../usage-guide/configuration-environment-overrides.adoc | 2 +- .../hdfs/pages/usage-guide/operations/pod-disruptions.adoc | 4 ++-- docs/modules/hdfs/pages/usage-guide/security.adoc | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/modules/hdfs/pages/usage-guide/configuration-environment-overrides.adoc b/docs/modules/hdfs/pages/usage-guide/configuration-environment-overrides.adoc index e3d8fca6..0dd285e9 100644 --- a/docs/modules/hdfs/pages/usage-guide/configuration-environment-overrides.adoc +++ b/docs/modules/hdfs/pages/usage-guide/configuration-environment-overrides.adoc @@ -53,7 +53,7 @@ nameNodes: All override property values must be strings. The properties are formatted and escaped correctly into the XML file. -For a full list of configuration options we refer to the Apache Hdfs documentation for https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml[hdfs-site.xml] and https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml[core-site.xml]. +For a full list of configuration options refer to the Apache Hdfs documentation for https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml[hdfs-site.xml] and https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml[core-site.xml]. === The security.properties file diff --git a/docs/modules/hdfs/pages/usage-guide/operations/pod-disruptions.adoc b/docs/modules/hdfs/pages/usage-guide/operations/pod-disruptions.adoc index 36e1859a..f055b323 100644 --- a/docs/modules/hdfs/pages/usage-guide/operations/pod-disruptions.adoc +++ b/docs/modules/hdfs/pages/usage-guide/operations/pod-disruptions.adoc @@ -3,7 +3,7 @@ You can configure the permitted Pod disruptions for HDFS nodes as described in xref:concepts:operations/pod_disruptions.adoc[]. -Unless you configure something else or disable our PodDisruptionBudgets (PDBs), the operator write the following PDBs: +Unless you configure something else or disable the default PodDisruptionBudgets (PDBs), the operator writes the following PDBs: == JournalNodes Only a single JournalNode is allowed to be offline at any given time, regardless of the number of replicas or `roleGroups`. @@ -20,7 +20,7 @@ The default replication factor is `3` - which can be configured using `spec.clus When you have a replication of `3`, you can safely take down 2 DataNodes, as there is always a third DataNode holding a copy of each block currently assigned to one of the unavailable DataNodes. However, you need to be aware that you are now down to a single point of failure -- the last of three replicas! 
-Taking this into consideration, our operator uses the following algorithm to determine the maximum number of DataNodes allowed to be unavailable at the same time: +Taking this into consideration, the operator uses the following algorithm to determine the maximum number of DataNodes allowed to be unavailable at the same time: `num_datanodes` is the number of DataNodes in the HDFS cluster, summed over all `roleGroups`. diff --git a/docs/modules/hdfs/pages/usage-guide/security.adoc b/docs/modules/hdfs/pages/usage-guide/security.adoc index 46470f47..10d7a60e 100644 --- a/docs/modules/hdfs/pages/usage-guide/security.adoc +++ b/docs/modules/hdfs/pages/usage-guide/security.adoc @@ -56,7 +56,7 @@ There is an https://github.com/stackabletech/hdfs-operator/blob/main/tests/templ == Authorization For authorization we developed https://github.com/stackabletech/hdfs-utils[hdfs-utils], which contains an OPA authorizer and group mapper. -This matches our general xref:concepts:opa.adoc[] mechanisms. +This matches the general xref:concepts:opa.adoc[] mechanisms. IMPORTANT: It is recommended to enable Kerberos when doing Authorization, as otherwise you don't have any security measures at all. There still might be cases where you want authorization on top of a cluster without authentication, as you don't want to accidentally drop files and therefore use different users for different use-cases. From 90e179f4c6880a5de45d8e8bf107332750180572 Mon Sep 17 00:00:00 2001 From: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com> Date: Fri, 27 Sep 2024 10:46:35 -0400 Subject: [PATCH 5/8] Update docs/modules/hdfs/pages/getting_started/installation.adoc --- docs/modules/hdfs/pages/getting_started/installation.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/modules/hdfs/pages/getting_started/installation.adoc b/docs/modules/hdfs/pages/getting_started/installation.adoc index d62421f2..b63361a4 100644 --- a/docs/modules/hdfs/pages/getting_started/installation.adoc +++ b/docs/modules/hdfs/pages/getting_started/installation.adoc @@ -6,7 +6,7 @@ Install the Stackable HDFS operator and its dependency, the Zookeeper operator, commons, secret and listener operators which are required by all Stackable operators. There are multiple ways to install the Stackable operators. -xref:management:stackablectl:index.adoc[] is the preferred way but Helm is also supported. +xref:management:stackablectl:index.adoc[] is the preferred way, but Helm is also supported. OpenShift users may prefer installing the operator from the RedHat Certified Operator catalog using the OpenShift web console. [tabs] From 0763ed3447e200b368e09d3546324a6c12bbc8b6 Mon Sep 17 00:00:00 2001 From: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com> Date: Fri, 27 Sep 2024 10:46:46 -0400 Subject: [PATCH 6/8] Update docs/modules/hdfs/pages/usage-guide/operations/graceful-shutdown.adoc --- .../hdfs/pages/usage-guide/operations/graceful-shutdown.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/modules/hdfs/pages/usage-guide/operations/graceful-shutdown.adoc b/docs/modules/hdfs/pages/usage-guide/operations/graceful-shutdown.adoc index 3f933b31..0fcce4ba 100644 --- a/docs/modules/hdfs/pages/usage-guide/operations/graceful-shutdown.adoc +++ b/docs/modules/hdfs/pages/usage-guide/operations/graceful-shutdown.adoc @@ -8,7 +8,7 @@ As a default, JournalNodes have `15 minutes` to shut down gracefully. The JournalNode process receives a `SIGTERM` signal when Kubernetes wants to terminate the Pod. 
It logs the received signal as shown in the log below and initiate a graceful shutdown. -After the graceful shutdown timeout runs out, and the process still didn't exit, Kubernetes issues a `SIGKILL` signal. +After the graceful shutdown timeout runs out, and the process is still running, Kubernetes issues a `SIGKILL` signal. https://github.com/apache/hadoop/blob/a585a73c3e02ac62350c136643a5e7f6095a3dbb/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/qjournal/server/JournalNode.java#L272[This] is the relevant code that gets executed in the JournalNodes as of HDFS version `3.3.4`. From 953aabcf75927791900d23c3860b2c85f24d8088 Mon Sep 17 00:00:00 2001 From: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com> Date: Fri, 27 Sep 2024 10:47:14 -0400 Subject: [PATCH 7/8] Update docs/modules/hdfs/pages/usage-guide/operations/pod-disruptions.adoc --- .../hdfs/pages/usage-guide/operations/pod-disruptions.adoc | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/modules/hdfs/pages/usage-guide/operations/pod-disruptions.adoc b/docs/modules/hdfs/pages/usage-guide/operations/pod-disruptions.adoc index f055b323..5cf41742 100644 --- a/docs/modules/hdfs/pages/usage-guide/operations/pod-disruptions.adoc +++ b/docs/modules/hdfs/pages/usage-guide/operations/pod-disruptions.adoc @@ -17,8 +17,8 @@ HDFS stores your blocks on the DataNodes. Every block can be replicated multiple times (to multiple DataNodes) to ensure maximum availability. The default replication factor is `3` - which can be configured using `spec.clusterConfig.dfsReplication`. However, it is also possible to change the replication factor for a specific file or directory to something other than the cluster default. -When you have a replication of `3`, you can safely take down 2 DataNodes, as there is always a third DataNode holding a copy of each block currently assigned to one of the unavailable DataNodes. -However, you need to be aware that you are now down to a single point of failure -- the last of three replicas! +With a replication of `3`, at most 2 data nodes may be down, as the third one is holding a copy of each block currently assigned to the unavailable nodes. +However, the last data node running is now a single point of failure -- the last of three replicas! Taking this into consideration, the operator uses the following algorithm to determine the maximum number of DataNodes allowed to be unavailable at the same time: From d08799610c7d890330e19fd62a5c47417277efcc Mon Sep 17 00:00:00 2001 From: Razvan-Daniel Mihai <84674+razvan@users.noreply.github.com> Date: Fri, 27 Sep 2024 10:47:29 -0400 Subject: [PATCH 8/8] Update docs/modules/hdfs/pages/usage-guide/resources.adoc --- docs/modules/hdfs/pages/usage-guide/resources.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/modules/hdfs/pages/usage-guide/resources.adoc b/docs/modules/hdfs/pages/usage-guide/resources.adoc index 5cc64773..b2d451f9 100644 --- a/docs/modules/hdfs/pages/usage-guide/resources.adoc +++ b/docs/modules/hdfs/pages/usage-guide/resources.adoc @@ -5,7 +5,7 @@ You can mount volumes where data is stored by specifying https://kubernetes.io/docs/concepts/storage/persistent-volumes[PersistentVolumeClaims] for each individual role group. -In case nothing is configured in the custom resource for a certain role group, each Pod has one volume mount with `10Gi` capacity and storage type `Disk`: +By default, each Pod has one volume mount with `10Gi` capacity and storage type `Disk`: [source,yaml] ----