Commit cbb1255

Add descriptions (#580)
1 parent b6a95ff commit cbb1255

14 files changed: +69 -36 lines changed

docs/modules/hdfs/pages/getting_started/first_steps.adoc

Lines changed: 17 additions & 8 deletions
@@ -1,6 +1,8 @@
 = First steps
+:description: Deploy and verify an HDFS cluster with Stackable by setting up Zookeeper and HDFS components, then test file operations using WebHDFS API.
 
-Once you have followed the steps in the xref:getting_started/installation.adoc[] section to install the operator and its dependencies, you will now deploy an HDFS cluster and its dependencies. Afterward, you can <<_verify_that_it_works, verify that it works>> by creating, verifying and deleting a test file in HDFS.
+Once you have followed the steps in the xref:getting_started/installation.adoc[] section to install the operator and its dependencies, you will now deploy an HDFS cluster and its dependencies.
+Afterward, you can <<_verify_that_it_works, verify that it works>> by creating, verifying and deleting a test file in HDFS.
 
 == Setup
 
@@ -11,7 +13,8 @@ To deploy a Zookeeper cluster create one file called `zk.yaml`:
 [source,yaml]
 include::example$getting_started/zk.yaml[]
 
-We also need to define a ZNode that will be used by the HDFS cluster to reference Zookeeper. Create another file called `znode.yaml`:
+We also need to define a ZNode that will be used by the HDFS cluster to reference Zookeeper.
+Create another file called `znode.yaml`:
 
 [source,yaml]
 include::example$getting_started/znode.yaml[]
@@ -28,7 +31,8 @@ include::example$getting_started/getting_started.sh[tag=watch-zk-rollout]
 
 === HDFS
 
-An HDFS cluster has three components: the `namenode`, the `datanode` and the `journalnode`. Create a file named `hdfs.yaml` defining 2 `namenodes` and one `datanode` and `journalnode` each:
+An HDFS cluster has three components: the `namenode`, the `datanode` and the `journalnode`.
+Create a file named `hdfs.yaml` defining 2 `namenodes` and one `datanode` and `journalnode` each:
 
 [source,yaml]
 ----
@@ -37,10 +41,12 @@ include::example$getting_started/hdfs.yaml[]
 
 Where:
 
-- `metadata.name` contains the name of the HDFS cluster
-- the HDFS version in the Docker image provided by Stackable must be set in `spec.image.productVersion`
+* `metadata.name` contains the name of the HDFS cluster
+* the HDFS version in the Docker image provided by Stackable must be set in `spec.image.productVersion`
 
-NOTE: Please note that the version you need to specify for `spec.image.productVersion` is the desired version of Apache HDFS. You can optionally specify the `spec.image.stackableVersion` to a certain release like `23.11.0` but it is recommended to leave it out and use the default provided by the operator. For a list of available versions please check our https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%2Fhadoop%2Ftags[image registry].
+NOTE: Please note that the version you need to specify for `spec.image.productVersion` is the desired version of Apache HDFS.
+You can optionally specify the `spec.image.stackableVersion` to a certain release like `24.7.0` but it is recommended to leave it out and use the default provided by the operator.
+For a list of available versions please check our https://repo.stackable.tech/#browse/browse:docker:v2%2Fstackable%2Fhadoop%2Ftags[image registry].
 It should generally be safe to simply use the latest image version that is available.
 
 Create the actual HDFS cluster by applying the file:
@@ -57,7 +63,9 @@ include::example$getting_started/getting_started.sh[tag=watch-hdfs-rollout]
 
 == Verify that it works
 
-To test the cluster you can create a new file, check its status and then delete it. We will execute these actions from within a helper pod. Create a file called `webhdfs.yaml`:
+To test the cluster operation, create a new file, check its status and then delete it.
+You can execute these actions from within a helper Pod.
+Create a file called `webhdfs.yaml`:
 
 [source,yaml]
 ----
@@ -75,7 +83,8 @@ To begin with the cluster should be empty: this can be verified by listing all
 [source]
 include::example$getting_started/getting_started.sh[tag=file-status]
 
-Creating a file in HDFS using the https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Create_and_Write_to_a_File[Webhdfs API] requires a two-step `PUT` (the reason for having a two-step create/append is to prevent clients from sending out data before the redirect). First, create a file with some text in it called `testdata.txt` and copy it to the `tmp` directory on the helper pod:
+Creating a file in HDFS using the https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html#Create_and_Write_to_a_File[Webhdfs API] requires a two-step `PUT` (the reason for having a two-step create/append is to prevent clients from sending out data before the redirect).
+First, create a file with some text in it called `testdata.txt` and copy it to the `tmp` directory on the helper pod:
 
 [source]
 include::example$getting_started/getting_started.sh[tag=copy-file]
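
For orientation only (not part of this commit): the `hdfs.yaml` above is pulled in via `include::example$getting_started/hdfs.yaml[]`, so its contents are not visible in this diff. A minimal sketch of such an HdfsCluster resource, assuming the cluster is called `simple-hdfs` and the ZNode discovery ConfigMap is called `simple-hdfs-znode`, might look like:

[source,yaml]
----
apiVersion: hdfs.stackable.tech/v1alpha1
kind: HdfsCluster
metadata:
  name: simple-hdfs                            # assumed cluster name
spec:
  image:
    productVersion: 3.4.0                      # desired Apache HDFS version
  clusterConfig:
    zookeeperConfigMapName: simple-hdfs-znode  # discovery ConfigMap of the ZNode defined earlier
  nameNodes:
    roleGroups:
      default:
        replicas: 2                            # two namenodes, as described above
  dataNodes:
    roleGroups:
      default:
        replicas: 1
  journalNodes:
    roleGroups:
      default:
        replicas: 1
----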

docs/modules/hdfs/pages/getting_started/index.adoc

Lines changed: 3 additions & 1 deletion
@@ -1,6 +1,8 @@
 = Getting started
+:description: Start with HDFS using the Stackable Operator. Install the Operator, set up your HDFS cluster, and verify its operation with this guide.
 
-This guide will get you started with HDFS using the Stackable Operator. It will guide you through the installation of the Operator and its dependencies, setting up your first HDFS cluster and verifying its operation.
+This guide will get you started with HDFS using the Stackable Operator.
+It will guide you through the installation of the Operator and its dependencies, setting up your first HDFS cluster and verifying its operation.
 
 == Prerequisites

docs/modules/hdfs/pages/getting_started/installation.adoc

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 = Installation
+:description: Install the Stackable HDFS operator and dependencies using stackablectl or Helm. Follow steps for setup and verification in Kubernetes.
 
 On this page you will install the Stackable HDFS operator and its dependency, the Zookeeper operator, as well as the
 commons, secret and listener operators which are required by all Stackable operators.

docs/modules/hdfs/pages/index.adoc

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 = Stackable Operator for Apache HDFS
-:description: The Stackable Operator for Apache HDFS is a Kubernetes operator that can manage Apache HDFS clusters. Learn about its features, resources, dependencies and demos, and see the list of supported HDFS versions.
+:description: Manage Apache HDFS with the Stackable Operator for Kubernetes. Set up clusters, configure roles, and explore demos and supported versions.
 :keywords: Stackable Operator, Hadoop, Apache HDFS, Kubernetes, k8s, operator, big data, metadata, storage, cluster, distributed storage
 :hdfs-docs: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html
 :github: https://github.com/stackabletech/hdfs-operator/

docs/modules/hdfs/pages/usage-guide/configuration-environment-overrides.adoc

Lines changed: 20 additions & 14 deletions
@@ -1,21 +1,22 @@
-
 = Configuration & Environment Overrides
+:description: Override HDFS config properties and environment variables per role or role group. Manage settings like DNS cache and environment variables efficiently.
+:java-security-overview: https://docs.oracle.com/en/java/javase/11/security/java-security-overview1.html
 
 The cluster definition also supports overriding configuration properties and environment variables, either per role or per role group, where the more specific override (role group) has precedence over the less specific one (role).
 
-IMPORTANT: Overriding certain properties can lead to faulty clusters. In general this means, do not change ports, hostnames or properties related to data dirs, high-availability or security.
+IMPORTANT: Overriding certain properties can lead to faulty clusters.
+In general this means, do not change ports, hostnames or properties related to data dirs, high-availability or security.
 
 == Configuration Properties
 
 For a role or role group, at the same level of `config`, you can specify `configOverrides` for the following files:
 
-- `hdfs-site.xml`
-- `core-site.xml`
-- `hadoop-policy.xml`
-- `ssl-server.xml`
-- `ssl-client.xml`
-- `security.properties`
-
+* `hdfs-site.xml`
+* `core-site.xml`
+* `hadoop-policy.xml`
+* `ssl-server.xml`
+* `ssl-client.xml`
+* `security.properties`
 
 For example, if you want to set additional properties on the namenode servers, adapt the `nameNodes` section of the cluster resource like so:
 
@@ -51,13 +52,17 @@ nameNodes:
 
 All override property values must be strings. The properties will be formatted and escaped correctly into the XML file.
 
-For a full list of configuration options we refer to the Apache Hdfs documentation for https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml[hdfs-site.xml] and https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml[core-site.xml]
+For a full list of configuration options we refer to the Apache Hdfs documentation for https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml[hdfs-site.xml] and https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/core-default.xml[core-site.xml].
 
 === The security.properties file
 
-The `security.properties` file is used to configure JVM security properties. It is very seldom that users need to tweak any of these, but there is one use-case that stands out, and that users need to be aware of: the JVM DNS cache.
+The `security.properties` file is used to configure JVM security properties.
+It is very seldom that users need to tweak any of these, but there is one use-case that stands out, and that users need to be aware of: the JVM DNS cache.
 
-The JVM manages it's own cache of successfully resolved host names as well as a cache of host names that cannot be resolved. Some products of the Stackable platform are very sensible to the contents of these caches and their performance is heavily affected by them. As of version 3.3.4 HDFS performs poorly if the positive cache is disabled. To cache resolved host names, and thus speeding up Hbase queries you can configure the TTL of entries in the positive cache like this:
+The JVM manages it's own cache of successfully resolved host names as well as a cache of host names that cannot be resolved.
+Some products of the Stackable platform are very sensible to the contents of these caches and their performance is heavily affected by them.
+As of version 3.3.4 HDFS performs poorly if the positive cache is disabled.
+To cache resolved host names, and thus speeding up Hbase queries you can configure the TTL of entries in the positive cache like this:
 
 [source,yaml]
 ----
@@ -80,12 +85,13 @@ The JVM manages it's own cache of successfully resolved host names as well as a
 
 NOTE: The operator configures DNS caching by default as shown in the example above.
 
-For details on the JVM security see https://docs.oracle.com/en/java/javase/11/security/java-security-overview1.html
+For details on the JVM security consult the {java-security-overview}[Java Security overview documentation].
 
 
 == Environment Variables
 
-In a similar fashion, environment variables can be (over)written. For example per role group:
+In a similar fashion, environment variables can be (over)written.
+For example per role group:
 
 [source,yaml]
 ----
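
For illustration only (not part of this commit; the concrete example referenced above sits on unchanged lines and is not shown in the diff): an override block on a `nameNodes` role group could look like the following, with hypothetical property names and values.

[source,yaml]
----
nameNodes:
  roleGroups:
    default:
      replicas: 2
      configOverrides:
        core-site.xml:
          fs.trash.interval: "5"            # all override values must be strings
        security.properties:
          networkaddress.cache.ttl: "30"    # positive JVM DNS cache TTL in seconds
      envOverrides:
        MY_ENV_VAR: "MY_VALUE"
----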

docs/modules/hdfs/pages/usage-guide/fuse.adoc

Lines changed: 3 additions & 2 deletions
@@ -1,14 +1,15 @@
 = FUSE
+:description: Use HDFS FUSE driver to mount HDFS filesystems into Linux environments via a Kubernetes Pod with necessary privileges and configurations.
 
 Our images of Apache Hadoop do contain the necessary binaries and libraries to use the HDFS FUSE driver.
 
 FUSE is short for _Filesystem in Userspace_ and allows a user to export a filesystem into the Linux kernel, which can then be mounted.
 HDFS contains a native FUSE driver/application, which means that an existing HDFS filesystem can be mounted into a Linux environment.
 
 To use the FUSE driver you can either copy the required files out of the image and run it on a host outside of Kubernetes or you can run it in a Pod.
-This pod, however, will need some extra capabilities.
+This Pod, however, will need some extra capabilities.
 
-This is an example pod that will work _as long as the host system that is running the kubelet does support FUSE_:
+This is an example Pod that will work _as long as the host system that is running the kubelet does support FUSE_:
 
 [source,yaml]
 ----
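
The example Pod itself is included from an example file and is not shown in this diff. A rough sketch of the idea, assuming the operator's discovery ConfigMap is named `simple-hdfs` and using a placeholder image tag, could be (a privileged container is the bluntest way to obtain the required FUSE capabilities):

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: hdfs-fuse
spec:
  containers:
    - name: hdfs-fuse
      image: docker.stackable.tech/stackable/hadoop:3.4.0-stackable0.0.0  # placeholder tag
      securityContext:
        privileged: true               # grants access to /dev/fuse on the node
      env:
        - name: HADOOP_CONF_DIR
          value: /stackable/conf       # assumed mount point for the discovery ConfigMap
      volumeMounts:
        - name: hdfs-config
          mountPath: /stackable/conf
      command: ["sleep", "infinity"]   # exec into the Pod and start the FUSE mount manually
  volumes:
    - name: hdfs-config
      configMap:
        name: simple-hdfs              # discovery ConfigMap created by the operator
----
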
Lines changed: 4 additions & 1 deletion
@@ -1,4 +1,7 @@
 = Usage guide
+:description: Learn to configure and use the Stackable Operator for Apache HDFS. Ensure basic setup knowledge from the Getting Started guide before proceeding.
 :page-aliases: ROOT:usage.adoc
 
-This Section will help you to use and configure the Stackable Operator for Apache HDFS in various ways. You should already be familiar with how to set up a basic instance. Follow the xref:getting_started/index.adoc[] guide to learn how to set up a basic instance with all the required dependencies (for example ZooKeeper).
+This Section will help you to use and configure the Stackable Operator for Apache HDFS in various ways.
+You should already be familiar with how to set up a basic instance.
+Follow the xref:getting_started/index.adoc[] guide to learn how to set up a basic instance with all the required dependencies (for example ZooKeeper).

docs/modules/hdfs/pages/usage-guide/listenerclass.adoc

Lines changed: 3 additions & 1 deletion
@@ -1,6 +1,8 @@
 = Service exposition with ListenerClasses
+:description: Configure HDFS service exposure using ListenerClasses to control internal and external access for DataNodes and NameNodes.
 
-The operator deploys a xref:listener-operator:listener.adoc[Listener] for each DataNode and NameNode pod. They both default to only being accessible from within the Kubernetes cluster, but this can be changed by setting `.spec.{data,name}Nodes.config.listenerClass`.
+The operator deploys a xref:listener-operator:listener.adoc[Listener] for each DataNode and NameNode pod.
+They both default to only being accessible from within the Kubernetes cluster, but this can be changed by setting `.spec.{data,name}Nodes.config.listenerClass`.
 
 Note that JournalNodes are not accessible from outside the Kubernetes cluster.
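
As a sketch only (not taken from this commit), exposing NameNodes via a stable external address and DataNodes via node ports could look like this, assuming the commonly used listener classes `external-stable` and `external-unstable`:

[source,yaml]
----
spec:
  nameNodes:
    config:
      listenerClass: external-stable    # e.g. a LoadBalancer address
  dataNodes:
    config:
      listenerClass: external-unstable  # e.g. NodePorts
----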

docs/modules/hdfs/pages/usage-guide/logging-log-aggregation.adoc

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 = Logging & log aggregation
+:description: The logs can be forwarded to a Vector log aggregator by providing a discovery ConfigMap for the aggregator and by enabling the log agent.
 
-The logs can be forwarded to a Vector log aggregator by providing a discovery
-ConfigMap for the aggregator and by enabling the log agent:
+The logs can be forwarded to a Vector log aggregator by providing a discovery ConfigMap for the aggregator and by enabling the log agent:
 
 [source,yaml]
 ----
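
The actual YAML is pulled in from an example file and is not visible in this diff. A minimal sketch of the usual pattern, with an assumed ConfigMap name, might be:

[source,yaml]
----
spec:
  clusterConfig:
    vectorAggregatorConfigMapName: vector-aggregator-discovery  # discovery ConfigMap for the aggregator
  nameNodes:
    config:
      logging:
        enableVectorAgent: true  # ship this role's logs to the aggregator
----
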
Lines changed: 4 additions & 2 deletions
@@ -1,9 +1,11 @@
 = Monitoring
+:description: The HDFS cluster can be monitored with Prometheus from inside or outside the K8S cluster.
 
 The cluster can be monitored with Prometheus from inside or outside the K8S cluster.
 
-All services (with the exception of the Zookeeper daemon on the node names) run with the JMX exporter agent enabled and expose metrics on the `metrics` port. This port is available from the container level up to the NodePort services.
+All services (with the exception of the Zookeeper daemon on the node names) run with the JMX exporter agent enabled and expose metrics on the `metrics` port.
+This port is available from the container level up to the NodePort services.
 
-The metrics endpoints are also used as liveliness probes by K8S.
+The metrics endpoints are also used as liveliness probes by Kubernetes.
 
 See xref:operators:monitoring.adoc[] for more details.

docs/modules/hdfs/pages/usage-guide/resources.adoc

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 = Resources
+:description: Configure HDFS storage with PersistentVolumeClaims for custom data volumes and multiple disk types. Set resource requests for HA setups in Kubernetes.
 
 == Storage for data volumes
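
As a hypothetical illustration of the kind of settings this page covers (values are placeholders, not from this commit), storage and resource requests for a role group could be declared like this:

[source,yaml]
----
dataNodes:
  roleGroups:
    default:
      config:
        resources:
          storage:
            data:
              capacity: 10Gi   # size of the PersistentVolumeClaim backing the data volume
          cpu:
            min: 500m
            max: "2"
          memory:
            limit: 2Gi
----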

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
 = Scaling
+:description: When scaling namenodes up, make sure to increase the replica count only by one and not more nodes at once.
 
 When scaling namenodes up, make sure to increase the replica count only by one and not more nodes at once.
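
In practice this means bumping the replica count of the `nameNodes` role group in single steps, for example (hypothetical values):

[source,yaml]
----
nameNodes:
  roleGroups:
    default:
      replicas: 3  # previously 2; only move to 4 after this change has fully rolled out
----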

docs/modules/hdfs/pages/usage-guide/security.adoc

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 = Security
+:description: Secure HDFS with Kerberos authentication and OPA authorization. Use tlsSecretClass for TLS and configure fine-grained access with Rego rules.
 
 == Authentication
 Currently the only supported authentication mechanism is Kerberos, which is disabled by default.
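
As a rough sketch only (the SecretClass names are assumptions, not taken from this commit), enabling Kerberos together with TLS might look like this:

[source,yaml]
----
spec:
  clusterConfig:
    authentication:
      tlsSecretClass: tls               # SecretClass used to provision TLS certificates
      kerberos:
        secretClass: kerberos-default   # SecretClass providing the Kerberos keytabs
----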

docs/modules/hdfs/pages/usage-guide/upgrading.adoc

Lines changed: 8 additions & 4 deletions
@@ -1,12 +1,14 @@
 = Upgrading HDFS
+:description: Upgrade HDFS with the Stackable Operator: Prepare, initiate, and finalize upgrades. Rollback and downgrade supported.
 
-IMPORTANT: HDFS upgrades are experimental, and details may change at any time
+IMPORTANT: HDFS upgrades are experimental, and details may change at any time.
 
 HDFS currently requires a manual process to upgrade. This guide will take you through an example case, upgrading an example cluster (from our xref:getting_started/index.adoc[Getting Started] guide) from HDFS 3.3.6 to 3.4.0.
 
 == Preparing for the worst
 
-Upgrades can fail, and it is important to prepare for when that happens. Apache HDFS supports https://hadoop.apache.org/docs/r3.4.0/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html#Downgrade_and_Rollback[two ways to revert an upgrade]:
+Upgrades can fail, and it is important to prepare for when that happens.
+Apache HDFS supports https://hadoop.apache.org/docs/r3.4.0/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html#Downgrade_and_Rollback[two ways to revert an upgrade]:
 
 Rollback:: Reverts all user data to the pre-upgrade state. Requires taking the cluster offline.
 Downgrade:: Downgrades the HDFS software but preserves all changes made by users. Can be performed as a rolling change, keeping the cluster online.
@@ -23,7 +25,8 @@ hdfscluster.hdfs.stackable.tech/simple-hdfs patched
 
 == Preparing HDFS
 
-HDFS must be configured to initiate the upgrade process. To do this, put the cluster into upgrade mode by running the following commands in an HDFS superuser environment
+HDFS must be configured to initiate the upgrade process.
+To do this, put the cluster into upgrade mode by running the following commands in an HDFS superuser environment
 (either a client configured with a superuser account, or from inside a NameNode pod):
 
 // This could be automated by the operator, but dfsadmin does not have good machine-readable output.
@@ -92,7 +95,8 @@ Rolling upgrade is finalized.
 
 // We can't safely automate this, because finalize is asynchronous and doesn't tell us whether all NameNodes have even received the request to finalize.
 
-WARNING: Please ensure that all NameNodes are running and available before proceeding. NameNodes that have not finalized yet will crash on launch when taken out of compatibility mode.
+WARNING: Please ensure that all NameNodes are running and available before proceeding.
+NameNodes that have not finalized yet will crash on launch when taken out of compatibility mode.
 
 Finally, mark the cluster as upgraded:
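
For orientation: the patch acknowledged above ("hdfscluster.hdfs.stackable.tech/simple-hdfs patched") amounts to changing the product version in the spec, roughly this fragment (versions taken from the surrounding text):

[source,yaml]
----
spec:
  image:
    productVersion: 3.4.0  # upgraded from 3.3.6
----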
