Commit 6201239

docs: add GitHub and feature tracker links (#511)
* typo
* fix CRD links

1 parent d97652e commit 6201239

File tree

1 file changed: docs/modules/hdfs/pages/index.adoc (+36 -23 lines)
= Stackable Operator for Apache HDFS
:description: The Stackable Operator for Apache HDFS is a Kubernetes operator that can manage Apache HDFS clusters. Learn about its features, resources, dependencies and demos, and see the list of supported HDFS versions.
:keywords: Stackable Operator, Hadoop, Apache HDFS, Kubernetes, k8s, operator, big data, metadata, storage, cluster, distributed storage
:hdfs-docs: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html
:github: https://github.com/stackabletech/hdfs-operator/
:crd: {crd-docs-base-url}/hdfs-operator/{crd-docs-version}/
:crd-hdfscluster: {crd}/hdfs.stackable.tech/hdfscluster/v1alpha1/
:feature-tracker: https://features.stackable.tech/unified

[.link-bar]
* {github}[GitHub {external-link-icon}^]
* {feature-tracker}[Feature Tracker {external-link-icon}^]
* {crd}[CRD documentation {external-link-icon}^]

The Stackable operator for {hdfs-docs}[Apache HDFS] (Hadoop Distributed File System) is used to set up HDFS in high-availability mode.
HDFS is a distributed file system designed to store and manage massive amounts of data across multiple machines in a fault-tolerant manner.
The operator depends on the xref:zookeeper:index.adoc[] to operate a ZooKeeper cluster that coordinates the active and standby NameNodes.

== Getting started

Follow the xref:getting_started/index.adoc[Getting started guide], which will guide you through installing the Stackable HDFS and ZooKeeper operators, setting up ZooKeeper and HDFS, and writing a file to HDFS to verify that everything is set up correctly.

Afterwards you can consult the xref:usage-guide/index.adoc[] to learn more about tailoring your HDFS configuration to your needs, or have a look at the <<demos, demos>> for some example setups.

== Operator model

The operator manages the _HdfsCluster_ custom resource.
The cluster implements three xref:concepts:roles-and-role-groups.adoc[roles]:

* DataNode - responsible for storing the actual data.
* JournalNode - stores the shared edit log that keeps the NameNodes in sync and is used to perform failovers in case the active NameNode fails. For details see: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
* NameNode - responsible for keeping track of HDFS blocks and providing access to the data.

image::hdfs_overview.drawio.svg[A diagram depicting the Kubernetes resources created by the Stackable operator for Apache HDFS]

The operator creates the following K8S objects per role group defined in the custom resource.

In addition, a `NodePort` service is created for each pod labeled with `hdfs.stackable.tech/pod-service=true` that exposes all container ports to the outside world (from the perspective of K8S).

In the custom resource you can specify the number of replicas per role group (NameNode, DataNode or JournalNode).
A minimal working configuration requires:

* 2 NameNodes (HA)
* 1 JournalNode
* 1 DataNode (should match at least the `clusterConfig.dfsReplication` factor)
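
An HdfsCluster matching these minimal requirements could look like the sketch below; the resource names, product version and the referenced ZooKeeper znode discovery ConfigMap are illustrative placeholders, not values defined on this page.

[source,yaml]
----
apiVersion: hdfs.stackable.tech/v1alpha1
kind: HdfsCluster
metadata:
  name: simple-hdfs  # placeholder name
spec:
  image:
    productVersion: 3.3.6  # pick one of the supported versions
  clusterConfig:
    zookeeperConfigMapName: simple-hdfs-znode  # discovery ConfigMap of a ZookeeperZnode (placeholder)
    dfsReplication: 1  # matched by the single DataNode below
  nameNodes:
    roleGroups:
      default:
        replicas: 2  # HA pair
  journalNodes:
    roleGroups:
      default:
        replicas: 1
  dataNodes:
    roleGroups:
      default:
        replicas: 1
----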

The operator creates a xref:concepts:service_discovery.adoc[service discovery ConfigMap] for the HDFS instance.
The discovery ConfigMap contains the `core-site.xml` file and the `hdfs-site.xml` file.
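
As a sketch of how a client could consume the discovery ConfigMap (assuming an HdfsCluster named `simple-hdfs`; the Pod, image and mount path are hypothetical), the ConfigMap can be mounted into a Pod and pointed at via `HADOOP_CONF_DIR`:

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: hdfs-client  # hypothetical client Pod
spec:
  containers:
    - name: client
      image: apache/hadoop:3  # any image with Hadoop client tools
      env:
        - name: HADOOP_CONF_DIR  # Hadoop tools read core-site.xml/hdfs-site.xml from here
          value: /hdfs-config
      volumeMounts:
        - name: hdfs-config
          mountPath: /hdfs-config
  volumes:
    - name: hdfs-config
      configMap:
        name: simple-hdfs  # the discovery ConfigMap created by the operator
----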

== Dependencies

HDFS depends on Apache ZooKeeper for coordination between nodes.
You can run a ZooKeeper cluster with the xref:zookeeper:index.adoc[].
Additionally, the xref:commons-operator:index.adoc[], xref:secret-operator:index.adoc[] and xref:listener-operator:index.adoc[] are required.
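
For illustration, the ZooKeeper dependency could be satisfied with resources like the sketch below (names and version are placeholders); the ZookeeperZnode produces the discovery ConfigMap that an HdfsCluster references via `clusterConfig.zookeeperConfigMapName`:

[source,yaml]
----
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: simple-zk  # placeholder name
spec:
  image:
    productVersion: 3.8.4  # placeholder version
  servers:
    roleGroups:
      default:
        replicas: 3
---
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperZnode
metadata:
  name: simple-hdfs-znode  # its discovery ConfigMap gets this name
spec:
  clusterRef:
    name: simple-zk
----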

== [[demos]]Demos

Two demos that use HDFS are available.

**xref:demos:hbase-hdfs-load-cycling-data.adoc[]** loads a dataset of cycling data from S3 into HDFS and then uses HBase to analyze the data.

**xref:demos:jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data.adoc[]** showcases the integration between HDFS and Jupyter.
New York Taxi data is stored in HDFS and analyzed in a Jupyter notebook.

== Supported versions

The Stackable operator for Apache HDFS currently supports the HDFS versions listed below.
To use a specific HDFS version in your HdfsCluster, you have to specify an image - this is explained in the xref:concepts:product-image-selection.adoc[] documentation.
The operator also supports running images from a custom registry or running entirely customized images; both of these cases are explained under xref:concepts:product-image-selection.adoc[] as well.

include::partial$supported-versions.adoc[]
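
As a sketch of the image selection described above (the version and registry values are placeholders; see xref:concepts:product-image-selection.adoc[] for the authoritative options):

[source,yaml]
----
spec:
  image:
    # Option 1: an official image for a supported version
    productVersion: 3.3.6
    # Option 2: an entirely customized image from your own registry instead
    # custom: my-registry.example.com/hadoop:3.3.6-patched
    # productVersion: 3.3.6
----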

== Useful links

* The {github}[hdfs-operator {external-link-icon}^] GitHub repository
* The operator feature overview in the {feature-tracker}[feature tracker {external-link-icon}^]
* The {crd-hdfscluster}[HdfsCluster {external-link-icon}^] CRD documentation
