:description: The Stackable Operator for Apache HDFS is a Kubernetes operator that can manage Apache HDFS clusters. Learn about its features, resources, dependencies and demos, and see the list of supported HDFS versions.

The Stackable operator for {hdfs-docs}[Apache HDFS] (Hadoop Distributed File System) is used to set up HDFS in high-availability mode.
HDFS is a distributed file system designed to store and manage massive amounts of data across multiple machines in a fault-tolerant manner.
The operator depends on the xref:zookeeper:index.adoc[] to operate a ZooKeeper cluster to coordinate the active and standby NameNodes.
== Getting started
Follow the xref:getting_started/index.adoc[Getting started guide] which will guide you through installing the Stackable
HDFS and ZooKeeper operators, setting up ZooKeeper and HDFS, and writing a file to HDFS to verify that everything is set
up correctly.
Afterwards you can consult the xref:usage-guide/index.adoc[] to learn more about tailoring your HDFS configuration to
your needs, or have a look at the <<demos, demos>> for some example setups.
== Operator model
The operator manages the _HdfsCluster_ custom resource.
The cluster implements three xref:concepts:roles-and-role-groups.adoc[roles]:
* DataNode - responsible for storing the actual data.
* JournalNode - maintains a shared edit log of file system changes so that the standby NameNode stays in sync and can take over in case the active NameNode
fails. For details see: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
* NameNode - responsible for keeping track of HDFS blocks and providing access to the data.
image::hdfs_overview.drawio.svg[A diagram depicting the Kubernetes resources created by the Stackable operator for Apache HDFS]
The operator creates the following K8S objects per role group defined in the custom resource.
In addition, a `NodePort` service is created for each pod labeled with `hdfs.stackable.tech/pod-service=true` that
exposes all container ports to the outside world (from the perspective of K8S).
In the custom resource you can specify the number of replicas per role group (NameNode, DataNode or JournalNode).
A minimal working configuration requires:
* 2 NameNodes (HA)
* 1 JournalNode
* 1 DataNode (should match at least the `clusterConfig.dfsReplication` factor)
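
Such a minimal setup can be sketched as an HdfsCluster resource like the following. This is a hedged example: the names (`simple-hdfs`, `simple-zk-znode`) and the product version are illustrative, and the exact fields should be checked against the HdfsCluster CRD reference.

[source,yaml]
----
apiVersion: hdfs.stackable.tech/v1alpha1
kind: HdfsCluster
metadata:
  name: simple-hdfs  # illustrative name
spec:
  image:
    productVersion: 3.3.6  # pick a version from the supported list below
  clusterConfig:
    dfsReplication: 1
    zookeeperConfigMapName: simple-zk-znode  # discovery ConfigMap of the ZooKeeper cluster or ZNode
  nameNodes:
    roleGroups:
      default:
        replicas: 2  # two NameNodes for HA
  journalNodes:
    roleGroups:
      default:
        replicas: 1
  dataNodes:
    roleGroups:
      default:
        replicas: 1  # should match at least the dfsReplication factor
----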
The operator creates a xref:concepts:service_discovery.adoc[service discovery ConfigMap] for the HDFS instance.
The discovery ConfigMap contains the `core-site.xml` file and the `hdfs-site.xml` file.
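
A client workload can consume the discovery ConfigMap by mounting it and pointing `HADOOP_CONF_DIR` at the mount path. The sketch below assumes a hypothetical client pod; the ConfigMap name matches the HdfsCluster name (`simple-hdfs` here is illustrative), and the client image is any image that ships the HDFS client tools.

[source,yaml]
----
apiVersion: v1
kind: Pod
metadata:
  name: hdfs-client  # hypothetical client pod
spec:
  containers:
    - name: client
      image: apache/hadoop:3  # any image containing the HDFS client tools
      command: ["sleep", "infinity"]
      env:
        - name: HADOOP_CONF_DIR
          value: /stackable/hdfs-config  # where core-site.xml and hdfs-site.xml are mounted
      volumeMounts:
        - name: hdfs-config
          mountPath: /stackable/hdfs-config
  volumes:
    - name: hdfs-config
      configMap:
        name: simple-hdfs  # the discovery ConfigMap is named after the HdfsCluster
----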
== Dependencies
HDFS depends on Apache ZooKeeper for coordination between nodes.
You can run a ZooKeeper cluster with the xref:zookeeper:index.adoc[].
Additionally, the xref:commons-operator:index.adoc[], xref:secret-operator:index.adoc[] and xref:listener-operator:index.adoc[] are required.
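
As a sketch, a small ZooKeeper cluster plus a ZNode for HDFS to use could look like the following; the names and the product version are illustrative, and the authoritative CRD fields are documented by the ZooKeeper operator.

[source,yaml]
----
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: simple-zk  # illustrative name
spec:
  image:
    productVersion: 3.8.4  # illustrative version
  servers:
    roleGroups:
      default:
        replicas: 3
---
# A ZNode gives HDFS its own chroot inside the ZooKeeper tree
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperZnode
metadata:
  name: simple-zk-znode  # its discovery ConfigMap is what zookeeperConfigMapName references
spec:
  clusterRef:
    name: simple-zk
----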
== [[demos]]Demos
Two demos that use HDFS are available.
**xref:demos:hbase-hdfs-load-cycling-data.adoc[]** loads a dataset of cycling data from S3 into HDFS and then uses HBase to analyze the data.
**xref:demos:jupyterhub-pyspark-hdfs-anomaly-detection-taxi-data.adoc[]** showcases the integration between HDFS and Jupyter.
New York Taxi data is stored in HDFS and analyzed in a Jupyter notebook.
== Supported versions
The Stackable operator for Apache HDFS currently supports the HDFS versions listed below.
To use a specific HDFS version in your HdfsCluster, you have to specify an image - this is explained in the xref:concepts:product-image-selection.adoc[] documentation.
The operator also supports running images from a custom registry or running entirely customized images; both of these cases are explained under xref:concepts:product-image-selection.adoc[] as well.
include::partial$supported-versions.adoc[]
== Useful links
* The {github}[hdfs-operator {external-link-icon}^] GitHub repository
* The operator feature overview in the {feature-tracker}[feature tracker {external-link-icon}^]
* The {crd-hdfscluster}[HdfsCluster {external-link-icon}^] CRD documentation