docs: Document why we need storage connections #575

Merged
merged 2 commits into from
Feb 25, 2025
19 changes: 13 additions & 6 deletions docs/modules/hive/pages/usage-guide/data-storage.adoc
@@ -1,11 +1,18 @@
= Data storage backends
:description: Hive supports metadata storage on S3 and HDFS. Configure S3 with S3Connection and HDFS with configMap in clusterConfig.

Hive does not store data, only metadata. It can store metadata about data stored in various places. The Stackable Operator currently supports S3 and HDFS.
You can operate the Hive metastore service (HMS) without S3 or HDFS.
Its whole purpose is to store metadata such as "Table foo has columns a, b and c and is stored as parquet in local://tmp/hive/foo".

== [[s3]]S3 support
However, as soon as you start storing metadata in the HMS that refers to `s3a://` or `hdfs://` locations, HMS will actually perform operations on that filesystem, such as checking whether a table location exists and creating it if it is missing.

Hive supports creating tables in S3 compatible object stores.
So if you are storing tables in S3 (or HDFS for that matter), you need to give the HMS access to that filesystem as well.
The Stackable Operator currently supports S3 and HDFS.

[s3]
== S3 support

HMS supports creating tables in S3 compatible object stores.
To use this feature you need to provide connection details for the object store using the xref:concepts:s3.adoc[S3Connection] in the top level `clusterConfig`.

An example usage can look like this:
@@ -22,10 +29,10 @@ clusterConfig:
secretClass: simple-hive-s3-secret-class
----
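The diff only shows the tail of this example. For context, a complete `S3Connection` sketch in the top level `clusterConfig` might look like the following; the `host`, `port`, and `accessStyle` values are illustrative assumptions, not taken from this PR:

[source,yaml]
----
clusterConfig:
  s3:
    inline:
      host: test-minio  # illustrative: host of an S3-compatible object store
      port: 9000        # illustrative
      accessStyle: Path
      credentials:
        secretClass: simple-hive-s3-secret-class
----

Alternatively, the xref:concepts:s3.adoc[S3Connection] concept also allows referencing a standalone S3Connection resource instead of defining it inline.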

[hdfs]
== Apache HDFS support

== [[hdfs]]Apache HDFS support

As well as S3, Hive also supports creating tables in HDFS.
As well as S3, HMS also supports creating tables in HDFS.
You can add the HDFS connection in the top level `clusterConfig` as follows:

[source,yaml]
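The HDFS example itself is collapsed in this diff view. Following the `clusterConfig` pattern used for S3 above, a minimal sketch might look like this; the ConfigMap name `hdfs` is an illustrative assumption referring to the discovery ConfigMap of an HDFS cluster:

[source,yaml]
----
clusterConfig:
  hdfs:
    configMap: hdfs  # assumption: discovery ConfigMap of the HdfsCluster
----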