From 56911c0fbc2756f487f6b66edeee4dee19e4b2a8 Mon Sep 17 00:00:00 2001
From: Sebastian Bernauer
Date: Tue, 25 Feb 2025 09:44:49 +0100
Subject: [PATCH 1/2] docs: Document why we need storage connections

---
 .../hive/pages/usage-guide/data-storage.adoc  | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/docs/modules/hive/pages/usage-guide/data-storage.adoc b/docs/modules/hive/pages/usage-guide/data-storage.adoc
index eb8a2b92..ef6e160f 100644
--- a/docs/modules/hive/pages/usage-guide/data-storage.adoc
+++ b/docs/modules/hive/pages/usage-guide/data-storage.adoc
@@ -1,11 +1,18 @@
 = Data storage backends
 :description: Hive supports metadata storage on S3 and HDFS. Configure S3 with S3Connection and HDFS with configMap in clusterConfig.
 
-Hive does not store data, only metadata. It can store metadata about data stored in various places. The Stackable Operator currently supports S3 and HFS.
+You can operate the Hive metastore service (HMS) without S3 or HDFS.
+It whole purpose is to store metadata such as "Table foo has columns a, b and c and is stored as parquet in local://tmp/hive/foo".
 
-== [[s3]]S3 support
+However, as soon as you start storing metadata in the HMS that refers to a `s3a://` or `hdfs://` locations, HMS will actually do some operations on the filesystem. This can be e.g. checking if the table location exists, creating it in case it is missing.
 
-Hive supports creating tables in S3 compatible object stores.
+So if you are storing tables in S3 (or HDFS for that matter), you need to give HMS to that filesystem as well.
+The Stackable Operator currently supports S3 and HFS.
+
+[s3]
+== S3 support
+
+HMS supports creating tables in S3 compatible object stores.
 To use this feature you need to provide connection details for the object store using the xref:concepts:s3.adoc[S3Connection] in the top level `clusterConfig`.
 
 An example usage can look like this:
@@ -22,10 +29,10 @@ clusterConfig:
         secretClass: simple-hive-s3-secret-class
 ----
 
+[hdfs]
+== Apache HDFS support
-
-== [[hdfs]]Apache HDFS support
-
-As well as S3, Hive also supports creating tables in HDFS.
+As well as S3, HMS also supports creating tables in HDFS.
 You can add the HDFS connection in the top level `clusterConfig` as follows:
 
 [source,yaml]

From 2fdbee6bff6b18db302964913f1f52cdf588a5c5 Mon Sep 17 00:00:00 2001
From: Sebastian Bernauer
Date: Tue, 25 Feb 2025 10:16:27 +0100
Subject: [PATCH 2/2] Apply suggestions from code review

Co-authored-by: Malte Sander
---
 docs/modules/hive/pages/usage-guide/data-storage.adoc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/modules/hive/pages/usage-guide/data-storage.adoc b/docs/modules/hive/pages/usage-guide/data-storage.adoc
index ef6e160f..9b8be69b 100644
--- a/docs/modules/hive/pages/usage-guide/data-storage.adoc
+++ b/docs/modules/hive/pages/usage-guide/data-storage.adoc
@@ -2,11 +2,11 @@
 :description: Hive supports metadata storage on S3 and HDFS. Configure S3 with S3Connection and HDFS with configMap in clusterConfig.
 
 You can operate the Hive metastore service (HMS) without S3 or HDFS.
-It whole purpose is to store metadata such as "Table foo has columns a, b and c and is stored as parquet in local://tmp/hive/foo".
+Its whole purpose is to store metadata such as "Table foo has columns a, b and c and is stored as parquet in local://tmp/hive/foo".
 
 However, as soon as you start storing metadata in the HMS that refers to a `s3a://` or `hdfs://` locations, HMS will actually do some operations on the filesystem. This can be e.g. checking if the table location exists, creating it in case it is missing.
 
-So if you are storing tables in S3 (or HDFS for that matter), you need to give HMS to that filesystem as well.
+So if you are storing tables in S3 (or HDFS for that matter), you need to give the HMS access to that filesystem as well.
 The Stackable Operator currently supports S3 and HFS.
 
 [s3]
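
For context only (not part of the patch above): the S3 example in the first hunk is visible only as fragments (`clusterConfig:` and the `secretClass` line), and the HDFS example is cut off at the `[source,yaml]` marker. A minimal sketch of what the resulting `clusterConfig` could look like is shown below, assuming the inline S3Connection layout and the HDFS discovery ConfigMap referenced in the `:description:` line; the host, port and resource names are placeholders, not values taken from the patch.

[source,yaml]
----
clusterConfig:
  s3:
    inline:                     # an existing S3Connection object could be referenced instead of defining it inline
      host: test-minio          # placeholder: hostname of the S3-compatible endpoint
      port: 9000                # placeholder port
      accessStyle: Path
      credentials:
        secretClass: simple-hive-s3-secret-class  # SecretClass providing the S3 credentials (name taken from the patch)
  hdfs:
    configMap: my-hdfs          # placeholder: discovery ConfigMap published by the HDFS cluster
----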