-
Notifications
You must be signed in to change notification settings - Fork 8
Description
Scenario
I have two different tables in two different databases in two different buckets:
- I have two databases:
clickhouse-regression-database-1
andclickhouse-regression-database-2
- I created table_a under
clickhouse-regression-database-1
, stored in
s3://glue-bucket-db-a/table_one
- I created table_b under
clickhouse-regression-database-2
, stored in
s3://glue-bucket-db-b/second_table
Let's consider different permission configurations of IAM user (this will be Policy_1):
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "GlueAccessToSpecificTable",
"Effect": "Allow",
"Action": [
"glue:GetTable",
"glue:GetTables",
"glue:GetDatabase",
"glue:GetDatabases",
"glue:GetTableVersion",
"glue:GetTableVersions"
],
"Resource": "*"
},
{
"Sid": "S3AccessToIcebergTableData",
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetObject",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:s3:::glue-bucket-db-a",
"arn:aws:s3:::glue-bucket-db-a/table_one/*"
]
}
]
}
What this policy means:
- The IAM user has full read access to Glue metadata for all databases and tables.
- The user has S3 access only to objects inside s3://glue-bucket-db-a/table_one/.
- The user does not have S3 access to glue-bucket-db-b (used by table_b).
From clickhouse (25.6.6.29):
SET allow_experimental_database_glue_catalog = 1;
CREATE DATABASE glue
ENGINE = DataLakeCatalog
SETTINGS catalog_type = 'glue', region = 'eu-central-1', aws_access_key_id = '***', aws_secret_access_key = 'Z***';
SHOW TABLES FROM glue;
Both tables are visible from the Glue catalog, even though the IAM user doesn't have S3 access to one of them.
┌─name─────────────────────────────────────┐
1. │ clickhouse-regression-database-1.table_a │
2. │ clickhouse-regression-database-2.table_b │
└──────────────────────────────────────────┘
Let's try to select from those tables:
SELECT *
FROM glue.`clickhouse-regression-database-1.table_a`
┌─string─┬─int─┬─binary─┐
1. │ aa │ 1 │ ᴺᵁᴸᴸ │
2. │ ᴺᵁᴸᴸ │ 1 │ ᴺᵁᴸᴸ │
3. │ aaaaa │ 1 │ ᴺᵁᴸᴸ │
└────────┴─────┴────────┘
3 rows in set. Elapsed: 0.463 sec.
SELECT *
FROM glue.`clickhouse-regression-database-2.table_b`
Elapsed: 0.351 sec.
Received exception from server (version 25.6.6):
Code: 499. DB::Exception: Received from localhost:9000. DB::Exception: User: arn:aws:iam::***:user/agiliazova is not authorized to perform: s3:GetObject on resource: "arn:aws:s3:::glue-bucket-db-b/second_table/metadata/00001-2898634d-aa90-44f5-b5bc-7e067f43663f.metadata.json" because no identity-based policy allows the s3:GetObject action: while reading key: second_table/metadata/00001-2898634d-aa90-44f5-b5bc-7e067f43663f.metadata.json, from bucket: glue-bucket-db-b. (S3_ERROR)
SHOW CREATE TABLE
works for both tables:
SHOW CREATE TABLE glue.`clickhouse-regression-database-1.table_a`
┌─statement────────────────────────────────────────────────────┐
1. │ CREATE TABLE glue.`clickhouse-regression-database-1.table_a`↴│
│↳( ↴│
│↳ `string` Nullable(String), ↴│
│↳ `int` Nullable(Int32), ↴│
│↳ `binary` Nullable(String) ↴│
│↳) ↴│
│↳ENGINE = Iceberg('s3://glue-bucket-db-a/table_one') │
└──────────────────────────────────────────────────────────────┘
SHOW CREATE TABLE glue.`clickhouse-regression-database-2.table_b`
┌─statement────────────────────────────────────────────────────┐
1. │ CREATE TABLE glue.`clickhouse-regression-database-2.table_b`↴│
│↳( ↴│
│↳ `int` Nullable(Int32), ↴│
│↳ `string` Nullable(String), ↴│
│↳ `float` Nullable(Float32) ↴│
│↳) ↴│
│↳ENGINE = Iceberg('s3://glue-bucket-db-b/second_table') │
└──────────────────────────────────────────────────────────────┘
Now I update policy, since I want to be able to see only table_a
from clickhouse (this will be Policy_2):
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "GlueAccessToSpecificTable",
"Effect": "Allow",
"Action": [
"glue:GetTable",
"glue:GetTables",
"glue:GetDatabase",
"glue:GetDatabases",
"glue:GetTableVersion",
"glue:GetTableVersions"
],
"Resource": [
"arn:aws:glue:eu-central-1:{user id}:catalog",
"arn:aws:glue:eu-central-1:{user id}:database/clickhouse-regression-database-1",
"arn:aws:glue:eu-central-1:{user id}:table/clickhouse-regression-database-1/table_a"
]
},
{
"Sid": "S3AccessToIcebergTableData",
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:GetObject",
"s3:GetBucketLocation"
],
"Resource": [
"arn:aws:s3:::glue-bucket-db-a",
"arn:aws:s3:::glue-bucket-db-a/table_one/*"
]
}
]
}
In Policy_2, Glue access is restricted to specific resources:
"Resource": [
"arn:aws:glue:eu-central-1:{user id}:catalog",
"arn:aws:glue:eu-central-1:{user id}:database/clickhouse-regression-database-1",
"arn:aws:glue:eu-central-1:{user id}:table/clickhouse-regression-database-1/table_a"
]
IAM user can only:
- Access the Glue catalog
- See only the specific database
clickhouse-regression-database-1
- See only the specific table table_a
Again from clickhouse (25.6.6.29):
SHOW TABLES FROM glue
┌─name─────────────────────────────────────┐
1. │ clickhouse-regression-database-1.table_a │
└──────────────────────────────────────────┘
1 row in set. Elapsed: 0.955 sec.
Now as expected I see only one table.
I can select from it:
SELECT *
FROM glue.`clickhouse-regression-database-1.table_a`
┌─string─┬─int─┬─binary─┐
1. │ aa │ 1 │ ᴺᵁᴸᴸ │
2. │ ᴺᵁᴸᴸ │ 1 │ ᴺᵁᴸᴸ │
3. │ aaaaa │ 1 │ ᴺᵁᴸᴸ │
└────────┴─────┴────────┘
3 rows in set. Elapsed: 0.460 sec.
I can not select from table_b (which is expected):
SELECT *
FROM glue.`clickhouse-regression-database-2.table_b`
Received exception from server (version 25.6.6):
Code: 736. DB::Exception: Received from localhost:9000. DB::Exception: Exception calling GetTable for table clickhouse-regression-database-2.table_b: User: arn:aws:iam::{user id}:user/agiliazova is not authorized to perform: glue:GetTable on resource: arn:aws:glue:eu-central-1:{user id}:database/clickhouse-regression-database-2 because no identity-based policy allows the glue:GetTable action. (DATALAKE_DATABASE_ERROR)
SHOW CREATE TABLE
works only for table_a
.
Now let's go back to Policy_1 and run all the same queries on altinity/clickhouse-server:25.3.3.20186.altinityantalya
:
SET allow_experimental_database_glue_catalog = 1;
CREATE DATABASE glue
ENGINE = DataLakeCatalog
SETTINGS catalog_type = 'glue', region = 'eu-central-1', aws_access_key_id = '***', aws_secret_access_key = '***';
SHOW TABLES FROM glue
Received exception from server (version 25.3.3):
Code: 499. DB::Exception: Received from localhost:9000. DB::Exception: User: arn:aws:iam::{user id}:user/agiliazova is not authorized to perform: s3:GetObject on resource: "arn:aws:s3:::glue-bucket-db-b/second_table/metadata/00001-2898634d-aa90-44f5-b5bc-7e067f43663f.metadata.json" because no identity-based policy allows the s3:GetObject action: while reading key: second_table/metadata/00001-2898634d-aa90-44f5-b5bc-7e067f43663f.metadata.json, from bucket: glue-bucket-db-b. (S3_ERROR)
This shows that in ClickHouse version 25.3.3.altinityantalya, SHOW TABLES
attempts to read Iceberg metadata (*.metadata.json) from S3 even during table listing.
As a result, if the IAM user does not have s3:GetObject access to all buckets, the SHOW TABLES command itself fails, even if the user is not querying the problematic table.
But I am still able to query table_a:
SELECT *
FROM glue.`clickhouse-regression-database-1.table_a`
┌─string─┬─int─┬─binary─┐
1. │ aa │ 1 │ ᴺᵁᴸᴸ │
2. │ ᴺᵁᴸᴸ │ 1 │ ᴺᵁᴸᴸ │
3. │ aaaaa │ 1 │ ᴺᵁᴸᴸ │
└────────┴─────┴────────┘
3 rows in set. Elapsed: 0.297 sec.
I am not able to query table_b:
SELECT *
FROM glue.`clickhouse-regression-database-2.table_b`
Received exception from server (version 25.3.3):
Code: 499. DB::Exception: Received from localhost:9000. DB::Exception: User: arn:aws:iam::{user id}:user/agiliazova is not authorized to perform: s3:GetObject on resource: "arn:aws:s3:::glue-bucket-db-b/second_table/metadata/00001-2898634d-aa90-44f5-b5bc-7e067f43663f.metadata.json" because no identity-based policy allows the s3:GetObject action: while reading key: second_table/metadata/00001-2898634d-aa90-44f5-b5bc-7e067f43663f.metadata.json, from bucket: glue-bucket-db-b. (S3_ERROR)
But surprisingly SHOW CREATE TABLE
works for both tables:
SHOW CREATE TABLE glue.`clickhouse-regression-database-1.table_a`
┌─statement────────────────────────────────────────────────────┐
1. │ CREATE TABLE glue.`clickhouse-regression-database-1.table_a`↴│
│↳( ↴│
│↳ `string` Nullable(String), ↴│
│↳ `int` Nullable(Int32), ↴│
│↳ `binary` Nullable(String) ↴│
│↳) ↴│
│↳ENGINE = Iceberg('s3://glue-bucket-db-a/table_one') │
└──────────────────────────────────────────────────────────────┘
1 row in set. Elapsed: 0.106 sec.
SHOW CREATE TABLE glue.`clickhouse-regression-database-2.table_b`
┌─statement────────────────────────────────────────────────────┐
1. │ CREATE TABLE glue.`clickhouse-regression-database-2.table_b`↴│
│↳( ↴│
│↳ `int` Nullable(Int32), ↴│
│↳ `string` Nullable(String), ↴│
│↳ `float` Nullable(Float32) ↴│
│↳) ↴│
│↳ENGINE = Iceberg('s3://glue-bucket-db-b/second_table') │
└──────────────────────────────────────────────────────────────┘
1 row in set. Elapsed: 0.172 sec.
Not let's switch to Policy_2 and run all above queries again:
SHOW TABLES FROM glue
┌─name─────────────────────────────────────┐
1. │ clickhouse-regression-database-1.table_a │
└──────────────────────────────────────────┘
1 row in set. Elapsed: 0.880 sec.
SELECT *
FROM glue.`clickhouse-regression-database-1.table_a`
┌─string─┬─int─┬─binary─┐
1. │ aa │ 1 │ ᴺᵁᴸᴸ │
2. │ aaaaa │ 1 │ ᴺᵁᴸᴸ │
3. │ ᴺᵁᴸᴸ │ 1 │ ᴺᵁᴸᴸ │
└────────┴─────┴────────┘
3 rows in set. Elapsed: 0.444 sec.
SELECT *
FROM glue.`clickhouse-regression-database-2.table_b`
Received exception from server (version 25.3.3):
Code: 736. DB::Exception: Received from localhost:9000. DB::Exception: Exception calling GetTable for table clickhouse-regression-database-2.table_b: User: arn:aws:iam::{user id}:user/agiliazova is not authorized to perform: glue:GetTable on resource: arn:aws:glue:eu-central-1:{user id}:database/clickhouse-regression-database-2 because no identity-based policy allows the glue:GetTable action. (DATALAKE_DATABASE_ERROR)
SHOW CREATE TABLE glue.`clickhouse-regression-database-1.table_a`
┌─statement────────────────────────────────────────────────────┐
1. │ CREATE TABLE glue.`clickhouse-regression-database-1.table_a`↴│
│↳( ↴│
│↳ `string` Nullable(String), ↴│
│↳ `int` Nullable(Int32), ↴│
│↳ `binary` Nullable(String) ↴│
│↳) ↴│
│↳ENGINE = Iceberg('s3://glue-bucket-db-a/table_one') │
└──────────────────────────────────────────────────────────────┘
1 row in set. Elapsed: 0.140 sec.
SHOW CREATE TABLE glue.`clickhouse-regression-database-2.table_b`
Received exception from server (version 25.3.3):
Code: 736. DB::Exception: Received from localhost:9000. DB::Exception: Exception calling GetTable for table clickhouse-regression-database-2.table_b: User: arn:aws:iam::{user id}:user/agiliazova is not authorized to perform: glue:GetTable on resource: arn:aws:glue:eu-central-1:{user id}:database/clickhouse-regression-database-2 because no identity-based policy allows the glue:GetTable action. (DATALAKE_DATABASE_ERROR)