Skip to content

Issues with permissions using DataLakeCatalog with type=glue #954

@alsugiliazova

Description

@alsugiliazova

Scenario
I have two different tables in two different databases in two different buckets:

  1. I have two databases: clickhouse-regression-database-1 and clickhouse-regression-database-2
  2. I created table_a under clickhouse-regression-database-1, stored in
  • s3://glue-bucket-db-a/table_one
  1. I created table_b under clickhouse-regression-database-2, stored in
  • s3://glue-bucket-db-b/second_table

Let's consider different permission configurations of IAM user (this will be Policy_1):

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "GlueAccessToSpecificTable",
			"Effect": "Allow",
			"Action": [
				"glue:GetTable",
				"glue:GetTables",
				"glue:GetDatabase",
				"glue:GetDatabases",
				"glue:GetTableVersion",
				"glue:GetTableVersions"
			],
			"Resource": "*"
		},
		{
			"Sid": "S3AccessToIcebergTableData",
			"Effect": "Allow",
			"Action": [
				"s3:ListBucket",
				"s3:GetObject",
				"s3:GetBucketLocation"
			],
			"Resource": [
				"arn:aws:s3:::glue-bucket-db-a",
				"arn:aws:s3:::glue-bucket-db-a/table_one/*"
			]
		}
	]
}

What this policy means:

  • The IAM user has full read access to Glue metadata for all databases and tables.
  • The user has S3 access only to objects inside s3://glue-bucket-db-a/table_one/.
  • The user does not have S3 access to glue-bucket-db-b (used by table_b).

From clickhouse (25.6.6.29):

SET allow_experimental_database_glue_catalog = 1;
CREATE DATABASE glue
ENGINE = DataLakeCatalog
SETTINGS catalog_type = 'glue', region = 'eu-central-1', aws_access_key_id = '***', aws_secret_access_key = 'Z***';
SHOW TABLES FROM glue;

Both tables are visible from the Glue catalog, even though the IAM user doesn't have S3 access to one of them.


   ┌─name─────────────────────────────────────┐
1. │ clickhouse-regression-database-1.table_a │
2. │ clickhouse-regression-database-2.table_b │
   └──────────────────────────────────────────┘

Let's try to select from those tables:

SELECT *
FROM glue.`clickhouse-regression-database-1.table_a`
   ┌─string─┬─int─┬─binary─┐
1. │ aa     │   1 │ ᴺᵁᴸᴸ   │
2. │ ᴺᵁᴸᴸ   │   1 │ ᴺᵁᴸᴸ   │
3. │ aaaaa  │   1 │ ᴺᵁᴸᴸ   │
   └────────┴─────┴────────┘

3 rows in set. Elapsed: 0.463 sec. 
SELECT *
FROM glue.`clickhouse-regression-database-2.table_b`
Elapsed: 0.351 sec. 

Received exception from server (version 25.6.6):
Code: 499. DB::Exception: Received from localhost:9000. DB::Exception: User: arn:aws:iam::***:user/agiliazova is not authorized to perform: s3:GetObject on resource: "arn:aws:s3:::glue-bucket-db-b/second_table/metadata/00001-2898634d-aa90-44f5-b5bc-7e067f43663f.metadata.json" because no identity-based policy allows the s3:GetObject action: while reading key: second_table/metadata/00001-2898634d-aa90-44f5-b5bc-7e067f43663f.metadata.json, from bucket: glue-bucket-db-b. (S3_ERROR)

SHOW CREATE TABLE works for both tables:

SHOW CREATE TABLE glue.`clickhouse-regression-database-1.table_a`
   ┌─statement────────────────────────────────────────────────────┐
1. │ CREATE TABLE glue.`clickhouse-regression-database-1.table_a`↴│
   │↳(                                                           ↴│
   │↳    `string` Nullable(String),                              ↴│
   │↳    `int` Nullable(Int32),                                  ↴│
   │↳    `binary` Nullable(String)                               ↴│
   │↳)                                                           ↴│
   │↳ENGINE = Iceberg('s3://glue-bucket-db-a/table_one')          │
   └──────────────────────────────────────────────────────────────┘
SHOW CREATE TABLE glue.`clickhouse-regression-database-2.table_b`
   ┌─statement────────────────────────────────────────────────────┐
1. │ CREATE TABLE glue.`clickhouse-regression-database-2.table_b`↴│
   │↳(                                                           ↴│
   │↳    `int` Nullable(Int32),                                  ↴│
   │↳    `string` Nullable(String),                              ↴│
   │↳    `float` Nullable(Float32)                               ↴│
   │↳)                                                           ↴│
   │↳ENGINE = Iceberg('s3://glue-bucket-db-b/second_table')       │
   └──────────────────────────────────────────────────────────────┘

Now I update policy, since I want to be able to see only table_a from clickhouse (this will be Policy_2):

{
	"Version": "2012-10-17",
	"Statement": [
		{
			"Sid": "GlueAccessToSpecificTable",
			"Effect": "Allow",
			"Action": [
				"glue:GetTable",
				"glue:GetTables",
				"glue:GetDatabase",
				"glue:GetDatabases",
				"glue:GetTableVersion",
				"glue:GetTableVersions"
			],
			"Resource": [
				"arn:aws:glue:eu-central-1:{user id}:catalog",
				"arn:aws:glue:eu-central-1:{user id}:database/clickhouse-regression-database-1",
				"arn:aws:glue:eu-central-1:{user id}:table/clickhouse-regression-database-1/table_a"
			]
		},
		{
			"Sid": "S3AccessToIcebergTableData",
			"Effect": "Allow",
			"Action": [
				"s3:ListBucket",
				"s3:GetObject",
				"s3:GetBucketLocation"
			],
			"Resource": [
				"arn:aws:s3:::glue-bucket-db-a",
				"arn:aws:s3:::glue-bucket-db-a/table_one/*"
			]
		}
	]
}

In Policy_2, Glue access is restricted to specific resources:

"Resource": [
				"arn:aws:glue:eu-central-1:{user id}:catalog",
				"arn:aws:glue:eu-central-1:{user id}:database/clickhouse-regression-database-1",
				"arn:aws:glue:eu-central-1:{user id}:table/clickhouse-regression-database-1/table_a"
			]

IAM user can only:

  • Access the Glue catalog
  • See only the specific database clickhouse-regression-database-1
  • See only the specific table table_a

Again from clickhouse (25.6.6.29):

SHOW TABLES FROM glue
   ┌─name─────────────────────────────────────┐
1. │ clickhouse-regression-database-1.table_a │
   └──────────────────────────────────────────┘

1 row in set. Elapsed: 0.955 sec. 

Now as expected I see only one table.

I can select from it:

SELECT *
FROM glue.`clickhouse-regression-database-1.table_a`
   ┌─string─┬─int─┬─binary─┐
1. │ aa     │   1 │ ᴺᵁᴸᴸ   │
2. │ ᴺᵁᴸᴸ   │   1 │ ᴺᵁᴸᴸ   │
3. │ aaaaa  │   1 │ ᴺᵁᴸᴸ   │
   └────────┴─────┴────────┘

3 rows in set. Elapsed: 0.460 sec. 

I can not select from table_b (which is expected):

SELECT *
FROM glue.`clickhouse-regression-database-2.table_b`
Received exception from server (version 25.6.6):
Code: 736. DB::Exception: Received from localhost:9000. DB::Exception: Exception calling GetTable for table clickhouse-regression-database-2.table_b: User: arn:aws:iam::{user id}:user/agiliazova is not authorized to perform: glue:GetTable on resource: arn:aws:glue:eu-central-1:{user id}:database/clickhouse-regression-database-2 because no identity-based policy allows the glue:GetTable action. (DATALAKE_DATABASE_ERROR)

SHOW CREATE TABLE works only for table_a.

Now let's go back to Policy_1 and run all the same queries on altinity/clickhouse-server:25.3.3.20186.altinityantalya:

SET allow_experimental_database_glue_catalog = 1;
CREATE DATABASE glue
ENGINE = DataLakeCatalog
SETTINGS catalog_type = 'glue', region = 'eu-central-1', aws_access_key_id = '***', aws_secret_access_key = '***';
SHOW TABLES FROM glue
Received exception from server (version 25.3.3):
Code: 499. DB::Exception: Received from localhost:9000. DB::Exception: User: arn:aws:iam::{user id}:user/agiliazova is not authorized to perform: s3:GetObject on resource: "arn:aws:s3:::glue-bucket-db-b/second_table/metadata/00001-2898634d-aa90-44f5-b5bc-7e067f43663f.metadata.json" because no identity-based policy allows the s3:GetObject action: while reading key: second_table/metadata/00001-2898634d-aa90-44f5-b5bc-7e067f43663f.metadata.json, from bucket: glue-bucket-db-b. (S3_ERROR)

This shows that in ClickHouse version 25.3.3.altinityantalya, SHOW TABLES attempts to read Iceberg metadata (*.metadata.json) from S3 even during table listing.

As a result, if the IAM user does not have s3:GetObject access to all buckets, the SHOW TABLES command itself fails, even if the user is not querying the problematic table.

But I am still able to query table_a:

SELECT *
FROM glue.`clickhouse-regression-database-1.table_a`
   ┌─string─┬─int─┬─binary─┐
1. │ aa     │   1 │ ᴺᵁᴸᴸ   │
2. │ ᴺᵁᴸᴸ   │   1 │ ᴺᵁᴸᴸ   │
3. │ aaaaa  │   1 │ ᴺᵁᴸᴸ   │
   └────────┴─────┴────────┘

3 rows in set. Elapsed: 0.297 sec. 

I am not able to query table_b:

SELECT *
FROM glue.`clickhouse-regression-database-2.table_b`
Received exception from server (version 25.3.3):
Code: 499. DB::Exception: Received from localhost:9000. DB::Exception: User: arn:aws:iam::{user id}:user/agiliazova is not authorized to perform: s3:GetObject on resource: "arn:aws:s3:::glue-bucket-db-b/second_table/metadata/00001-2898634d-aa90-44f5-b5bc-7e067f43663f.metadata.json" because no identity-based policy allows the s3:GetObject action: while reading key: second_table/metadata/00001-2898634d-aa90-44f5-b5bc-7e067f43663f.metadata.json, from bucket: glue-bucket-db-b. (S3_ERROR)

But surprisingly SHOW CREATE TABLE works for both tables:

SHOW CREATE TABLE glue.`clickhouse-regression-database-1.table_a`
   ┌─statement────────────────────────────────────────────────────┐
1. │ CREATE TABLE glue.`clickhouse-regression-database-1.table_a`↴│
   │↳(                                                           ↴│
   │↳    `string` Nullable(String),                              ↴│
   │↳    `int` Nullable(Int32),                                  ↴│
   │↳    `binary` Nullable(String)                               ↴│
   │↳)                                                           ↴│
   │↳ENGINE = Iceberg('s3://glue-bucket-db-a/table_one')          │
   └──────────────────────────────────────────────────────────────┘

1 row in set. Elapsed: 0.106 sec. 
SHOW CREATE TABLE glue.`clickhouse-regression-database-2.table_b`
   ┌─statement────────────────────────────────────────────────────┐
1. │ CREATE TABLE glue.`clickhouse-regression-database-2.table_b`↴│
   │↳(                                                           ↴│
   │↳    `int` Nullable(Int32),                                  ↴│
   │↳    `string` Nullable(String),                              ↴│
   │↳    `float` Nullable(Float32)                               ↴│
   │↳)                                                           ↴│
   │↳ENGINE = Iceberg('s3://glue-bucket-db-b/second_table')       │
   └──────────────────────────────────────────────────────────────┘

1 row in set. Elapsed: 0.172 sec. 

Not let's switch to Policy_2 and run all above queries again:

SHOW TABLES FROM glue
   ┌─name─────────────────────────────────────┐
1. │ clickhouse-regression-database-1.table_a │
   └──────────────────────────────────────────┘

1 row in set. Elapsed: 0.880 sec. 
SELECT *
FROM glue.`clickhouse-regression-database-1.table_a`
   ┌─string─┬─int─┬─binary─┐
1. │ aa     │   1 │ ᴺᵁᴸᴸ   │
2. │ aaaaa  │   1 │ ᴺᵁᴸᴸ   │
3. │ ᴺᵁᴸᴸ   │   1 │ ᴺᵁᴸᴸ   │
   └────────┴─────┴────────┘

3 rows in set. Elapsed: 0.444 sec. 
SELECT *
FROM glue.`clickhouse-regression-database-2.table_b`
Received exception from server (version 25.3.3):
Code: 736. DB::Exception: Received from localhost:9000. DB::Exception: Exception calling GetTable for table clickhouse-regression-database-2.table_b: User: arn:aws:iam::{user id}:user/agiliazova is not authorized to perform: glue:GetTable on resource: arn:aws:glue:eu-central-1:{user id}:database/clickhouse-regression-database-2 because no identity-based policy allows the glue:GetTable action. (DATALAKE_DATABASE_ERROR)
SHOW CREATE TABLE glue.`clickhouse-regression-database-1.table_a`
   ┌─statement────────────────────────────────────────────────────┐
1. │ CREATE TABLE glue.`clickhouse-regression-database-1.table_a`↴│
   │↳(                                                           ↴│
   │↳    `string` Nullable(String),                              ↴│
   │↳    `int` Nullable(Int32),                                  ↴│
   │↳    `binary` Nullable(String)                               ↴│
   │↳)                                                           ↴│
   │↳ENGINE = Iceberg('s3://glue-bucket-db-a/table_one')          │
   └──────────────────────────────────────────────────────────────┘

1 row in set. Elapsed: 0.140 sec. 
SHOW CREATE TABLE glue.`clickhouse-regression-database-2.table_b`
Received exception from server (version 25.3.3):
Code: 736. DB::Exception: Received from localhost:9000. DB::Exception: Exception calling GetTable for table clickhouse-regression-database-2.table_b: User: arn:aws:iam::{user id}:user/agiliazova is not authorized to perform: glue:GetTable on resource: arn:aws:glue:eu-central-1:{user id}:database/clickhouse-regression-database-2 because no identity-based policy allows the glue:GetTable action. (DATALAKE_DATABASE_ERROR)

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions