[Iceberg] - No cleanup of invalid tables when encountering PAGE_TRANSPORT_TIMEOUT #26579

@nerstak

Trino Version: 476
Connector: Iceberg
File System Used: AWS S3, with s3a protocol

Connector Configuration:

connector.name=iceberg
iceberg.register-table-procedure.enabled=true
iceberg.file-format=PARQUET
iceberg.max-partitions-per-writer=10000
fs.native-s3.enabled=true
s3.path-style-access=true
s3.max-connections=1000
hive.metastore.uri=thrift://HOST:PORT
hive.metastore.thrift.client.connect-timeout=30s
hive.metastore.thrift.client.read-timeout=30s

When a PAGE_TRANSPORT_TIMEOUT occurs while a CREATE TABLE query is running on an Iceberg connector, Trino drops and stops all running queries (as expected).
My concern is that it does not clean up the data already written by the CREATE TABLE queries, which leaves orphan files that are tracked by neither Trino nor Iceberg. As far as I know, there is no cleanup procedure for these unfinished tables.

This is an issue because it may lead to increasing storage costs, while the leftover files remain untrackable by Iceberg.
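
To illustrate what "orphan" means here, the following is a minimal sketch of an out-of-band check, not something Trino or Iceberg provides. It assumes you can obtain two lists yourself: the object URIs under the warehouse prefix (for example from an S3 listing) and the table locations registered in the metastore. The function name, the bucket, and the paths are hypothetical, and the sketch assumes both lists use the same URI scheme.

```python
from typing import Iterable, List


def _as_prefix(location: str) -> str:
    """Normalize a table location so prefix matching respects path boundaries."""
    return location.rstrip("/") + "/"


def find_untracked_objects(
    object_uris: Iterable[str], table_locations: Iterable[str]
) -> List[str]:
    """Return the object URIs that do not sit under any registered table location.

    Hypothetical helper: both inputs are plain strings gathered by the caller.
    """
    prefixes = tuple(_as_prefix(loc) for loc in table_locations)
    return sorted(uri for uri in object_uris if not uri.startswith(prefixes))


if __name__ == "__main__":
    # Hypothetical data: one registered table and one leftover from a failed CREATE TABLE.
    objects = [
        "s3a://example-bucket/warehouse/db/orders/data/part-0.parquet",
        "s3a://example-bucket/warehouse/db/orders_new/data/part-0.parquet",
    ]
    registered = ["s3a://example-bucket/warehouse/db/orders"]
    for uri in find_untracked_objects(objects, registered):
        print(uri)
```

Any result from such a check is only a candidate list: queries that are still running also write files before their table is committed, so an age threshold would be needed before deleting anything.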
