
[Feature]: Enable catalog-level default properties storing into underlying tables within a unified catalog #3470

@nicochen

Description

We are aware that most Amoro-related properties should have no influence on the underlying formats when those formats are used independently. This keeps Amoro flexible and pluggable. For that reason, Amoro catalog-level default properties were designed and implemented to be merged on loading instead of being written into table properties directly. However, the 'merge on loading' mode has drawbacks in some cases. For instance, a company might run more than one log-store cluster. At the very beginning, platform maintainers configure 'log-store.address' as a catalog-level default key to indicate a default log-store cluster. Users can create mixed-format tables directly without being aware of the log-store infrastructure, and everything works. After some time, however, the single cluster becomes a bottleneck and the platform needs an additional log-store cluster. The platform maintainer cannot simply reconfigure 'log-store.address', because the 'merge on loading' implementation would also change the log-store address for the old mixed-format tables. As we can see, some special Amoro properties should be bound to tables (written into the underlying table properties), especially storage-related properties such as the log-store settings or the table compression codec.
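
The log-store scenario can be illustrated with a small sketch. This is not Amoro code: `load_merged`, the dictionary shapes, and the cluster addresses are hypothetical, and the sketch assumes that table-level values take precedence over catalog defaults during the merge.

```python
def load_merged(catalog_defaults, table_props):
    """'Merge on loading': catalog defaults fill in keys the table lacks."""
    merged = dict(catalog_defaults)
    merged.update(table_props)  # assumption: table-level values win
    return merged


# An old mixed-format table, created while only one log-store cluster existed.
old_table_merge_mode = {}  # 'merge on loading': nothing was persisted
old_table_persist_mode = {"log-store.address": "cluster-a:9092"}  # written at creation

catalog_defaults = {"log-store.address": "cluster-a:9092"}
before = load_merged(catalog_defaults, old_table_merge_mode)

# Later, the platform points the catalog default at a new cluster.
catalog_defaults = {"log-store.address": "cluster-b:9092"}
after_merge = load_merged(catalog_defaults, old_table_merge_mode)
after_persist = load_merged(catalog_defaults, old_table_persist_mode)

print(before["log-store.address"])         # cluster-a:9092
print(after_merge["log-store.address"])    # cluster-b:9092, old table re-pointed
print(after_persist["log-store.address"])  # cluster-a:9092, old table unchanged
```

With the value persisted at creation time, changing the catalog default only affects tables created afterwards.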

Specifically, the following factors determine whether a property is written (persisted) into the underlying table metadata:

  1. The underlying format's own configuration keys.
  2. Storage-related metadata keys (e.g. log-store.xxx, compression.codec).

Properties with the following names and prefixes are not written into the underlying table metadata by default (a blacklist); they act as 'merge on loading' properties:

// amoro service related

  • self-optimizing.
  • optimize.
  • table-expire.
  • clean-orphan-file.
  • clean-dangling-delete-files.
  • data-expire.
  • table-trash.
  • tag.auto-create.

// read/write related

  • read.split.open-file-cost
  • read.split.planning-lookback
  • read.split.target-size
  • read.split.delete-ratio // removed, never used in code
  • write.upsert.enabled

// mix-hive related

  • base.hive.auto-sync-schema-change
  • base.hive.auto-sync-data-write
  • base.hive.consistent-write.enabled
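
As a rough illustration of how such a blacklist could be applied at table creation, the sketch below splits a property map into a persisted part and a 'merge on loading' part. It is not the actual implementation: `is_non_persisted` and `split_properties` are hypothetical names, and the matching rule (an entry ending in "." is a prefix, any other entry is an exact key) is an assumption. The removed key read.split.delete-ratio is left out.

```python
# Default non-persisted entries, copied from the list above.
# Assumption: an entry ending in "." is a prefix; any other entry is an exact key.
DEFAULT_NON_PERSISTED = [
    # amoro service related
    "self-optimizing.", "optimize.", "table-expire.", "clean-orphan-file.",
    "clean-dangling-delete-files.", "data-expire.", "table-trash.", "tag.auto-create.",
    # read/write related
    "read.split.open-file-cost", "read.split.planning-lookback",
    "read.split.target-size", "write.upsert.enabled",
    # mix-hive related
    "base.hive.auto-sync-schema-change", "base.hive.auto-sync-data-write",
    "base.hive.consistent-write.enabled",
]


def is_non_persisted(key, entries=DEFAULT_NON_PERSISTED):
    """Return True if the key matches a prefix entry or an exact-name entry."""
    for entry in entries:
        if entry.endswith("."):
            if key.startswith(entry):
                return True
        elif key == entry:
            return True
    return False


def split_properties(props, entries=DEFAULT_NON_PERSISTED):
    """Split props into (persisted, merge_on_loading) dictionaries."""
    persisted, merge_on_loading = {}, {}
    for key, value in props.items():
        target = merge_on_loading if is_non_persisted(key, entries) else persisted
        target[key] = value
    return persisted, merge_on_loading


catalog_defaults = {
    "self-optimizing.enabled": "true",
    "write.upsert.enabled": "true",
    "log-store.address": "cluster-a:9092",
    "compression.codec": "zstd",
}
persisted, merge_on_loading = split_properties(catalog_defaults)
print(sorted(persisted))         # ['compression.codec', 'log-store.address']
print(sorted(merge_on_loading))  # ['self-optimizing.enabled', 'write.upsert.enabled']
```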

Furthermore, we also provide ways for users to configure their own blacklist and whitelist.

table-properties.non-persisted.additional  // a semicolon-separated list of property names (or prefixes) that are not written into underlying tables ('merge on loading'), in addition to the default names (or prefixes)
table-properties.non-persisted.excluded // a semicolon-separated list of property names (or prefixes) excluded from the default 'merge on loading' properties; they can be written into table properties
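
A minimal sketch of how these two keys might combine with the default list follows. It is an illustration under stated assumptions, not the proposed implementation: `parse_list` and `effective_non_persisted` are hypothetical helpers, the shortened `defaults` list and the `my-platform.` prefix are made up for the example, excluded entries are assumed to match default entries literally, and exclusions are assumed to apply only to the defaults (an entry listed in both keys stays non-persisted).

```python
ADDITIONAL_KEY = "table-properties.non-persisted.additional"
EXCLUDED_KEY = "table-properties.non-persisted.excluded"


def parse_list(value):
    """Parse a semicolon-separated list, ignoring blanks and surrounding spaces."""
    return [item.strip() for item in (value or "").split(";") if item.strip()]


def effective_non_persisted(defaults, catalog_props):
    """Defaults minus excluded entries, plus additional entries (order preserved)."""
    additional = parse_list(catalog_props.get(ADDITIONAL_KEY))
    excluded = set(parse_list(catalog_props.get(EXCLUDED_KEY)))
    result = [entry for entry in defaults if entry not in excluded]
    for entry in additional:
        if entry not in result:
            result.append(entry)
    return result


# Shortened default list and a hypothetical platform prefix, for illustration only.
defaults = ["self-optimizing.", "table-expire.", "write.upsert.enabled"]
catalog_props = {
    ADDITIONAL_KEY: "my-platform.; read.split.target-size",
    EXCLUDED_KEY: "write.upsert.enabled",
}
effective = effective_non_persisted(defaults, catalog_props)
print(effective)
# ['self-optimizing.', 'table-expire.', 'my-platform.', 'read.split.target-size']
```

Under these assumptions, write.upsert.enabled would be written into the table properties at creation time, while keys under the hypothetical my-platform. prefix would stay 'merge on loading'.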

Limitation

This feature only takes effect when a table is created through the Spark/Flink unified catalog implementation.
It has also only been tested with the Iceberg and mixed formats.

Use case/motivation

No response

Describe the solution

Refer to the description above: when creating a table in a unified catalog implementation, merge the configured keys (as listed above) that should be written into the table metadata.

Subtasks

No response

Related issues

No response

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!

Code of Conduct
