Description
We are aware that most Amoro-related properties should have no influence on the underlying formats when those formats are used independently. This idea makes Amoro more flexible and pluggable. For that reason, Amoro catalog-level default properties are designed and implemented to be merged on loading instead of being written directly into table properties. However, the 'merge on loading' mode has drawbacks in some cases. For instance, a company might have more than one log-store cluster. At the very beginning, platform maintainers configure 'log-store.address' as a catalog-level default key to indicate a default log-store cluster. Users can create mixed-format tables directly without being aware of the log-store infrastructure, and everything works. After some time, however, the single cluster becomes a bottleneck and the platform needs an additional log-store cluster. Platform maintainers cannot simply reconfigure 'log-store.address', because the 'merge on loading' implementation would change the log-store address of existing mixed-format tables. As we can see, some special Amoro properties should be bound to tables (written into the underlying table properties), especially storage-related properties (such as log-store settings and the table compression codec).
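The drawback can be illustrated with a minimal sketch. It assumes that loading a table simply overlays the table's own properties on top of the catalog-level defaults. The class name, the method name, and the cluster addresses are hypothetical placeholders, not Amoro's actual API.

```java
import java.util.HashMap;
import java.util.Map;

class MergeOnLoadingSketch {

  // Hypothetical merge: catalog defaults first, then the table's own properties win.
  static Map<String, String> mergeOnLoading(
      Map<String, String> catalogDefaults, Map<String, String> tableProperties) {
    Map<String, String> effective = new HashMap<>(catalogDefaults);
    effective.putAll(tableProperties);
    return effective;
  }

  public static void main(String[] args) {
    Map<String, String> catalogDefaults = new HashMap<>();
    catalogDefaults.put("log-store.address", "cluster-a:9092");

    // An old table created without an explicit log-store address.
    Map<String, String> oldTable = new HashMap<>();
    // A table whose log-store address was persisted at creation time.
    Map<String, String> boundTable = new HashMap<>();
    boundTable.put("log-store.address", "cluster-a:9092");

    // prints "cluster-a:9092"
    System.out.println(mergeOnLoading(catalogDefaults, oldTable).get("log-store.address"));

    // The platform later points the catalog default at a new cluster.
    catalogDefaults.put("log-store.address", "cluster-b:9092");

    // prints "cluster-b:9092": the old table silently follows the new default
    System.out.println(mergeOnLoading(catalogDefaults, oldTable).get("log-store.address"));
    // prints "cluster-a:9092": the persisted value is unaffected
    System.out.println(mergeOnLoading(catalogDefaults, boundTable).get("log-store.address"));
  }
}
```

Under this assumption, only a table that carries the value in its own properties keeps pointing at the original cluster, which is the motivation for persisting storage-related keys.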
Specifically, the following factors determine whether a property is written (persisted) into the underlying table metadata:
- the underlying format's own configuration keys
- storage-related metadata keys (e.g. log-store.xxx, compression.codec)
Properties with the following names and prefixes are not written into the underlying table metadata by default (a blacklist); they act as 'merge on loading' properties:
// amoro service related
- self-optimizing.
- optimize.
- table-expire.
- clean-orphan-file.
- clean-dangling-delete-files.
- data-expire.
- table-trash.
- tag.auto-create.
// read/write related
- read.split.open-file-cost
- read.split.planning-lookback
- read.split.target-size
- read.split.delete-ratio // removed; never used in code
- write.upsert.enabled
// mix-hive related
- base.hive.auto-sync-schema-change
- base.hive.auto-sync-data-write
- base.hive.consistent-write.enabled
Furthermore, we also provide ways for users to configure their own blacklist and whitelist:
table-properties.non-persisted.additional // a semicolon-separated list of property names (or prefixes) that are not written into underlying tables ('merge on loading'), in addition to the default names (or prefixes)
table-properties.non-persisted.excluded // a semicolon-separated list of property names (or prefixes) excluded from the default 'merge on loading' properties; they can be written into table properties
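How the default blacklist and the two user-configured lists could combine is sketched below. This is only an illustration under stated assumptions, not the actual implementation: the class and method names are hypothetical, an entry ending with '.' is assumed to match by prefix while any other entry matches exactly, and the excluded list is assumed to apply only to the default blacklist.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NonPersistedPropertiesSketch {

  // Default blacklist copied from the list above.
  static final List<String> DEFAULT_NON_PERSISTED = Arrays.asList(
      "self-optimizing.", "optimize.", "table-expire.", "clean-orphan-file.",
      "clean-dangling-delete-files.", "data-expire.", "table-trash.", "tag.auto-create.",
      "read.split.open-file-cost", "read.split.planning-lookback", "read.split.target-size",
      "write.upsert.enabled", "base.hive.auto-sync-schema-change",
      "base.hive.auto-sync-data-write", "base.hive.consistent-write.enabled");

  // Splits a semicolon-separated value into trimmed, non-empty entries.
  static List<String> parseList(String value) {
    if (value == null) {
      return Collections.emptyList();
    }
    return Arrays.stream(value.split(";"))
        .map(String::trim)
        .filter(s -> !s.isEmpty())
        .collect(Collectors.toList());
  }

  // Assumption: an entry ending with '.' is a prefix, any other entry is an exact name.
  static boolean matches(List<String> entries, String key) {
    for (String entry : entries) {
      if (entry.endsWith(".") ? key.startsWith(entry) : key.equals(entry)) {
        return true;
      }
    }
    return false;
  }

  // Returns the subset of properties that would be written into table metadata.
  static Map<String, String> persistedProperties(
      Map<String, String> properties, String additional, String excluded) {
    List<String> extraBlacklist = parseList(additional);
    List<String> whitelist = parseList(excluded);
    Map<String, String> persisted = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : properties.entrySet()) {
      String key = e.getKey();
      // Blacklisted by default unless the user explicitly excluded it from the defaults.
      boolean blockedByDefault =
          matches(DEFAULT_NON_PERSISTED, key) && !matches(whitelist, key);
      if (!blockedByDefault && !matches(extraBlacklist, key)) {
        persisted.put(key, e.getValue());
      }
    }
    return persisted;
  }
}
```

With this sketch, passing "write.upsert.enabled" as the excluded value would let that key be persisted, while 'log-store.address' is persisted in any case because it matches no blacklist entry.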
Limitation
This feature only takes effect when tables are created through the Spark/Flink unified catalog implementation.
It has also only been tested with the Iceberg and mixed formats.
Use case/motivation
No response
Describe the solution
Refer to the description above: when a table is created in a unified catalog implementation, merge the configured keys that should be persisted (per the lists above) into the table metadata.
Subtasks
No response
Related issues
No response
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct