About etag in filesystem cache for ObjectStorage #960

@ianton-ru

Currently, ClickHouse issues a HeadObject request inside the S3::getObjectInfo method to fetch metadata about S3 objects.
When working with a DataLake catalog, some of this metadata, such as object size, can be taken from the catalog instead.
Other fields cannot.
In particular, object storage returns an ETag (https://docs.aws.amazon.com/AmazonS3/latest/API/API_Object.html), which the filesystem cache relies on:
https://github.com/ClickHouse/ClickHouse/blob/master/src/Storages/ObjectStorage/StorageObjectStorageSource.cpp#L710
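To illustrate why the ETag matters here, below is a minimal sketch of a cache key that mixes the object path with its ETag, so that a rewritten object (new ETag) no longer matches its old cache entry. The function name `cache_key`, the `path:etag` layout, and the choice of SHA-256 are assumptions made for illustration; this is not ClickHouse's actual key derivation.

```python
import hashlib


def cache_key(path: str, etag: str) -> str:
    """Hypothetical cache key: hash of the object path combined with its ETag."""
    # Assumption: joining with ':' is only for illustration.
    return hashlib.sha256(f"{path}:{etag}".encode("utf-8")).hexdigest()


# The same path with a different ETag produces a different key,
# so stale cached data for an overwritten object is never reused.
old_key = cache_key("bucket/table/part-0.parquet", "etag-v1")
new_key = cache_key("bucket/table/part-0.parquet", "etag-v2")
```

Any replacement for the ETag would need the same property: it has to change whenever the object's content changes.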
Where ListObjects is supported, ETags for multiple objects can be fetched in a single request:
https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjects.html
However, ListObjects does not work with S3Tables.
It may be possible to use some other field instead of the ETag in the filesystem cache; this requires research.
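The idea could be sketched roughly as follows: build an index of size and ETag from listing responses, and fall back to a per-object HEAD only for keys the listing did not cover (for example, when listing is unavailable). The page layout (`Contents` entries with `Key`, `Size`, `ETag`) follows the S3 ListObjects response; `ObjectInfo`, `index_listing`, `resolve_info`, and the `head_object` callback are hypothetical names, not ClickHouse code.

```python
from typing import Callable, Dict, Iterable, NamedTuple


class ObjectInfo(NamedTuple):
    size: int
    etag: str


def index_listing(pages: Iterable[dict]) -> Dict[str, ObjectInfo]:
    """Collect size and ETag per key from ListObjects-style response pages."""
    index: Dict[str, ObjectInfo] = {}
    for page in pages:
        for entry in page.get("Contents", []):
            # S3 returns the ETag wrapped in double quotes; strip them.
            etag = entry["ETag"].strip('"')
            index[entry["Key"]] = ObjectInfo(entry["Size"], etag)
    return index


def resolve_info(
    keys: Iterable[str],
    listing_index: Dict[str, ObjectInfo],
    head_object: Callable[[str], ObjectInfo],
) -> Dict[str, ObjectInfo]:
    """Prefer listing results; issue a HEAD only for keys missing from the listing."""
    resolved: Dict[str, ObjectInfo] = {}
    for key in keys:
        info = listing_index.get(key)
        if info is None:
            info = head_object(key)  # fallback: one request per object
        resolved[key] = info
    return resolved
```

With an empty listing index (the S3Tables case), every key goes through `head_object`, which matches the current behaviour.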
