Description
Apache Iceberg version
1.10.0 (latest release)
Query engine
None
Please describe the bug 🐞
Starting in Iceberg 1.10.0, checksums are calculated and validated for S3 PutObject requests, even when the Iceberg property s3.checksum-enabled is left at its default of false.
At first glance this looks like an Iceberg bug, but it actually stems from a change in AWS SDK defaults: the default value of the SDK's AWS_REQUEST_CHECKSUM_CALCULATION setting changed to WHEN_SUPPORTED. With that default, the client automatically calculates a request checksum for every operation that supports one, even if the caller did not explicitly enable checksums.
(https://docs.aws.amazon.com/sdkref/latest/guide/feature-dataintegrity.html)
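To make the difference between the two settings concrete, here is a simplified, hypothetical model of the request-checksum decision. It is not SDK code. `ChecksumDecision`, `Mode` and `shouldCalculate` are illustrative names, and the rule encoded here is an assumption based on the linked AWS page: a checksum is computed when the operation requires one or the caller explicitly requests one, and under WHEN_SUPPORTED also whenever the operation merely supports one. PutObject is treated as an operation that supports, but does not require, a request checksum.

```java
import java.util.Objects;

/** Hypothetical, simplified model of the SDK's request-checksum decision (not SDK code). */
public final class ChecksumDecision {

    /** Mirrors the two values of the AWS_REQUEST_CHECKSUM_CALCULATION setting. */
    public enum Mode { WHEN_SUPPORTED, WHEN_REQUIRED }

    public static boolean shouldCalculate(Mode mode,
                                          boolean operationSupportsChecksum,
                                          boolean operationRequiresChecksum,
                                          boolean callerRequestedChecksum) {
        Objects.requireNonNull(mode, "mode");
        // Required checksums, and ones the caller asked for explicitly, are always computed.
        if (operationRequiresChecksum || callerRequestedChecksum) {
            return true;
        }
        // Otherwise only WHEN_SUPPORTED opts in automatically for operations that support it.
        return mode == Mode.WHEN_SUPPORTED && operationSupportsChecksum;
    }

    public static void main(String[] args) {
        // Assumed PutObject profile: supports a checksum, does not require one, caller did not ask.
        boolean newDefault = shouldCalculate(Mode.WHEN_SUPPORTED, true, false, false);
        boolean proposed = shouldCalculate(Mode.WHEN_REQUIRED, true, false, false);
        System.out.println("WHEN_SUPPORTED: " + newDefault + ", WHEN_REQUIRED: " + proposed);
    }
}
```

Under this model, running `main` prints `WHEN_SUPPORTED: true, WHEN_REQUIRED: false`, which matches the behavior change described above for PutObject.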
Proposal:
When initializing the S3 client in Iceberg, set AWS_REQUEST_CHECKSUM_CALCULATION to WHEN_REQUIRED so that checksums are only calculated when strictly required. Then allow users to control checksum usage via Iceberg’s s3.checksum-enabled setting.
This way:
- Users who want checksums can enable them explicitly via s3.checksum-enabled.
- Users who keep the default (false) won’t incur checksum calculation/validation unexpectedly after upgrading to 1.10.0.
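The proposal amounts to deriving the SDK setting from the Iceberg property instead of inheriting the SDK default. A minimal sketch of that mapping follows. It assumes the local enum `RequestChecksumCalculation` stands in for the SDK's own `software.amazon.awssdk.core.checksums.RequestChecksumCalculation`, and the class and method names are hypothetical. Mapping `s3.checksum-enabled=true` to WHEN_SUPPORTED is one possible choice, not something the proposal fixes.

```java
import java.util.Map;

/**
 * Hypothetical helper sketching the proposal: pin request checksums to WHEN_REQUIRED
 * unless the Iceberg property s3.checksum-enabled is set to true.
 */
public final class ChecksumConfigSketch {

    /** Local stand-in for the SDK's RequestChecksumCalculation values (assumption). */
    public enum RequestChecksumCalculation { WHEN_SUPPORTED, WHEN_REQUIRED }

    public static final String CHECKSUM_ENABLED = "s3.checksum-enabled";

    public static RequestChecksumCalculation fromProperties(Map<String, String> properties) {
        // Iceberg's default for s3.checksum-enabled is false.
        boolean enabled = Boolean.parseBoolean(properties.getOrDefault(CHECKSUM_ENABLED, "false"));
        // Opt in to automatic checksums only when the user explicitly enabled them.
        return enabled
            ? RequestChecksumCalculation.WHEN_SUPPORTED
            : RequestChecksumCalculation.WHEN_REQUIRED;
    }

    public static void main(String[] args) {
        System.out.println(fromProperties(Map.of()));                         // default properties
        System.out.println(fromProperties(Map.of(CHECKSUM_ENABLED, "true"))); // explicit opt-in
    }
}
```

In a real client, the resulting value would be passed to the S3 client builder. Recent AWS SDK for Java v2 releases appear to expose a `requestChecksumCalculation(...)` option on that builder, but the exact wiring inside Iceberg's client factory is left open here.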
What do you think about this approach?
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time