Skip to content

AWS, GCP: Fix double-checked-locking pattern in S3FileIO, GCSFileIO. #13276

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ChaladiMohanVamsi
Copy link
Contributor

@ChaladiMohanVamsi ChaladiMohanVamsi commented Jun 8, 2025

Fix: double-checked-locking pattern in S3FileIO, GCSFileIO.

Ensure Thread-Safe Initialization of clientByPrefix and storageByPrefix

This PR addresses a thread-safety issue during the lazy initialization of the clientByPrefix map. Previously, the shared map was being assigned before it was fully populated, which could lead to race conditions and visibility issues in a concurrent environment.

Changes:

  • Introduced a local variable localClientByPrefix inside the synchronized block.
  • Populated this local map with all required entries.
  • Only after full initialization, assigned it to the shared clientByPrefix field.

Benefits:

  • Prevents other threads from observing a partially initialized map.
  • Eliminates the risk of NullPointerException due to premature assignment.
  • Aligns with best practices for safe publication in multi-threaded contexts.
- this.clientByPrefix = Maps.newHashMap();
+ Map<String, PrefixedS3Client> localClientByPrefix = Maps.newHashMap();
...
+ this.clientByPrefix = localClientByPrefix;

@github-actions github-actions bot added the AWS label Jun 8, 2025
@github-actions github-actions bot added the GCP label Jun 8, 2025
@ChaladiMohanVamsi ChaladiMohanVamsi changed the title AWS: Fix double-checked-locking pattern in S3FileIO. AWS, GCP: Fix double-checked-locking pattern in S3FileIO, GCSFileIO. Jun 8, 2025
@@ -434,9 +434,9 @@ private Map<String, PrefixedS3Client> clientByPrefix() {
if (null == clientByPrefix) {
synchronized (this) {
if (null == clientByPrefix) {
this.clientByPrefix = Maps.newHashMap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow the issue on how there can be visibility issues since the field is volatile and is assigned in a synchronized block?

Copy link
Contributor Author

@ChaladiMohanVamsi ChaladiMohanVamsi Jun 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • The issue is not about the field visibility problem, it is about field being pre-maturely visible to all other threads before populating all the required entries in the map.
  • Assuming there are 2 threads executing the method.
    • First thread entered the synchronized block and initialized empty hashmap before populating the entries.
    • It is possible that second thread don't even reach synchronized block as the first check (null == clientByPrefix) evaluates to false because the first thread assigned field to an empty hashmap.
    • Second thread proceeds its computation consuming the empty hashmap and could result in error.
  • To cover this scenario, we should have field assignment as a last statement of synchronized block. This ensures the map field is visible to all other threads only after it is completely populated, until then other threads will be waiting at synchronized block as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants