

@prince-cs prince-cs commented May 5, 2025

Fix encrypted file reader for DP 2.1 and above

Bug Tracker

PLUGIN-1717

Description

Currently, reading an encrypted file on Dataproc images 2.1 and 2.2 produces garbage binary data. This happens because the newer Hadoop version opens files through a different method, which bypasses the existing decryption logic.

This PR overrides the new file-open method and extends the existing decryption logic to cover it.
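A minimal, self-contained model of this failure mode, assuming the encrypted file system is a wrapper that decrypts only in `open()`, and that its base class forwards the newer builder-style open call straight to the inner file system (Hadoop's `FilterFileSystem.openFileWithOptions` behaves this way in recent 3.x releases). `SimpleFs`, `RawFs`, `FilterFs`, `DecryptingFs`, `FixedDecryptingFs` and the XOR "cipher" are hypothetical stand-ins, not the plugin's or Hadoop's actual classes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical minimal file-system API with an old-style and a new-style open.
abstract class SimpleFs {
  abstract InputStream open(String path) throws IOException;

  // New-style entry point; by default it simply wraps open().
  CompletableFuture<InputStream> openFileWithOptions(String path) throws IOException {
    return CompletableFuture.completedFuture(open(path));
  }
}

// Raw storage that holds the encrypted bytes.
class RawFs extends SimpleFs {
  final Map<String, byte[]> files = new HashMap<>();

  @Override
  InputStream open(String path) throws IOException {
    byte[] data = files.get(path);
    if (data == null) {
      throw new IOException("No such file: " + path);
    }
    return new ByteArrayInputStream(data);
  }
}

// Wrapper that forwards every call, including the new entry point, to the inner FS.
class FilterFs extends SimpleFs {
  final SimpleFs inner;

  FilterFs(SimpleFs inner) { this.inner = inner; }

  @Override
  InputStream open(String path) throws IOException { return inner.open(path); }

  @Override
  CompletableFuture<InputStream> openFileWithOptions(String path) throws IOException {
    return inner.openFileWithOptions(path); // never reaches a subclass's open()
  }
}

// Decrypts only in open(): callers of openFileWithOptions() still get ciphertext.
class DecryptingFs extends FilterFs {
  DecryptingFs(SimpleFs inner) { super(inner); }

  // Toy XOR "cipher" standing in for the real decryptor; XOR is its own inverse.
  static byte[] xor(byte[] data) {
    byte[] out = new byte[data.length];
    for (int i = 0; i < data.length; i++) {
      out[i] = (byte) (data[i] ^ 0x5A);
    }
    return out;
  }

  @Override
  InputStream open(String path) throws IOException {
    try (InputStream in = inner.open(path)) {
      return new ByteArrayInputStream(xor(in.readAllBytes()));
    }
  }
}

// The fix: route the new entry point through the decrypting open().
class FixedDecryptingFs extends DecryptingFs {
  FixedDecryptingFs(SimpleFs inner) { super(inner); }

  @Override
  CompletableFuture<InputStream> openFileWithOptions(String path) throws IOException {
    return CompletableFuture.completedFuture(open(path));
  }
}

final class EncryptedReadDemo {
  // Reads a whole file through the new-style entry point as UTF-8 text.
  static String readViaNewApi(SimpleFs fs, String path) {
    try (InputStream in = fs.openFileWithOptions(path).get()) {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    } catch (Exception e) {
      throw new RuntimeException(e);
    }
  }
}
```

Reading the same stored file through `DecryptingFs` returns the scrambled bytes, while `FixedDecryptingFs` returns the original text, which mirrors the garbage-vs-correct output described above.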

Code change

  • Modified EncryptedFileSystem.java
    • Added new methods to open files.
  • Modified pom.xml
    • Bumped the client library version to get the new open methods.

Tests

Meta

  • Test case uses 2 CSV files
    • 100 records
    • 100K records (~10 MB)
  • Each pipeline is run 4 times, once on each of:
    • Ephemeral Cluster
    • Existing Cluster (Dataproc Image 2.0)
    • Existing Cluster (Dataproc Image 2.1)
    • Existing Cluster (Dataproc Image 2.2)
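The matrix above yields 2 × 4 = 8 runs, one per test case listed below. A small sketch that enumerates them with the same labels used in the test-case headings; the `TestMatrix` class is hypothetical and only illustrates the combination.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical enumeration of the test matrix: every input file is run
// once on every cluster configuration.
final class TestMatrix {
  static final String[] DATASETS = {"100", "100K"};
  static final String[] CLUSTERS = {"Ephemeral", "DP2.0", "DP2.1", "DP2.2"};

  static List<String> cases() {
    List<String> out = new ArrayList<>();
    for (String cluster : CLUSTERS) {
      for (String dataset : DATASETS) {
        out.add("[" + dataset + " - " + cluster + "]");
      }
    }
    return out;
  }
}
```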

Test Case [100 - Ephemeral]

[Screenshot: 2025-06-03, 2:06:20 AM]

Test Case [100K - Ephemeral]

[Screenshot: 2025-06-03, 2:02:49 AM]

Test Case [100 - DP2.0]

[Screenshot: 2025-06-03, 2:17:10 AM]

Test Case [100K - DP2.0]

[Screenshot: 2025-06-03, 1:49:21 AM]

Test Case [100 - DP2.1]

[Screenshot: 2025-06-03, 2:10:43 AM]

Test Case [100K - DP2.1]

[Screenshot: 2025-06-03, 1:57:12 AM]

Test Case [100 - DP2.2]

Test Case [100K - DP2.2]

@prince-cs prince-cs added the build Trigger unit test build label May 6, 2025
@psainics psainics removed the build Trigger unit test build label May 19, 2025
@psainics psainics self-assigned this May 19, 2025
@psainics psainics added the build Trigger unit test build label Jun 1, 2025

@itsankit-google itsankit-google left a comment


Also add e2e tests for this change.

@itsankit-google

Bigtable e2e tests are currently failing:
[Screenshot of the failing tests]

@sahusanket

> Also add e2e tests for this change.

Please create a JIRA ticket and add them in a follow-up PR.


@sahusanket sahusanket left a comment


Please squash the commits before merging.

protected CompletableFuture<FSDataInputStream> openFileWithOptions(Path path, OpenFileParameters parameters) {
  return CompletableFuture.supplyAsync(() -> {
    try {
      int bufferSize = parameters.getBufferSize() > 0 ? parameters.getBufferSize() : 4096;

Use DEFAULT_BUFFER_SIZE instead of the literal 4096.


Updated!
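The snippet under review opens the stream inside `CompletableFuture.supplyAsync`, whose `Supplier` cannot throw a checked `IOException`, and falls back to a default buffer size when none is requested. A self-contained sketch of that pattern with the named constant the reviewer asked for; `AsyncOpen` and `Opener` are hypothetical stand-ins for the file system and its `open(path, bufferSize)` call, not the PR's actual code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.CompletableFuture;

final class AsyncOpen {
  // Named fallback instead of a bare 4096 literal.
  static final int DEFAULT_BUFFER_SIZE = 4096;

  // Hypothetical synchronous opener, standing in for open(path, bufferSize).
  @FunctionalInterface
  interface Opener {
    InputStream open(String path, int bufferSize) throws IOException;
  }

  // Non-positive sizes mean "not specified", so use the default.
  static int effectiveBufferSize(int requested) {
    return requested > 0 ? requested : DEFAULT_BUFFER_SIZE;
  }

  // A Supplier cannot throw IOException, so it is wrapped; callers see it as
  // the cause of the CompletionException thrown by join().
  static CompletableFuture<InputStream> openAsync(Opener opener, String path, int requestedBufferSize) {
    return CompletableFuture.supplyAsync(() -> {
      try {
        return opener.open(path, effectiveBufferSize(requestedBufferSize));
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    });
  }
}
```

`supplyAsync` runs the open on the common `ForkJoinPool`; completing the future inline (for example with `CompletableFuture.completedFuture`) would satisfy the same return type without a thread hop.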

private static final String FS_SCHEME = CONF_PREFIX + "scheme";
private static final String FS_IMPL = CONF_PREFIX + "impl";
private static final String DECRYPTOR_IMPL = CONF_PREFIX + "decryptor.impl";
private static final int DEFAULT_BUFFER_SIZE = 4096;

How did we arrive at this value?


@psainics, please fix this.


Updated!
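One possible answer to where 4096 comes from: it matches the shipped default of Hadoop's `io.file.buffer.size` property, so the value could be read from configuration rather than hard-coded. This is not necessarily what the update above did. The sketch uses `java.util.Properties` as a stand-in for Hadoop's `Configuration` (where `conf.getInt(key, default)` would be the natural call); `BufferSizeConfig` is a hypothetical name.

```java
import java.util.Properties;

final class BufferSizeConfig {
  // Key and default mirror Hadoop's io.file.buffer.size (default 4096).
  static final String IO_FILE_BUFFER_SIZE_KEY = "io.file.buffer.size";
  static final int IO_FILE_BUFFER_SIZE_DEFAULT = 4096;

  // Returns the configured size, or the default if missing, malformed, or non-positive.
  static int bufferSize(Properties conf) {
    String raw = conf.getProperty(IO_FILE_BUFFER_SIZE_KEY);
    if (raw == null) {
      return IO_FILE_BUFFER_SIZE_DEFAULT;
    }
    try {
      int value = Integer.parseInt(raw.trim());
      return value > 0 ? value : IO_FILE_BUFFER_SIZE_DEFAULT;
    } catch (NumberFormatException e) {
      return IO_FILE_BUFFER_SIZE_DEFAULT;
    }
  }
}
```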


@itsankit-google itsankit-google left a comment


Added one comment and one question.

@psainics psainics merged commit f79f1b9 into data-integrations:develop Aug 8, 2025
16 checks passed

4 participants