Skip to content

GH-38914: [Python] Add EncryptionConfiguration.uniform_encryption #46347

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

MartinNowak
Copy link
Contributor

@MartinNowak MartinNowak commented May 7, 2025

Rationale for this change

Support uniform encryption when writing parquet files.

What changes are included in this PR?

Exposing EncryptionConfiguration.uniform_encryption to pyarrow.

Are these changes tested?

Yes, see included tests.

Are there any user-facing changes?

Addition of uniform_encryption parameter to EncryptionConfiguration.

@MartinNowak MartinNowak force-pushed the fix-38914-pyarrow-uniform-encryption branch 2 times, most recently from 266781f to b78e6e2 Compare May 7, 2025 13:55
@MartinNowak MartinNowak marked this pull request as ready for review May 7, 2025 13:55
@MartinNowak MartinNowak requested review from AlenkaF, raulcd and rok as code owners May 7, 2025 13:55
@MartinNowak
Copy link
Contributor Author

MartinNowak commented May 7, 2025

Python / AMD64 Conda Python 3.10 Without Pandas (pull_request) Failing after 17m

GDB test failures seem unrelated.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels May 7, 2025
Copy link
Member

@rok rok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for picking this up @MartinNowak. PR looks good to me.
Only one nit - if I understand correctly uniform_encryption defaults to False so encrypting with:

    encryption_config = pe.EncryptionConfiguration(
        footer_key=FOOTER_KEY_NAME,
        encryption_algorithm="AES_GCM_V1",
        cache_lifetime=timedelta(minutes=5.0),
        data_key_length_bits=256)

Should raise: "Either column_keys or uniform_encryption must be set". Could we add test for that?

@MartinNowak
Copy link
Contributor Author

Should raise: "Either column_keys or uniform_encryption must be set". Could we add test for that?

Thanks for the quick review @rok. That was already tested.

def test_encrypted_parquet_write_no_col_key(tempdir, data_table):
"""Write an encrypted parquet, but give only footer key,
without column key."""
path = tempdir / 'encrypted_table_no_col_key.in_mem.parquet'
# Encrypt the footer with the footer key
encryption_config = pe.EncryptionConfiguration(
footer_key=FOOTER_KEY_NAME)
with pytest.raises(OSError,
match="Either column_keys or uniform_encryption "
"must be set"):
# Write with encryption properties
write_encrypted_file(path, data_table, FOOTER_KEY_NAME, COL_KEY_NAME,
FOOTER_KEY, b"", encryption_config)
.

@MartinNowak MartinNowak requested a review from rok May 8, 2025 08:42
@rok
Copy link
Member

rok commented May 8, 2025

Thanks for the quick review @rok. That was already tested.

Oh yeah, you're right.

Copy link
Member

@rok rok left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to go.
Pinging @pitrou to double check me :).

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting changes Awaiting changes labels May 8, 2025
@rok
Copy link
Member

rok commented May 8, 2025

The gdb issue seems unrelated.

@MartinNowak MartinNowak force-pushed the fix-38914-pyarrow-uniform-encryption branch from b78e6e2 to 5c0894c Compare May 13, 2025 10:21
@rok
Copy link
Member

rok commented May 13, 2025

Assuming CI passes I'll merge @pitrou

- expose missing EncryptionConfiguration parameter in pyarrow
@MartinNowak MartinNowak force-pushed the fix-38914-pyarrow-uniform-encryption branch from 5c0894c to e38abd9 Compare May 13, 2025 11:29
Copy link
Member

@pitrou pitrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks a lot for this @MartinNowak

@pitrou pitrou merged commit 6ab37dd into apache:main May 13, 2025
13 of 14 checks passed
@pitrou pitrou removed the awaiting merge Awaiting merge label May 13, 2025
Copy link

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 6ab37dd.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Python][Parquet] Add EncryptionConfiguration.uniform_encryption to Python implementation
3 participants