GH-45902: [Python][Parquet] Expose ParquetWriter properties and arrow_properties #47087

Open: wants to merge 1 commit into main

@SoundBot SoundBot commented Jul 12, 2025

Rationale for this change

Exposes ParquetWriter properties via writer.properties and writer.arrow_properties
(see #45902)

What changes are included in this PR?

See above

Are these changes tested?

Yes

Are there any user-facing changes?

Yes, properties are available via writer.properties and writer.arrow_properties

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}
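
For illustration, the two title formats above could be checked with a regular expression along these lines. This is a sketch, not the project's actual check: `TITLE_RE` and `parse_title` are hypothetical names, and allowing several `[Component]` tags in a row is an assumption based on this PR's own title.

```python
import re

# Hypothetical pattern inferred from the two templates above; not the
# project's actual title check.
TITLE_RE = re.compile(
    r"^(?:GH-(?P<issue>\d+)|MINOR): "      # "GH-<issue id>: " or "MINOR: "
    r"(?P<components>(?:\[[^\]]+\])+) "    # one or more [Component] tags
    r"(?P<summary>\S.*)$"                  # non-empty summary
)


def parse_title(title):
    """Return the parsed parts of a PR title, or None if it does not match."""
    match = TITLE_RE.match(title)
    if match is None:
        return None
    return {
        "issue": match.group("issue"),  # None for MINOR titles
        "components": re.findall(r"\[([^\]]+)\]", match.group("components")),
        "summary": match.group("summary"),
    }


print(parse_title(
    "GH-45902: [Python][Parquet] Expose ParquetWriter properties and arrow_properties"
))
print(parse_title(
    "[Python][Parquet] Expose ParquetWriter properties and arrow_properties"
))
```

The second call uses a title with no `GH-` or `MINOR` prefix, which this pattern rejects.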

See also:

@SoundBot SoundBot force-pushed the expose-parquetwrite-props branch 3 times, most recently from f921baf to db3942b Compare July 14, 2025 03:47
@SoundBot SoundBot changed the title [Python][Parquet] Expose ParquetWriter properties and arrow_properties GH-45902: [Python][Parquet] Expose ParquetWriter properties and arrow_properties Jul 14, 2025

⚠️ GitHub issue #45902 has been automatically assigned in GitHub to PR creator.

@SoundBot SoundBot marked this pull request as ready for review July 14, 2025 04:18
@AlenkaF AlenkaF (Member) left a comment

Thanks for submitting a PR!
From a quick look the test needs a @pytest.mark.pandas mark and some style changes are needed to fix linter errors:

diff --git a/cpp/src/parquet/arrow/writer.cc b/cpp/src/parquet/arrow/writer.cc
index 3724e2edf0..cdaeebc294 100644
--- a/cpp/src/parquet/arrow/writer.cc
+++ b/cpp/src/parquet/arrow/writer.cc
@@ -473,7 +473,9 @@ class FileWriterImpl : public FileWriter {
 
   const WriterProperties& properties() const override { return *writer_->properties(); }
 
-  const ArrowWriterProperties& arrow_properties() const override { return *arrow_properties_; }
+  const ArrowWriterProperties& arrow_properties() const override {
+    return *arrow_properties_;
+  }
 
   ::arrow::MemoryPool* memory_pool() const override {
     return column_write_context_.memory_pool;
diff --git a/python/pyarrow/_parquet.pyx b/python/pyarrow/_parquet.pyx
index f1287d7b8a..6f3b9a5f43 100644
--- a/python/pyarrow/_parquet.pyx
+++ b/python/pyarrow/_parquet.pyx
@@ -2458,7 +2458,6 @@ cdef class WriterPropertiesWrapper(_Weakrefable):
         return self.props.default_column_properties().statistics_enabled()
 
 
-
 cdef class ArrowWriterPropertiesWrapper(_Weakrefable):
     cdef:
         shared_ptr[ArrowWriterProperties] props

I will have a look at the proposed code shortly.

@github-actions github-actions bot added the "awaiting committer review" label and removed the "awaiting review" label Jul 14, 2025
@SoundBot SoundBot force-pushed the expose-parquetwrite-props branch from db3942b to 23d3161 Compare July 14, 2025 23:48
@SoundBot SoundBot force-pushed the expose-parquetwrite-props branch from 23d3161 to 3adfc51 Compare July 14, 2025 23:54
@SoundBot
Author

@AlenkaF thank you, I added @pytest.mark.pandas and ran pre-commit run --all-files

Comment on lines +455 to +456
df = _test_dataframe(100)
table = pa.Table.from_pandas(df, preserve_index=False)
Member

Not specifically related to this PR, but it would be nice to have a utility function that creates an Arrow table directly instead of having to go through pandas. This test doesn't really need pandas, but the existing utility function is handy. We could create an issue to move some of these tests to a new utility function and remove the @pytest.mark.pandas mark from them. @AlenkaF what do you think? It could be a good-first-issue.

Member

Agree, adding a new utility function to create Arrow tables and/or batches would be great! And perfect for good-first-issue 👍

Member

Created: #47172

@raulcd
Member

raulcd commented Jul 21, 2025

@github-actions crossbow submit -g python -g wheel

@github-actions github-actions bot added the "awaiting changes" label and removed the "awaiting committer review" label Jul 21, 2025

Revision: 3adfc51

Submitted crossbow builds: ursacomputing/crossbow @ actions-623832f922

Task Status
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
python-sdist GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-1.26 GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11-pandas-nightly-numpy-nightly GitHub Actions
test-conda-python-3.11-pandas-upstream_devel-numpy-nightly GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.12-cpython-debug GitHub Actions
test-conda-python-3.13 GitHub Actions
test-conda-python-3.9 GitHub Actions
test-conda-python-3.9-pandas-1.1.3-numpy-1.19.5 GitHub Actions
test-conda-python-emscripten GitHub Actions
test-cuda-python-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-python-3-amd64 GitHub Actions
test-debian-12-python-3-i386 GitHub Actions
test-fedora-39-python-3 GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions
test-ubuntu-22.04-python-313-freethreading GitHub Actions
test-ubuntu-24.04-python-3 GitHub Actions
wheel-macos-monterey-cp310-cp310-amd64 GitHub Actions
wheel-macos-monterey-cp310-cp310-arm64 GitHub Actions
wheel-macos-monterey-cp311-cp311-amd64 GitHub Actions
wheel-macos-monterey-cp311-cp311-arm64 GitHub Actions
wheel-macos-monterey-cp312-cp312-amd64 GitHub Actions
wheel-macos-monterey-cp312-cp312-arm64 GitHub Actions
wheel-macos-monterey-cp313-cp313-amd64 GitHub Actions
wheel-macos-monterey-cp313-cp313-arm64 GitHub Actions
wheel-macos-monterey-cp313-cp313t-amd64 GitHub Actions
wheel-macos-monterey-cp313-cp313t-arm64 GitHub Actions
wheel-macos-monterey-cp39-cp39-amd64 GitHub Actions
wheel-macos-monterey-cp39-cp39-arm64 GitHub Actions
wheel-manylinux-2-28-cp310-cp310-amd64 GitHub Actions
wheel-manylinux-2-28-cp310-cp310-arm64 GitHub Actions
wheel-manylinux-2-28-cp311-cp311-amd64 GitHub Actions
wheel-manylinux-2-28-cp311-cp311-arm64 GitHub Actions
wheel-manylinux-2-28-cp312-cp312-amd64 GitHub Actions
wheel-manylinux-2-28-cp312-cp312-arm64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313-amd64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313-arm64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313t-amd64 GitHub Actions
wheel-manylinux-2-28-cp313-cp313t-arm64 GitHub Actions
wheel-manylinux-2-28-cp39-cp39-amd64 GitHub Actions
wheel-manylinux-2-28-cp39-cp39-arm64 GitHub Actions
wheel-musllinux-1-2-cp310-cp310-amd64 GitHub Actions
wheel-musllinux-1-2-cp310-cp310-arm64 GitHub Actions
wheel-musllinux-1-2-cp311-cp311-amd64 GitHub Actions
wheel-musllinux-1-2-cp311-cp311-arm64 GitHub Actions
wheel-musllinux-1-2-cp312-cp312-amd64 GitHub Actions
wheel-musllinux-1-2-cp312-cp312-arm64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313-amd64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313-arm64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313t-amd64 GitHub Actions
wheel-musllinux-1-2-cp313-cp313t-arm64 GitHub Actions
wheel-musllinux-1-2-cp39-cp39-amd64 GitHub Actions
wheel-musllinux-1-2-cp39-cp39-arm64 GitHub Actions
wheel-windows-cp310-cp310-amd64 GitHub Actions
wheel-windows-cp311-cp311-amd64 GitHub Actions
wheel-windows-cp312-cp312-amd64 GitHub Actions
wheel-windows-cp313-cp313-amd64 GitHub Actions
wheel-windows-cp313-cp313t-amd64 GitHub Actions
wheel-windows-cp39-cp39-amd64 GitHub Actions

@raulcd
Member

raulcd commented Jul 21, 2025

The example-python-minimal-build-* failures are unrelated to the PR, known issue:

@SoundBot
Author

Is there anything else needed to merge this PR? @AlenkaF @raulcd
