
Conversation

@zhengruifeng (Contributor) commented Oct 16, 2025

What changes were proposed in this pull request?

Fix the scheduled job for numpy 2.1.3.

Why are the changes needed?

To fix https://github.com/apache/spark/actions/runs/18538043179/job/52838303733

The failure was caused by a bug in pyarrow 19.0.0; see apache/arrow#45283.

Does this PR introduce any user-facing change?

No, infra-only.

How was this patch tested?

PR builder with:

default: '{"PYSPARK_IMAGE_TO_TEST": "numpy-213", "PYTHON_TO_TEST": "python3.11"}'

see https://github.com/zhengruifeng/spark/actions/runs/18527303212/job/52801019275
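
For context, the override above is just a JSON map of workflow inputs; the minimal sketch below only illustrates its shape (how the actual GitHub Actions workflow consumes these keys is not shown here, and the comments are assumptions based on the key names):

```python
import json

# The PR-builder override used above: a JSON map of workflow inputs.
# Illustrative only; the real workflow reads these values itself.
override = '{"PYSPARK_IMAGE_TO_TEST": "numpy-213", "PYTHON_TO_TEST": "python3.11"}'

envs = json.loads(override)
print(envs["PYSPARK_IMAGE_TO_TEST"])  # numpy-213  -> test image variant to run against
print(envs["PYTHON_TO_TEST"])         # python3.11 -> Python interpreter used for the tests
```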

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun (Member) left a comment

+1, LGTM. Thank you, @zhengruifeng. I was also worried about that failed CI, but didn't get a chance to look into it.

@dongjoon-hyun (Member) commented Oct 16, 2025

For this one, do you think we need to document this incompatibility somewhere, given that our minimum numpy is still 1.22 and pyarrow is 15.0.0?

The failure was caused by a bug in pyarrow 19.0.0; see apache/arrow#45283.

@pan3793 (Member) commented Oct 16, 2025

@zhengruifeng, I have a silly question about Python deps management: I see that many Python deps are declared without a version, or with a half-bounded version range (e.g. foo>=1.0 or bar<2.0). Silently upgrading third-party libs may introduce breaking changes or bugs (especially on major version bumps).

This means that if we do not specify a dependency version, or only specify its lower bound, PySpark may stop working once a new major version of the dependency is released. This becomes a problem if users want to create a venv for older PySpark versions (in practice, EOLed versions of Spark are widely used and upgrades are not timely).

I wonder if PySpark can pin all Python deps to fixed versions (or at least bounded ranges, e.g. foo>=1.0,<=2.3); this would clearly show which dependency versions each Spark release has been fully tested against.
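
To make the half-bounded vs. bounded distinction concrete, here is a minimal sketch using the packaging library (illustrative only; foo and the version numbers are the same placeholders as above):

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# Half-bounded: any future release of "foo", including major bumps, is accepted.
half_bounded = SpecifierSet(">=1.0")
# Bounded: only versions inside the (presumably tested) range are accepted.
bounded = SpecifierSet(">=1.0,<=2.3")

v = Version("3.0")  # a hypothetical future major release
print(v in half_bounded)  # True  -> pip resolution would happily pick it up
print(v in bounded)       # False -> the untested major bump is rejected
```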

@zhengruifeng (Contributor Author) commented:
@zhengruifeng, I have a silly question about Python deps management: I see that many Python deps are declared without a version, or with a half-bounded version range (e.g. foo>=1.0 or bar<2.0). Silently upgrading third-party libs may introduce breaking changes or bugs (especially on major version bumps).

This means that if we do not specify a dependency version, or only specify its lower bound, PySpark may stop working once a new major version of the dependency is released. This becomes a problem if users want to use older Spark versions (in practice, EOLed versions of Spark are widely used and upgrades are not timely).

I wonder if Spark can pin all Python deps to fixed versions (or at least bounded ranges, e.g. foo>=1.0,<=2.3); this would clearly show which dependency versions each Spark release has been fully tested against.

@pan3793 The reason we use lower bounds like foo>=1.0 in most places is to eagerly test Spark against the latest packages (this requires triggering a refresh of the cached images). When Spark gets broken by a new version, we set an upper bound (e.g. 22d2eb3) and restore it once the issue is resolved (e.g. 47574ba).

Currently, most workflows test against the latest versions, and we have two workflows that test against the minimum versions, in which the key packages (numpy/pyarrow/pandas) are pinned.

But I personally think we should maybe use a fixed version (foo==1.2.3) or a bounded range (foo>=1.0,<=2.3) in the officially released images.
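
As a rough illustration of that idea (a sketch only, not how the official release images are built today), exact pins could be derived from the environment the tests actually ran against:

```python
from importlib.metadata import distributions

# Emit an exact pin (name==version) for every package installed in the
# environment that was tested, producing a fully pinned requirements list.
# Illustrative only; the official release images are not built this way today.
for dist in sorted(distributions(), key=lambda d: d.metadata["Name"].lower()):
    print(f'{dist.metadata["Name"]}=={dist.version}')
```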

@dongjoon-hyun @HyukjinKwon

@zhengruifeng (Contributor Author) commented Oct 16, 2025

For this one, do you think we need to document this incompatibility somewhere, given that our minimum numpy is still 1.22 and pyarrow is 15.0.0?

The failure was caused by a bug in pyarrow 19.0.0; see apache/arrow#45283.

@dongjoon-hyun I am not sure, since it is a pyarrow bug that was introduced in 19.0.0 and fixed in 19.0.1. I suspect there may also be similar cases in other packages.
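
For reference, a runtime guard against that single bad release could look roughly like the following (a sketch only; this is not PySpark's actual dependency-check code, and the error message is made up):

```python
from importlib.metadata import version
from packaging.version import Version

# The regression tracked at apache/arrow#45283 was introduced in pyarrow
# 19.0.0 and fixed in 19.0.1, so only that single release needs to be rejected.
installed = Version(version("pyarrow"))
if installed == Version("19.0.0"):
    raise RuntimeError(
        "pyarrow 19.0.0 has a known regression (apache/arrow#45283); "
        "please upgrade to 19.0.1 or newer."
    )
```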

@pan3793 (Member) commented Oct 16, 2025

But I personally think we should maybe use a fixed version (foo==1.2.3) or a bounded range (foo>=1.0,<=2.3) in the officially released images.

@zhengruifeng, that makes a lot of sense!

@zhengruifeng (Contributor Author) commented:
Merged to master to restore the CI.

@zhengruifeng zhengruifeng deleted the restore_numpy_213 branch October 16, 2025 05:27
@dongjoon-hyun (Member) commented:
I am not sure, since it is a pyarrow bug that was introduced in 19.0.0 and fixed in 19.0.1.

Thank you. In that case, it looks okay to me, too; we don't need to pay extra attention to it.
