[SPARK-53931][INFRA][PYTHON] Fix scheduled job for numpy 2.1.3 #52633
Conversation
+1, LGTM. Thank you, @zhengruifeng. I was also worried about that failing CI, but didn't get a chance to look into it.
For this one, do you think we need to document this incompatibility somewhere, because our minimum
@zhengruifeng, I have a silly question about Python deps management. I see that many Python deps are declared without a version, or with only a half-bounded range (a lower bound). This means that if we do not specify the dependency version, or only specify its lower bound, PySpark may stop working once a new major version of that dependency is released. This becomes a problem when users want to create a venv for older PySpark versions (in practice, EOLed versions of Spark are widely used, and upgrading is not timely). I wonder if PySpark could pin all Python deps to a fixed version (or at least a bounded range).
@pan3793 The reason we currently use lower bounds is that most workflows test against the latest versions; we also have two workflows that test against the minimum versions, in which the versions of key packages (numpy/pyarrow/pandas) are pinned.
But I personally think maybe we should use fixed versions.
@dongjoon-hyun I am not sure, since it is a pyarrow bug introduced in 19.0.0 and fixed in 19.0.1.
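To illustrate the pinning trade-off discussed above: a lower-bound-only specifier (e.g. `pyarrow>=X`) accepts every future release, including one that later turns out to be broken, like pyarrow 19.0.0 here, while a bounded range can exclude it. Below is a minimal pure-Python sketch of that difference; the helper names are hypothetical and not part of PySpark or pip, and real tools follow PEP 440 rather than this simplified tuple comparison.

```python
def parse(version: str) -> tuple:
    """Parse a plain dotted version string ("19.0.1") into a comparable int tuple."""
    return tuple(int(part) for part in version.split("."))

def satisfies_lower_bound(installed: str, minimum: str) -> bool:
    """Half-bounded constraint: only a minimum version, no upper limit."""
    return parse(installed) >= parse(minimum)

def satisfies_range(installed: str, minimum: str, below: str) -> bool:
    """Bounded constraint: minimum <= installed < below."""
    return parse(minimum) <= parse(installed) < parse(below)

# A lower bound happily accepts a later, broken release:
print(satisfies_lower_bound("19.0.0", "15.0.0"))  # True - broken release passes
print(satisfies_lower_bound("19.0.1", "15.0.0"))  # True - fixed release passes

# A bounded range set before the breakage excludes it:
print(satisfies_range("19.0.0", "15.0.0", "19.0.0"))  # False - regression excluded
```

The cost of the bounded range, as noted in the thread, is that someone must bump the upper bound for each new dependency release, whereas the lower bound only needs attention when something actually breaks.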
@zhengruifeng, that makes a lot of sense!
Merged to master to restore the CI.
Thank you. In that case, it looks okay to me, too. It doesn't need any more of our attention.
What changes were proposed in this pull request?
Fix scheduled job for numpy 2.1.3
Why are the changes needed?
To fix https://github.com/apache/spark/actions/runs/18538043179/job/52838303733
It was caused by a pyarrow bug introduced in 19.0.0; see apache/arrow#45283.
Does this PR introduce any user-facing change?
No, infra-only.
How was this patch tested?
PR builder with
see https://github.com/zhengruifeng/spark/actions/runs/18527303212/job/52801019275
Was this patch authored or co-authored using generative AI tooling?
No.