Resolve serialization for numpy bool 1.x and 2.x compatibility #53690

sjyangkevin · 2025-07-24T02:15:37Z

Create this PR to help review the solution for #52753
Closes: #52753

Unit Test

the testing ran in the breeze environment with numpy 1.26.4 installed. The following test case failed because it attempted to import numpy.bool. np.bool was a deprecated alias for the built-in Python bool type, introduced in NumPy version 1.20.0. np.bool_ is the Numpy scalar. If we keep both "numpy.bool" and "numpy.bool_" in the list of serializers and the list of deserializers, we might need to update this test case to run based on the installed numpy version. Or, update the code to check which numpy version we have and import and check the right bool (#52753 (comment)).

breeze testing core-tests --test-type Serialization

then, I created a init file files/airflow-breeze-config/init.sh to run uv pip install --upgrade pandas. This upgrade both pandas and numpy to the following versions. We may not only update one package, because the pandas depends on the numpy package and can result in the error mentioned #52753 (comment). The test case that failed when numpy 1.26.4 is installed now pass because numpy.bool is the NumPy scalar in 2.3.1.

Installed 2 packages in 116ms
- numpy==1.26.4
+ numpy==2.3.1
- pandas==2.1.4
+ pandas==2.3.1

If we only upgrade numpy to 2.x and didn't update pandas, the following error is raised. (uv pip install --upgrade numpy)

Installed 1 package in 188ms
- numpy==1.26.4
+ numpy==2.3.1

DAG Test (Run-time Behavior)

Numpy version 1.26.4

Numpy version 2.3.1

^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

sjyangkevin · 2025-07-24T02:21:01Z

airflow-core/src/airflow/serialization/serializers/numpy.py

-    if cls not in allowed_deserialize_classes:
-        raise TypeError(f"unsupported {qualname(cls)} found for numpy deserialization")
+    name = qualname(cls)
+    if name not in deserializers:


we can remove this check if this is redundant.

Why redundant? It's good to verify it here. No harm done.

@bolkedebruin ?

It is redundant. The deserializer will only be invoked when serde finds it in deserializers. This was added in the PR of PyDantic support and should not have been there. Let's remove it.

airflow-core/src/airflow/serialization/serializers/numpy.py

bolkedebruin · 2025-07-24T07:35:30Z

airflow-core/src/airflow/serialization/serializers/numpy.py

-    if cls not in allowed_deserialize_classes:
-        raise TypeError(f"unsupported {qualname(cls)} found for numpy deserialization")
+    name = qualname(cls)
+    if name not in deserializers:


It is redundant. The deserializer will only be invoked when serde finds it in deserializers. This was added in the PR of PyDantic support and should not have been there. Let's remove it.

airflow-core/src/airflow/serialization/serializers/numpy.py

potiuk · 2025-07-24T12:15:22Z

What fails exactly?

See the failing test case.

potiuk · 2025-07-24T12:16:36Z

airflow-core/tests/unit/serialization/test_serde.py

sjyangkevin · 2025-07-25T03:04:49Z

airflow-core/tests/unit/serialization/serializers/test_serializers.py

@@ -250,7 +250,6 @@ def test_numpy_serializers(self):
        ("klass", "ver", "value", "msg"),
        [
            (np.int32, 999, 123, r"serialized version is newer"),
-            (np.float32, 1, 123, r"unsupported numpy\.float32"),


This test case is stale after removing the redundant type check in the deserializers. Also, I notice numpy.float32 is not in the serializers list, but there is such a check. Not sure if it is an issue.

if isinstance(o, (np.float16, np.float32, np.float64, np.complex64, np.complex128)): return float(o), name, __version__, True

…ility between v1 and v2

…s than 2

sjyangkevin · 2025-07-28T00:10:05Z

Use xfail to capture the expected failure and show the documentation when np.bool is imported with NumPy 1.x installed.

sjyangkevin · 2025-07-28T00:54:07Z

Use xfail to capture the expected failure and show the documentation when np.bool is imported with NumPy 1.x installed.

Just found out the rest of the code will not be executed by using pytest.xfail in this way. Will think about another way to do it.

… individually to numpy bool

sjyangkevin · 2025-07-28T03:25:38Z

airflow-core/tests/unit/serialization/test_serde.py

The test case test_serializers_importable_and_str is refactored to be parameterized on serializers. The logic that loads serializers from modules are moved into generate_serializers_importable_and_str_test_cases. This function generate all the test cases (the input and the expected output) to test if a serializer is importable.

It creates a single test case for each serializer defined in the module.

It allows us to mark the numpy.bool as expected failure (i.e., xfail) in pytest outcomes when the NumPy version in breeze environment is 1.x. The failure reason is documented in the xfail message.

If the NumPy version in breeze environemnt is 2.x, the test case for numpy.bool will pass.

Please let me know if this approach is reasonable. Also, I would appreciate if you could provide some feedback on the test implementation (e.g., the invocation of a function in pytest.mark.parameterize to generate test case). Thanks!

airflow-core/tests/unit/serialization/test_serde.py

uranusjr · 2025-07-29T03:22:35Z

airflow-core/tests/unit/serialization/test_serde.py

@@ -61,6 +63,49 @@ def recalculate_patterns():
        _match_regexp.cache_clear()


+def generate_serializers_importable_and_str_test_cases():


I’m not personally a fan in dynamically generating cases like this (if we get the logic wrong, there’s no easy way to detect missing cases) but it’s a personal opinion.

Thanks for the feedback. I would like to elaborate more about the thoughts behind this implementation. Another approach to achieve it may be hard-coding all the values in the parameterization list. In the case of new serializers are added, we can also update the test case manually.

From another perspective, I think the intention may be to simulate the following serializer loading (from what I understand, but may be wrong).

In the original implementation, it is a bit challenging to mark the loading of np.bool as an expected failure when the installed numpy version is 1.x, since executing pytest.xfail will interrupt the execution of the test.

airflow/airflow-core/src/airflow/serialization/serde.py

Lines 361 to 376 in 0cd0ff0

def _register():

"""Register builtin serializers and deserializers for types that don't have any themselves."""

_serializers.clear()

_deserializers.clear()

_stringifiers.clear()

with Stats.timer("serde.load_serializers") as timer:

for _, name, _ in iter_namespace(airflow.serialization.serializers):

name = import_module(name)

for s in getattr(name, "serializers", ()):

if not isinstance(s, str):

s = qualname(s)

if s in _serializers and _serializers[s] != name:

raise AttributeError(f"duplicate {s} for serialization in {name} and {_serializers[s]}")

log.debug("registering %s for serialization", s)

_serializers[s] = name

since executing pytest.xfail will interrupt the execution of the test

Can you elaborate on this?

Note that no other code is executed after the pytest.xfail() call, differently from the marker. That’s because it is implemented internally by raising a known exception.
https://docs.pytest.org/en/stable/how-to/skipping.html#xfail-mark-test-functions-as-expected-to-fail

5735f6b

In the above commit, I mark a test as XFAIL from within the test. Suppose the modules are loading as follow:

np.bool_
np.bool
kubernetes.client.models.v1_pod.V1Pod
datetime.date
...

If the test is marked as XFAIL from within the test while loading np.bool. The test will not continue to check for the import for kubernetes.client.models.v1_pod.V1Pod and datetime.date.

Why not just mark the bool case instead, not the entire test? (Actually skip is probably better than xfail for this.)

A skip means that you expect your test to pass only if some conditions are met, otherwise pytest should skip running the test altogether.

I revisited the document and seems like skip is a better option. In this case, I would expect the np.bool case to pass if the installed NumPy version is above 2.0.

Why not just mark the bool case instead, not the entire test?

I wanted to understand it better in terms of the implementation. Do you suggest to replace the pytest.xfail in the commit 5735f6b with pytest.skip? Or I should restructure it to individual tests (i.e., np.bool has its own test case) and use pytest.mark.skipif for that test case.

Splitting the case out to a separate test is a good idea. Splitting out the deltalake cases is probably also a good idea. This would make the dynamically param generation code as simple as possible and reduce the possibility of mistakes to the minimum.

Sure, I will see if I can figure out something to balance the trade offs here for that test case.

ashb · 2025-07-29T15:37:10Z

Are there any similar cases we need to account for with float or int between numpy 1.x and 2.x?

sjyangkevin · 2025-07-30T00:26:02Z

Are there any similar cases we need to account for with float or int between numpy 1.x and 2.x?

I have a test DAG #52753 (comment) which check for the integer types and the floating point types listed in the serializers. I think it is good with those types.

However, as mentioned here #53690 (comment). The np.float32 is not in the serializers list. So, it is likely that when this type is passed through, there is no serializer can be found for this. I can also run a test for it. (But it is a different thing from the issue we discussed)

boring-cyborg bot added the area:serialization label Jul 24, 2025

sjyangkevin commented Jul 24, 2025

View reviewed changes

potiuk approved these changes Jul 24, 2025

View reviewed changes

potiuk reviewed Jul 24, 2025

View reviewed changes

airflow-core/src/airflow/serialization/serializers/numpy.py Show resolved Hide resolved

sjyangkevin commented Jul 24, 2025

View reviewed changes

airflow-core/src/airflow/serialization/serializers/numpy.py Show resolved Hide resolved

bolkedebruin requested changes Jul 24, 2025

View reviewed changes

sjyangkevin commented Jul 25, 2025

View reviewed changes

airflow-core/tests/unit/serialization/test_serde.py Outdated Show resolved Hide resolved

sjyangkevin commented Jul 25, 2025

View reviewed changes

sjyangkevin marked this pull request as ready for review July 25, 2025 03:11

sjyangkevin requested a review from ashb as a code owner July 25, 2025 03:11

sjyangkevin changed the title ~~Resolve serialization for numpy 1.x and 2.x compatibility~~ Resolve serialization for numpy bool 1.x and 2.x compatibility Jul 25, 2025

sjyangkevin added 3 commits July 27, 2025 20:05

fix serialization for np.bool in numpy 2.x

bd07ac2

update serializers importable test case to handle numpy bool compatib…

fed16ac

…ility between v1 and v2

use pytest xfail to capture np.bool import error in numpy version les…

5735f6b

…s than 2

sjyangkevin force-pushed the issues/52753/numpy-1.x-2.x-bool-incompatibility branch from d5231d4 to 5735f6b Compare July 28, 2025 00:05

parameterize the serializers importable test, so xfail can be applied…

d118c1a

… individually to numpy bool

sjyangkevin commented Jul 28, 2025

View reviewed changes

sjyangkevin requested review from potiuk and bolkedebruin July 28, 2025 03:25

uranusjr reviewed Jul 28, 2025

View reviewed changes

airflow-core/tests/unit/serialization/test_serde.py Outdated Show resolved Hide resolved

add textwrap dedent for message

584dc0b

sjyangkevin requested a review from uranusjr July 29, 2025 01:43

uranusjr reviewed Jul 29, 2025

View reviewed changes

		@@ -61,6 +63,49 @@ def recalculate_patterns():
		_match_regexp.cache_clear()


		def generate_serializers_importable_and_str_test_cases():

	def _register():
	"""Register builtin serializers and deserializers for types that don't have any themselves."""
	_serializers.clear()
	_deserializers.clear()
	_stringifiers.clear()

	with Stats.timer("serde.load_serializers") as timer:
	for _, name, _ in iter_namespace(airflow.serialization.serializers):
	name = import_module(name)
	for s in getattr(name, "serializers", ()):
	if not isinstance(s, str):
	s = qualname(s)
	if s in _serializers and _serializers[s] != name:
	raise AttributeError(f"duplicate {s} for serialization in {name} and {_serializers[s]}")
	log.debug("registering %s for serialization", s)
	_serializers[s] = name

Resolve serialization for numpy bool 1.x and 2.x compatibility #53690

Are you sure you want to change the base?

Resolve serialization for numpy bool 1.x and 2.x compatibility #53690

Conversation

sjyangkevin commented Jul 24, 2025 • edited by eladkal Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Unit Test

DAG Test (Run-time Behavior)

Uh oh!

sjyangkevin Jul 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

potiuk commented Jul 24, 2025

Uh oh!

potiuk commented Jul 24, 2025

Uh oh!

Uh oh!

sjyangkevin Jul 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sjyangkevin commented Jul 28, 2025

Uh oh!

sjyangkevin commented Jul 28, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sjyangkevin Jul 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ashb commented Jul 29, 2025

Uh oh!

sjyangkevin commented Jul 30, 2025

Uh oh!

Uh oh!

sjyangkevin commented Jul 24, 2025 •

edited by eladkal

Loading

sjyangkevin Jul 24, 2025 •

edited

Loading

sjyangkevin Jul 25, 2025 •

edited

Loading

sjyangkevin Jul 29, 2025 •

edited

Loading