I've noticed that the pandas and pyspark backends define `check_nullable` with `SCHEMA` validation scope, whereas polars uses `DATA` scope for the same check. Is there a reason for this difference?
My personal preference would be to use `DATA` here. I imagine that the choice of `SCHEMA` might have its roots in the database world, where nullability is part of the schema. However, at least with pandas, this check requires inspecting the data, and we've found it to be slow, so classifying it as a schema check makes it harder to use validation depth to improve performance.
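To make the performance concern concrete, here is a minimal sketch. I'm assuming the `PANDERA_VALIDATION_DEPTH` environment variable (documented for the pyspark integration) also applies to the pandas backend, that `"SCHEMA_ONLY"` is intended to skip data-level checks, and the DataFrame size is purely illustrative:

```python
import os

# Assumption: pandera reads PANDERA_VALIDATION_DEPTH to decide which validation
# scopes to run, and "SCHEMA_ONLY" is meant to skip DATA-scoped checks.
os.environ["PANDERA_VALIDATION_DEPTH"] = "SCHEMA_ONLY"

import numpy as np
import pandas as pd
import pandera as pa

schema = pa.DataFrameSchema(
    {
        # nullable=False is what the check_nullable backend check enforces
        "x": pa.Column(float, nullable=False),
    }
)

df = pd.DataFrame({"x": np.random.rand(10_000_000)})

# Verifying the dtype only touches metadata (cheap), but verifying
# nullable=False is essentially df["x"].isna().any() -- a full column scan.
# Because the pandas/pyspark backends give check_nullable SCHEMA scope, that
# scan still runs even though we asked for schema-only validation above.
schema.validate(df)
```

If `check_nullable` were `DATA`-scoped (as in the polars backend), a schema-only run like the one above could skip the scan entirely, which is the behaviour I'd expect when trading validation depth for speed.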
I'm happy to contribute a PR if there's interest in making this more consistent, even if the resolution isn't my preferred one.