I've noticed that the pandas and pyspark backends define `check_nullable` with `SCHEMA` validation scope, whereas polars uses `DATA` scope for the same check. Is there a reason for this difference?
My personal preference would be to use `DATA` here. I imagine that the choice of `SCHEMA` might have its roots in the database world, where nullability is part of the schema. However, at least with pandas, this check requires inspecting the data, and we've found it to be slow, so classifying it as a schema check makes it harder to use validation depth to improve performance.
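To make the performance concern concrete, here is a minimal sketch. I'm assuming the `PANDERA_VALIDATION_DEPTH` environment variable (documented for the pyspark integration) also applies to the pandas backend, that `"SCHEMA_ONLY"` is intended to skip data-level checks, and the DataFrame size is purely illustrative:

```python
import os

# Assumption: pandera reads PANDERA_VALIDATION_DEPTH to decide which validation
# scopes to run, and "SCHEMA_ONLY" is meant to skip DATA-scoped checks.
os.environ["PANDERA_VALIDATION_DEPTH"] = "SCHEMA_ONLY"

import numpy as np
import pandas as pd
import pandera as pa

schema = pa.DataFrameSchema(
    {
        # nullable=False is what the check_nullable backend check enforces
        "x": pa.Column(float, nullable=False),
    }
)

df = pd.DataFrame({"x": np.random.rand(10_000_000)})

# Verifying the dtype only touches metadata (cheap), but verifying
# nullable=False is essentially df["x"].isna().any() -- a full column scan.
# Because the pandas/pyspark backends give check_nullable SCHEMA scope, that
# scan still runs even though we asked for schema-only validation above.
schema.validate(df)
```

If `check_nullable` were `DATA`-scoped (as in the polars backend), a schema-only run like the one above could skip the scan entirely, which is the behaviour I'd expect when trading validation depth for speed.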
I'm happy to contribute a PR if there's interest in making this more consistent, even if the resolution isn't my preferred one.