-
-
Notifications
You must be signed in to change notification settings - Fork 27
COMPAT: prepare for pandas 3.0 string dtype #493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
COMPAT: prepare for pandas 3.0 string dtype #493
Conversation
887d1e8
to
6520de4
Compare
All green now, would like to include this in the upcoming 0.11 release |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jorisvandenbossche ! Overall this looks good, just a few minor comments.
Can you please add a changelog entry?
pyogrio/geopandas.py
Outdated
try: | ||
res = pd.to_datetime(ser, utc=True, **datetime_kwargs) | ||
except Exception: | ||
pass | ||
|
||
if res.dtype != "object": | ||
if res.dtype.kind == "M": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To confirm, this is the dtype code for datetime with millisecond precision? Perhaps add a comment to indicate the full dtype name here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not specifically millisecond resolution, but just datetime64 in general (the numpy datetime64 dtype's short character name is "M").
But will add a comment
pyogrio/tests/test_geopandas_io.py
Outdated
@@ -1214,7 +1234,11 @@ def test_write_None_string_column(tmp_path, use_arrow): | |||
assert filename.exists() | |||
|
|||
result_gdf = read_dataframe(filename, use_arrow=use_arrow) | |||
assert result_gdf.object_col.dtype == object | |||
assert ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this assert
be folded into the conditional below (with a corresponding else
)?
assert output_gdf["mixed"][idx] is None | ||
else: | ||
assert output_gdf["mixed"][idx] == str(value) | ||
# mixed values as object dtype are currently written as strings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# mixed values as object dtype are currently written as strings | |
# non-null mixed values as object dtype are currently written as strings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also null values are written as string dtype, but preserving the fact they are null)
Testing with the string dtype enabled (
pd.options.future.infer_string = True
, orPANDAS_FUTURE_INFER_STRING=1
set as environment variable).