Commit 6c762f3

meeseeksmachine, Liam3851, and jorisvandenbossche authored
Backport PR #61770 on branch 2.3.x (BUG: Fix unpickling of string dtypes of legacy pandas versions) (#61793)
Co-authored-by: David Krych <davidk@ciphercap.com>
Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
1 parent fd99ef7 commit 6c762f3

8 files changed: +18 -1 lines changed

doc/source/whatsnew/v2.3.1.rst

Lines changed: 1 addition & 1 deletion
@@ -59,6 +59,7 @@ Bug fixes
 - Bug in :meth:`.DataFrameGroupBy.min`, :meth:`.DataFrameGroupBy.max`, :meth:`.Resampler.min`, :meth:`.Resampler.max` where all NA values of string dtype would return float instead of string dtype (:issue:`60810`)
 - Bug in :meth:`DataFrame.sum` with ``axis=1``, :meth:`.DataFrameGroupBy.sum` or :meth:`.SeriesGroupBy.sum` with ``skipna=True``, and :meth:`.Resampler.sum` with all NA values of :class:`StringDtype` resulted in ``0`` instead of the empty string ``""`` (:issue:`60229`)
 - Fixed bug in :meth:`DataFrame.explode` and :meth:`Series.explode` where methods would fail with ``dtype="str"`` (:issue:`61623`)
+- Fixed bug in unpickling objects pickled in pandas versions pre-2.3.0 that used :class:`StringDtype` (:issue:`61763`).


 .. _whatsnew_231.regressions:
@@ -72,7 +73,6 @@ Fixed regressions

 Bug fixes
 ~~~~~~~~~
--

 .. ---------------------------------------------------------------------------
 .. _whatsnew_231.other:

pandas/core/arrays/string_.py

Lines changed: 7 additions & 0 deletions
@@ -69,6 +69,8 @@
 from pandas.io.formats import printing

 if TYPE_CHECKING:
+    from collections.abc import MutableMapping
+
     import pyarrow

     from pandas._typing import (
@@ -213,6 +215,11 @@ def __eq__(self, other: object) -> bool:
             return self.storage == other.storage and self.na_value is other.na_value
         return False

+    def __setstate__(self, state: MutableMapping[str, Any]) -> None:
+        # back-compat for pandas < 2.3, where na_value did not yet exist
+        self.storage = state.pop("storage", "python")
+        self._na_value = state.pop("_na_value", libmissing.NA)
+
     def __hash__(self) -> int:
         # need to override __hash__ as well because of overriding __eq__
         return super().__hash__()
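
The back-compat pattern in `__setstate__` above can be sketched in isolation. This is a minimal illustration, not pandas code: `ToyDtype` and `DEFAULT_NA` are hypothetical stand-ins, and the legacy state dict is an assumption about what an older version might have saved. Unpickling creates the object without calling `__init__` and then hands the saved state to `__setstate__`, so the sketch performs those two steps by hand.

```python
DEFAULT_NA = "<NA>"  # hypothetical stand-in for the real missing-value sentinel


class ToyDtype:
    """Toy class (not pandas) that gained a `_na_value` attribute in a later version."""

    def __init__(self, storage="python", na_value=DEFAULT_NA):
        self.storage = storage
        self._na_value = na_value

    def __setstate__(self, state):
        # Older saved states may lack "_na_value"; fall back to defaults
        # instead of leaving the attribute unset.
        self.storage = state.pop("storage", "python")
        self._na_value = state.pop("_na_value", DEFAULT_NA)


# State as an older version might have saved it: no "_na_value" key (assumed shape).
legacy_state = {"storage": "pyarrow"}

# Mimic what unpickling does: allocate without __init__, then restore the state.
restored = ToyDtype.__new__(ToyDtype)
restored.__setstate__(legacy_state)

print(restored.storage, restored._na_value)
```

Without the `state.pop(..., default)` fallbacks, the restored object would be missing `_na_value`, and any later attribute access would raise `AttributeError`.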
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.

pandas/tests/io/generate_legacy_storage_files.py

Lines changed: 8 additions & 0 deletions
@@ -147,6 +147,7 @@ def create_pickle_data():
         "float": Index(np.arange(10, dtype=np.float64)),
         "uint": Index(np.arange(10, dtype=np.uint64)),
         "timedelta": timedelta_range("00:00:00", freq="30min", periods=10),
+        "string": Index(["foo", "bar", "baz", "qux", "quux"], dtype="string"),
     }

     index["range"] = RangeIndex(10)
@@ -185,6 +186,7 @@ def create_pickle_data():
         "dt": Series(date_range("20130101", periods=5)),
         "dt_tz": Series(date_range("20130101", periods=5, tz="US/Eastern")),
         "period": Series([Period("2000Q1")] * 5),
+        "string": Series(["foo", "bar", "baz", "qux", "quux"], dtype="string"),
     }

     mixed_dup_df = DataFrame(data)
@@ -233,6 +235,12 @@ def create_pickle_data():
             },
             index=range(5),
         ),
+        "string": DataFrame(
+            {
+                "A": Series(["foo", "bar", "baz", "qux", "quux"], dtype="string"),
+                "B": Series(["one", "two", "one", "two", "three"], dtype="string"),
+            }
+        ),
     }

     cat = {

pyproject.toml

Lines changed: 2 additions & 0 deletions
@@ -526,6 +526,8 @@ filterwarnings = [
   "ignore:distutils Version classes are deprecated:DeprecationWarning:fsspec",
   # Can be removed once https://github.com/numpy/numpy/pull/24794 is merged
   "ignore:.*In the future `np.long` will be defined as.*:FutureWarning",
+  # https://github.com/numpy/numpy/pull/29301
+  "ignore:.*align should be passed:",
 ]
 junit_family = "xunit2"
 markers = [
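
The new `filterwarnings` entry above has the form `action:message:`, with the category left empty. A rough standard-library equivalent is sketched below, assuming the message part is treated as a regular expression matched against the start of the warning text. The helper `collect_warnings` and the sample warning texts are hypothetical and chosen only for illustration; they are not the actual NumPy wording.

```python
import warnings


def collect_warnings(messages):
    """Emit each message as a warning and return the texts that were not filtered out."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record everything by default
        # Approximate counterpart of the ini entry "ignore:.*align should be passed:"
        warnings.filterwarnings("ignore", message=r".*align should be passed")
        for text in messages:
            warnings.warn(text, UserWarning)
    return [str(w.message) for w in caught]


remaining = collect_warnings(
    [
        "dtype(): align should be passed as a boolean",  # hypothetical wording
        "some unrelated warning",
    ]
)
print(remaining)
```

Because the category field is empty, the filter applies to any warning class whose message matches, which keeps the entry working even if the warning's category differs between NumPy versions.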