From 30e6c8d2e691a3a7bd676e36a3ff2c741b90cf47 Mon Sep 17 00:00:00 2001
From: Joris Van den Bossche
Date: Mon, 7 Jul 2025 11:28:06 +0200
Subject: [PATCH 1/9] DOC: add section about upcoming pandas 3.0 changes (string dtype, CoW) to 2.3 whatsnew notes

---
 doc/source/whatsnew/v2.3.0.rst | 94 ++++++++++++++++++++++++++++++++++
 1 file changed, 94 insertions(+)

diff --git a/doc/source/whatsnew/v2.3.0.rst b/doc/source/whatsnew/v2.3.0.rst
index 8ca6c0006a604..3144474a9fe22 100644
--- a/doc/source/whatsnew/v2.3.0.rst
+++ b/doc/source/whatsnew/v2.3.0.rst
@@ -10,6 +10,100 @@ including other versions of pandas.
 
 .. ---------------------------------------------------------------------------
 
+.. _whatsnew_220.upcoming_changes:
+
+Upcoming changes in pandas 3.0
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+pandas 3.0 will bring two bigger changes to the default behavior of pandas.
+
+Dedicated string data type by default
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Historically, pandas represented string columns with NumPy ``object`` data type.
+This representation has numerous problems: it is not specific to strings (any
+Python object can be stored in an ``object``-dtype array, not just strings) and
+it is often not very efficient (both performance wise and for memory usage).
+
+Starting with the upcoming pandas 3.0 release, a dedicated string data type will
+be enabled by default (backed by PyArrow under the hood, if installed, otherwise
+falling back to NumPy). This means that pandas will start inferring columns
+containing string data as the new ``str`` data type when creating pandas
+objects, such as in constructors or IO functions.
+
+Old behavior:
+
+.. code-block:: python
+    >>> ser = pd.Series(["a", "b"])
+    0    a
+    1    b
+    dtype: object
+New behavior:
+
+.. code-block:: python
+    >>> ser = pd.Series(["a", "b"])
+    0    a
+    1    b
+    dtype: str
+
+The string data type that is used in these scenarios will mostly behave as NumPy
+object would, including missing value semantics and general operations on these
+columns.
+
+However, the introduction of a new default dtype will also have some breaking
+consequences your code (for example when checking for the ``.dtype`` being
+object dtype). To allow testing it in advance of the pandas 3.0 release, this
+future dtype inference logic can be enabled in pandas 2.3 with:
+
+.. code-block:: ipython
+
+    pd.options.future.infer_string = True
+
+TODO add link to migration guide
+
+Copy-on-Write
+^^^^^^^^^^^^^
+
+The currently optional mode Copy-on-Write will be enabled by default in pandas 3.0. There
+won't be an option to keep the current behavior enabled.
+
+In summary, the new "copy-on-write" behaviour will bring changes in behavior in
+how pandas operates with respect to copies and views.
+
+1. The result of *any* indexing operation (subsetting a DataFrame or Series in any way,
+   i.e. including accessing a DataFrame column as a Series) or any method returning a
+   new DataFrame or Series, always *behaves as if* it were a copy in terms of user
+   API.
+2. As a consequence, if you want to modify an object (DataFrame or Series), the only way
+   to do this is to directly modify that object itself.
+
+Because every single indexing step now behaves as a copy, this also means that
+"chained assignment" (updating a DataFrame with multiple setitem steps) will
+stop working. Because this now consistently never works, the
+``SettingWithCopyWarning`` will be removed.
+
+The new behavioral semantics are explained in more detail in the
+:ref:`user guide about Copy-on-Write `.
+
+The new behavior can be enabled since pandas 2.0 with the following option:
+
+.. code-block:: ipython
+
+    pd.options.mode.copy_on_write = True
+
+Some of the behaviour changes allow a clear deprecation, like the changes in
+chained assignment. Other changes are more subtle and thus, the warnings are
+hidden behind an option that can be enabled since pandas 2.2:
+
+.. code-block:: ipython
+
+    pd.options.mode.copy_on_write = "warn"
+
+This mode will warn in many different scenarios that aren't actually relevant to
+most queries. We recommend exploring this mode, but it is not necessary to get rid
+of all of these warnings. The :ref:`migration guide `
+explains the upgrade process in more detail.
+
 .. _whatsnew_230.enhancements:
 
 Enhancements

From d6cba0242b84a07e111d6a96e75becb66360bc03 Mon Sep 17 00:00:00 2001
From: Joris Van den Bossche
Date: Mon, 7 Jul 2025 11:52:51 +0200
Subject: [PATCH 2/9] mode -> future

---
 doc/source/whatsnew/v2.3.1.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/whatsnew/v2.3.1.rst b/doc/source/whatsnew/v2.3.1.rst
index eb3ad72f6a59f..7ad76e9d82c9c 100644
--- a/doc/source/whatsnew/v2.3.1.rst
+++ b/doc/source/whatsnew/v2.3.1.rst
@@ -44,7 +44,7 @@ correctly, rather than defaulting to ``object`` dtype. For example:
 
 .. code-block:: python
 
-    >>> pd.options.mode.infer_string = True
+    >>> pd.options.future.infer_string = True
     >>> df = pd.DataFrame()
     >>> df.columns.dtype
     dtype('int64')  # default RangeIndex for empty columns

From 2c07ace7f03590e8f170ea8e5649978eed79ae22 Mon Sep 17 00:00:00 2001
From: Joris Van den Bossche
Date: Mon, 7 Jul 2025 12:06:22 +0200
Subject: [PATCH 3/9] Update doc/source/whatsnew/v2.3.0.rst

Co-authored-by: Simon Hawkins
---
 doc/source/whatsnew/v2.3.0.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/whatsnew/v2.3.0.rst b/doc/source/whatsnew/v2.3.0.rst
index 3144474a9fe22..4c14cad8aa328 100644
--- a/doc/source/whatsnew/v2.3.0.rst
+++ b/doc/source/whatsnew/v2.3.0.rst
@@ -10,7 +10,7 @@ including other versions of pandas.
 
 .. ---------------------------------------------------------------------------
 
-.. _whatsnew_220.upcoming_changes:
+.. _whatsnew_230.upcoming_changes:
 
 Upcoming changes in pandas 3.0
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

From 660eed4c360e84e03724c8f784d3baebe063c551 Mon Sep 17 00:00:00 2001
From: Joris Van den Bossche
Date: Mon, 7 Jul 2025 12:06:36 +0200
Subject: [PATCH 4/9] Update doc/source/whatsnew/v2.3.0.rst

Co-authored-by: Simon Hawkins
---
 doc/source/whatsnew/v2.3.0.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/whatsnew/v2.3.0.rst b/doc/source/whatsnew/v2.3.0.rst
index 4c14cad8aa328..3700d4227a937 100644
--- a/doc/source/whatsnew/v2.3.0.rst
+++ b/doc/source/whatsnew/v2.3.0.rst
@@ -51,7 +51,7 @@ object would, including missing value semantics and general operations on these
 columns.
 
 However, the introduction of a new default dtype will also have some breaking
-consequences your code (for example when checking for the ``.dtype`` being
+consequences to your code (for example when checking for the ``.dtype`` being
 object dtype). To allow testing it in advance of the pandas 3.0 release, this
 future dtype inference logic can be enabled in pandas 2.3 with:
 

From 2475c2e02c681f0933f487624f7097416b7e6af6 Mon Sep 17 00:00:00 2001
From: Joris Van den Bossche
Date: Mon, 7 Jul 2025 12:07:26 +0200
Subject: [PATCH 5/9] fix whitespace

---
 doc/source/whatsnew/v2.3.0.rst | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/doc/source/whatsnew/v2.3.0.rst b/doc/source/whatsnew/v2.3.0.rst
index 3700d4227a937..17efeb5bd53bb 100644
--- a/doc/source/whatsnew/v2.3.0.rst
+++ b/doc/source/whatsnew/v2.3.0.rst
@@ -34,6 +34,7 @@ objects, such as in constructors or IO functions.
 Old behavior:
 
 .. code-block:: python
+
     >>> ser = pd.Series(["a", "b"])
     0    a
     1    b
@@ -41,6 +42,7 @@ Old behavior:
 New behavior:
 
 .. code-block:: python
+
     >>> ser = pd.Series(["a", "b"])
     0    a
     1    b

From d660590d35df52eeca954973237cfebdab6112ca Mon Sep 17 00:00:00 2001
From: Joris Van den Bossche
Date: Mon, 7 Jul 2025 13:06:45 +0200
Subject: [PATCH 6/9] Update doc/source/whatsnew/v2.3.0.rst

Co-authored-by: Simon Hawkins
---
 doc/source/whatsnew/v2.3.0.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/source/whatsnew/v2.3.0.rst b/doc/source/whatsnew/v2.3.0.rst
index 17efeb5bd53bb..f31437f6e7f29 100644
--- a/doc/source/whatsnew/v2.3.0.rst
+++ b/doc/source/whatsnew/v2.3.0.rst
@@ -67,7 +67,7 @@ Copy-on-Write
 ^^^^^^^^^^^^^
 
 The currently optional mode Copy-on-Write will be enabled by default in pandas 3.0. There
-won't be an option to keep the current behavior enabled.
+won't be an option to retain the legacy behavior.
 
 In summary, the new "copy-on-write" behaviour will bring changes in behavior in
 how pandas operates with respect to copies and views.

From 5e9d0ccb6094b50f026b4ee4e86ae46cee28c4f3 Mon Sep 17 00:00:00 2001
From: Joris Van den Bossche
Date: Mon, 7 Jul 2025 13:07:51 +0200
Subject: [PATCH 7/9] python -> ipython

---
 doc/source/whatsnew/v2.3.0.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/doc/source/whatsnew/v2.3.0.rst b/doc/source/whatsnew/v2.3.0.rst
index f31437f6e7f29..cdcaf3e925faa 100644
--- a/doc/source/whatsnew/v2.3.0.rst
+++ b/doc/source/whatsnew/v2.3.0.rst
@@ -57,7 +57,7 @@ consequences to your code (for example when checking for the ``.dtype`` being
 object dtype). To allow testing it in advance of the pandas 3.0 release, this
 future dtype inference logic can be enabled in pandas 2.3 with:
 
-.. code-block:: ipython
+.. code-block:: python
 
     pd.options.future.infer_string = True
 
@@ -89,7 +89,7 @@ The new behavioral semantics are explained in more detail in the
 
 The new behavior can be enabled since pandas 2.0 with the following option:
 
-.. code-block:: ipython
+.. code-block:: python
 
     pd.options.mode.copy_on_write = True
 
@@ -97,7 +97,7 @@ Some of the behaviour changes allow a clear deprecation, like the changes in
 chained assignment. Other changes are more subtle and thus, the warnings are
 hidden behind an option that can be enabled since pandas 2.2:
 
-.. code-block:: ipython
+.. code-block:: python
 
     pd.options.mode.copy_on_write = "warn"
 

From 6bd2f4d1f7ab089a20cc6717df317d24c5426127 Mon Sep 17 00:00:00 2001
From: Joris Van den Bossche
Date: Mon, 7 Jul 2025 13:12:49 +0200
Subject: [PATCH 8/9] add link to string migration guide

---
 doc/source/whatsnew/v2.3.0.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/source/whatsnew/v2.3.0.rst b/doc/source/whatsnew/v2.3.0.rst
index cdcaf3e925faa..d4caed1036308 100644
--- a/doc/source/whatsnew/v2.3.0.rst
+++ b/doc/source/whatsnew/v2.3.0.rst
@@ -61,7 +61,8 @@ future dtype inference logic can be enabled in pandas 2.3 with:
 
     pd.options.future.infer_string = True
 
-TODO add link to migration guide
+See the :ref:`string_migration_guide` for more details on the behaviour changes
+and how to adapt your code to the new default.
 
 Copy-on-Write
 ^^^^^^^^^^^^^

From 5bd71bc1262559d47feb49cc397fa466b65f711e Mon Sep 17 00:00:00 2001
From: Joris Van den Bossche
Date: Mon, 7 Jul 2025 13:39:26 +0200
Subject: [PATCH 9/9] fixup

---
 doc/source/whatsnew/v2.3.0.rst | 1 +
 1 file changed, 1 insertion(+)

diff --git a/doc/source/whatsnew/v2.3.0.rst b/doc/source/whatsnew/v2.3.0.rst
index d4caed1036308..bf9b2ae2333c0 100644
--- a/doc/source/whatsnew/v2.3.0.rst
+++ b/doc/source/whatsnew/v2.3.0.rst
@@ -39,6 +39,7 @@ Old behavior:
     0    a
     1    b
     dtype: object
+
 New behavior:
 
 .. code-block:: python
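
The string-dtype section in patch 1 warns that code checking for the ``.dtype`` being object dtype can break once the new ``str`` dtype is inferred by default. The following is a minimal sketch of a check that does not depend on which inference is active, assuming pandas 2.0 or later is installed. The Series contents are illustrative only, and using ``pandas.api.types.is_string_dtype`` is one possible alternative, not the approach the patch itself prescribes.

```python
import pandas as pd
from pandas.api.types import is_string_dtype

# Illustrative data; any column of Python strings behaves the same way.
ser = pd.Series(["a", "b"])

# Fragile: only True under the legacy inference, where strings are stored
# as NumPy object dtype. It becomes False once the str dtype is inferred.
is_object = ser.dtype == object

# More robust: is_string_dtype accepts both an object-dtype column that
# holds strings and the dedicated string dtype.
is_str = is_string_dtype(ser)

print(ser.dtype, is_object, is_str)
```

With the legacy inference ``is_object`` is ``True``. Once the new ``str`` dtype is inferred (pandas 3.0, or pandas 2.3 with ``pd.options.future.infer_string = True``) it becomes ``False``, while ``is_str`` is ``True`` in both cases.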
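
The Copy-on-Write section in patch 1 states that chained assignment stops working and that the only way to modify an object is to modify that object directly. Below is a minimal sketch of the difference, assuming a recent pandas release. The column names and the ``df["a"] > 1`` condition are hypothetical. The single-step ``.loc`` form gives the same result whether or not Copy-on-Write is enabled.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Chained assignment, e.g.
#     df["b"][df["a"] > 1] = 0
# writes into the intermediate Series returned by df["b"]. Under
# Copy-on-Write that Series behaves as a copy, so df would not be updated.

# Instead, modify the DataFrame itself in a single indexing step.
df.loc[df["a"] > 1, "b"] = 0

print(df)
```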