DOC: add section about upcoming pandas 3.0 changes (string dtype, CoW) to 2.3 whatsnew notes #61795
Merged
jorisvandenbossche
merged 10 commits into
pandas-dev:main
from
jorisvandenbossche:doc-whatsnew-2.3.0-preview-note
Jul 7, 2025
+99
−1
30e6c8d
DOC: add section about upcoming pandas 3.0 changes (string dtype, CoW…
jorisvandenbossche d6cba02
mode -> future
jorisvandenbossche 2c07ace
Update doc/source/whatsnew/v2.3.0.rst
jorisvandenbossche 660eed4
Update doc/source/whatsnew/v2.3.0.rst
jorisvandenbossche 2475c2e
fix whitespace
jorisvandenbossche d660590
Update doc/source/whatsnew/v2.3.0.rst
jorisvandenbossche 5e9d0cc
python -> ipython
jorisvandenbossche 432e241
Merge remote-tracking branch 'upstream/main' into doc-whatsnew-2.3.0-…
jorisvandenbossche 6bd2f4d
add link to string migration guide
jorisvandenbossche 5bd71bc
fixup
jorisvandenbossche

@@ -10,6 +10,100 @@ including other versions of pandas.

.. ---------------------------------------------------------------------------

.. _whatsnew_230.upcoming_changes:

Upcoming changes in pandas 3.0
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

pandas 3.0 will bring two major changes to the default behavior of pandas.

Dedicated string data type by default
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Historically, pandas represented string columns with the NumPy ``object`` data type.
This representation has numerous problems: it is not specific to strings (any
Python object can be stored in an ``object``-dtype array, not just strings) and
it is often inefficient, both in performance and in memory usage.

Starting with the upcoming pandas 3.0 release, a dedicated string data type will
be enabled by default (backed by PyArrow under the hood if installed, otherwise
falling back to NumPy). This means that pandas will start inferring columns
containing string data as the new ``str`` data type when creating pandas
objects, such as in constructors or IO functions.

Old behavior:

.. code-block:: python

    >>> ser = pd.Series(["a", "b"])
    >>> ser
    0    a
    1    b
    dtype: object

Review comment on the example above: does this need another line to actually output the repr or just change these to not include

New behavior:

.. code-block:: python

    >>> ser = pd.Series(["a", "b"])
    >>> ser
    0    a
    1    b
    dtype: str

The string data type that is used in these scenarios will mostly behave as NumPy
object would, including missing value semantics and general operations on these
columns.

However, the introduction of a new default dtype will also have some breaking
consequences for your code (for example when checking for the ``.dtype`` being
object dtype). To allow testing it in advance of the pandas 3.0 release, this
future dtype inference logic can be enabled in pandas 2.3 with:

.. code-block:: ipython

    pd.options.future.infer_string = True
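
As an illustration of the dtype-check consequence mentioned above, the following is a minimal sketch of a check that holds under both the old and the new default. The helper names ``is_string_column`` and ``infer_with_future_strings`` are hypothetical and not part of pandas. The second helper assumes pandas 2.3 (or an earlier 2.x release with PyArrow installed), where the ``future.infer_string`` option is available.

```python
import pandas as pd


def is_string_column(ser: pd.Series) -> bool:
    """Hypothetical helper: True for string columns under either default.

    A check like ``ser.dtype == object`` stops matching once strings are
    inferred as ``str``; ``pd.api.types.is_string_dtype`` accepts both.
    """
    return pd.api.types.is_string_dtype(ser)


def infer_with_future_strings(values) -> pd.Series:
    """Hypothetical helper: build a Series with the future inference enabled.

    ``option_context`` restores the previous setting on exit, so the rest
    of the program keeps its current default.
    """
    with pd.option_context("future.infer_string", True):
        return pd.Series(values)


# An explicit object dtype mimics what pandas < 3.0 infers for strings.
legacy = pd.Series(["a", "b"], dtype=object)
numbers = pd.Series([1, 2])

legacy_is_string = is_string_column(legacy)    # True
numbers_is_string = is_string_column(numbers)  # False
```

With pandas 2.3, ``infer_with_future_strings(["a", "b"])`` should come back with the new ``str`` dtype, and ``is_string_column`` is expected to accept it as well, so a check written this way would not need to change when upgrading.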

TODO add link to migration guide

Copy-on-Write
^^^^^^^^^^^^^

The currently optional Copy-on-Write mode will be enabled by default in pandas 3.0. There
won't be an option to keep the current behavior enabled.

In summary, the new "copy-on-write" behavior changes how pandas operates with
respect to copies and views.

1. The result of *any* indexing operation (subsetting a DataFrame or Series in any way,
   i.e. including accessing a DataFrame column as a Series) or any method returning a
   new DataFrame or Series, always *behaves as if* it were a copy in terms of user
   API.
2. As a consequence, if you want to modify an object (DataFrame or Series), the only way
   to do this is to directly modify that object itself.
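
The two rules above can be sketched as follows. This is an illustrative example written against pandas 2.x, where Copy-on-Write is switched on through the ``mode.copy_on_write`` option; the DataFrame contents and variable names are made up for the demonstration.

```python
import pandas as pd

# Enable Copy-on-Write only inside this block (assumes pandas 2.x, where
# the option is opt-in; pandas 3.0 is described as behaving this way by default).
with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

    # Rule 1: a column accessed as a Series behaves as a copy.
    col = df["a"]
    col.iloc[0] = 100                  # changes `col` only
    parent_after_column_edit = df["a"].tolist()

    # Rule 1 also covers methods that return a new object.
    renamed = df.rename(columns={"a": "x"})
    renamed.loc[0, "x"] = -1           # changes `renamed` only
    parent_after_method_edit = df["a"].tolist()

    # Rule 2: to change `df`, modify `df` itself.
    df.loc[0, "a"] = 100
    parent_after_direct_edit = df["a"].tolist()
```

Only the final ``df.loc`` assignment reaches ``df``; the edits made through ``col`` and ``renamed`` stay local to those objects.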

Because every single indexing step now behaves as a copy, this also means that
"chained assignment" (updating a DataFrame with multiple setitem steps) will
stop working. Because this now consistently never works, the
``SettingWithCopyWarning`` will be removed.
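
A small sketch of what "stop working" means in practice, again assuming pandas 2.x with the ``mode.copy_on_write`` option enabled. The data is invented for illustration, and the ``.loc`` form is one common single-step replacement rather than the only possible one.

```python
import warnings

import pandas as pd

with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

    # Chained assignment: `df["b"]` is evaluated first and behaves as a copy,
    # so the second setitem step never reaches `df`. pandas may emit a
    # warning about this pattern, which is silenced here.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        df["b"][df["a"] > 1] = 0
    after_chained = df["b"].tolist()

    # Single-step replacement: one `.loc` call on `df` itself.
    df.loc[df["a"] > 1, "b"] = 0
    after_loc = df["b"].tolist()
```

After the chained form ``df`` is unchanged, while the single ``.loc`` call updates the two matching rows.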

The new behavioral semantics are explained in more detail in the
:ref:`user guide about Copy-on-Write <copy_on_write>`.

The new behavior can be enabled since pandas 2.0 with the following option:

.. code-block:: ipython

    pd.options.mode.copy_on_write = True

Some of the behaviour changes allow a clear deprecation, like the changes in
chained assignment. Other changes are more subtle and thus, the warnings are
hidden behind an option that can be enabled since pandas 2.2:

.. code-block:: ipython

    pd.options.mode.copy_on_write = "warn"

This mode will warn in many different scenarios that aren't actually relevant to
most queries. We recommend exploring this mode, but it is not necessary to get rid
of all of these warnings. The :ref:`migration guide <copy_on_write.migration_guide>`
explains the upgrade process in more detail.

.. _whatsnew_230.enhancements:

Enhancements