Skip to content

Deprecations and FutureWarnings Fixes #128

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Jun 13, 2024
Merged

Deprecations and FutureWarnings Fixes #128

merged 14 commits into from
Jun 13, 2024

Conversation

fsoubelet
Copy link
Member

@fsoubelet fsoubelet commented Jan 7, 2024

This PR brings changes to address the two deprecations / warnings which aren't very far away from being actual issues. I have bumped the package version as a patch, but this can be discussed.

Deprecated use of .applymap

The call to .applymap in tfs.writer._quote_string_columns has been changed to .map (a total equivalent) since .applymap is deprecated and will be removed in (most likely) the next major pandas version.

  • Following this, the minimum required version of pandas in the package has been upped to 2.1 which is where .map appeared (and .applymap was deprecated).
  • Accordingly, since Python 3.8 is not supported by pandas 2.1 I have increased tfs's minimum required Python version to 3.9. Considering 3.8 no longer gets active support and will stop receiving security updates within the second half of the year, I don't think this is too extreme. More thoughts on the timeline below.

Deprecated use of delim_whitespace in pd.read_csv when reading

The delim_whitespace option is now deprecated. A simple fix was to use sep="\s+" as suggested in the issued warning. The option has been around for a long time and comes with no version / timeline considerations.

FutureWarning from pd.option_context('mode.use_inf_as_na', True)

The use of the pd.option_context('mode.use_inf_as_na', True) context manager in tfs.frame.validate raises a FutureWarning which recommends to convert instead. It's not clear from the warning when this functionality will stop being supported.

To circumvent this, the .isna() call is now made after calling .replace([np.inf, -np.inf], np.nan). Since .replace returns a copy we are not modifying the provided dataframe. Comments have been added above the relevant code for developpers.

According to the accepted answer in this StackOverflow thread, the performance impact should be quite small: about 0.1s on a dataframe with 1M rows (and most likely faster on newer Python versions since the .replace source code uses a lot of native Python). Could be compared with the runtime impact of the current with pd.option_context('mode.use_inf_as_na', True) call, through the impact seems small enought that I did not benchmark.

FutureWarning from .replace([np.inf, -np.inf], np.nan)

The above change now also raises a future warning about object downcasting during the operation. A fix is done by first calling .infer_dtypes() on the dataframe to infer types where possible as suggested in the issued warning. This call creates a lazy copy (for copy-on-write behaviour that will come in pandas 3.0 but the following call to .replace() creates a hard copy which erases all concerns about changing the original dataframe during validation. The .infer_dtypes() function has been around for a long time and comes with no version / timeline considerations.

Suggested Timeline

I suggest we leave this PR as a draft for now. According to the pandas repo, pandas 3.0 is due for April 1st, 2024 and we will most likely encounter issues with the current tfs-pandas code at this time.

By then, Python 3.8 support will be 6 months away from complete end of life and I believe it will be reasonable to simply drop support for it ourselves (which merging this PR and publishing will do).

In short: if we want to be conservative with our supported Python versions, we should keep this PR open until just before pandas 3.0 comes around (~ April), and merge + publish a new version of tfs-pandas then. Otherwise we can merge and publish sooner.

Note on CI

The CI is currently messed up since we have Python 3.8 in our tests matrix and the package, in this branch, can't install on this version (since we ask for pandas >= 2.1 which does not exist for Python 3.8).

If we're waiting until close to April to merge and drop Python 3.8 support, we could at the same time remove it from our CI matrices.

@fsoubelet fsoubelet added Type: Feature A (suggetion for a) new feature or enhancement in functionality. To be discussed labels Jan 7, 2024
@fsoubelet fsoubelet requested a review from JoschD January 7, 2024 13:37
@fsoubelet fsoubelet self-assigned this Jan 7, 2024
@JoschD
Copy link
Member

JoschD commented Jan 15, 2024

To consider: maybe increase minor version instead of patch? I don't like when a patch version breaks the backwards compatibility with python/pandas packages like this.

@fsoubelet
Copy link
Member Author

Fully agreed.

@fsoubelet
Copy link
Member Author

@JoschD I suggest we drop 3.8 from our CI matrices as we did for omc3 and go forward with merging this PR and releasing a new version of the package.

The minimum required pandas version has been out for close to a year and I expect no friction with dependencies resolving.

@fsoubelet fsoubelet changed the title [WIP] Deprecations and FutureWarnings Fixes Deprecations and FutureWarnings Fixes Jun 12, 2024
@fsoubelet fsoubelet marked this pull request as ready for review June 12, 2024 14:09
@fsoubelet fsoubelet merged commit da9abd7 into master Jun 13, 2024
21 checks passed
@fsoubelet fsoubelet deleted the deprecations branch June 13, 2024 09:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Feature A (suggetion for a) new feature or enhancement in functionality.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants