Use Narwhals, become a data frame agnostic package. #448

@christiansegercrantz

Description

Motivation: describe the problem to be solved
Currently, NannyML is built to rely on Pandas data frames and NumPy arrays. Pandas is still the most widely used data frame library, but with newer ones such as Polars and Arrow becoming more prevalent, it would be foolish not to future-proof one's own work by supporting them. By using Narwhals, the devs can delegate this work to a third-party library while getting support for more backends than just the ones mentioned above. It may also bring a speedup as an additional benefit.

Describe the solution you'd like
NannyML becomes a data frame and array agnostic library. This would make NannyML more robust to changes, more versatile, and potentially more efficient (users could run on faster backends such as Polars without first converting to pandas).

Describe alternatives you've considered
The alternatives are to continue using Pandas only or to implement broader support in-house.

  1. Continuing to use pandas only makes the package less versatile and slower, but, unlike the other options, requires no extra work.
  2. Implementing the support in-house is more work and is unlikely to turn out better than what Narwhals provides.

Additional context
Personally, I think the Narwhals API (which is very similar to that of Polars) is far superior to pandas and makes for much cleaner code too. I highly recommend considering this change. If this is something you are interested in looking into, it should be possible to ping Marco Gorelli to discuss any outstanding questions.

Labels

enhancement: New feature or request
