Skip to content

add nulls parse option in readExcel #1247

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

AndreiKingsley
Copy link
Collaborator

Fixes #1166.

@@ -255,6 +271,8 @@ public fun DataFrame.Companion.readExcel(
* when set to false, it operates as [NameRepairStrategy.MAKE_UNIQUE],
* ensuring unique column names will make the columns be named according to excel columns, like "A", "B", "C" etc.
* for unstructured data.
* @param parseEmptyAsNull when set to true, empty strings in cells are parsed as null (default true).
Copy link
Collaborator

@Jolanrensen Jolanrensen Jun 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There may be more representations in strings that can refer to null. maybe it might be better to follow readCsv and other parsing options for this. It will be overkill to add the full ParserOptions as argument (since datetime and locales are already handled by excel), but csv, for instance uses nullStrings: Set<String>= setOf("", "NA", "N/A", "null", "NULL", "None", "none", "NIL", "nil"), so it might make sense to add a nullStrings argument here as well, probably with fewer defaults

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created an issue:
#1248

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think is a good idea to introduce a new non-trivial logic just before release, need more time for implementation and testing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay sure, makes sense

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

speaking off, this is a binary-breaking change. Since we released a beta already, we cannot break the API like that without a fallback mechanism. Could you add the old function signatures back with @Deprecated() at level HIDDEN? That way, we won't break libraries calling the old function :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect XLSX parsing
2 participants