Space in front of quotation doesn't recognise a field with multiple entries separated by common delimiter #941
Comments
I'm not sure I understand. Which separator and quote character are you using?
Can confirm this, ran into the exact same problem. Example (first row has no problems, second row will be parsed as 4 columns):
The quotation character used is the default (double quotation mark ") and the separator is a comma, but the problem occurs with other delimiters as well. If a field starts with a space (before the quotation character), the quotation character is not recognized, and any delimiter characters inside the quoted value are interpreted as delimiters. (The parsing code probably assumes that fields separated by delimiters have no preceding or trailing whitespace, and that if the quote character doesn't immediately follow the delimiter, the field is not quoted.)

CSV files with optional whitespace around delimiter characters and/or data values do exist. They are especially common among hand-edited CSV files and CSV files where columns are aligned with spaces for readability (in addition to using a delimiter character).

Using the transform configuration option to remove leading and trailing whitespace doesn't help, because the transform runs after the quotes are processed. Setting the field delimiter to ", " (comma followed by a space) breaks whenever there is no space after the comma (or there are multiple spaces, if someone manually aligned columns in addition to using a delimiter character).

Maybe add an option to automatically remove whitespace around unquoted values and outside quoted string values. That option would fix this problem, and trimming is in any case something that has to be done for CSV files where optional whitespace is present (using a transform function or when processing the data later). For example, the third column in the first row of my example is " 80 megacredits" when parsed, but the user probably wanted "80 megacredits" (with whitespace trimmed). Some CSV files might rely on storing whitespace around unquoted values, which is why this should probably be an option, but it could be on by default, as that seems to be the most common use case. (Whitespace inside quoted strings should of course always be preserved.)
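The behaviour described above, and the proposed option, can be sketched with a small single-line splitter. This is only an illustration under stated assumptions: `splitLine` and its `trimOutsideQuotes` option are hypothetical names and are not part of PapaParse, the sketch handles one line at a time, and it ignores any characters between a closing quote and the next delimiter.

```javascript
// Hypothetical helper, not part of PapaParse: splits one CSV line into fields.
// With trimOutsideQuotes = false a quote only counts if it is the first
// character of the field. With trimOutsideQuotes = true, spaces and tabs
// outside quotes are dropped, so a quote after leading blanks is recognized.
function splitLine(line, { delimiter = ",", quote = '"', trimOutsideQuotes = false } = {}) {
  const fields = [];
  const isBlank = (ch) => ch === " " || ch === "\t";
  let i = 0;

  while (i <= line.length) {
    // Optionally look past leading blanks to find where the field really starts.
    let start = i;
    if (trimOutsideQuotes) {
      while (start < line.length && isBlank(line[start])) start++;
    }

    if (line[start] === quote) {
      // Quoted field: read up to the closing quote; "" is an escaped quote.
      let value = "";
      let j = start + 1;
      while (j < line.length) {
        if (line[j] === quote) {
          if (line[j + 1] === quote) { value += quote; j += 2; continue; }
          break;
        }
        value += line[j];
        j++;
      }
      j++; // step past the closing quote
      fields.push(value); // whitespace inside the quotes is preserved
      // Simplification: anything between the closing quote and the next
      // delimiter is ignored.
      const next = line.indexOf(delimiter, j);
      i = next === -1 ? line.length + 1 : next + delimiter.length;
    } else {
      // Unquoted field: everything up to the next delimiter.
      const next = line.indexOf(delimiter, i);
      const end = next === -1 ? line.length : next;
      const raw = line.slice(i, end);
      fields.push(trimOutsideQuotes ? raw.trim() : raw);
      i = next === -1 ? line.length + 1 : next + delimiter.length;
    }
  }
  return fields;
}

const row = 'random name, 12345, "address line 1, address line 2, code"';
console.log(splitLine(row).length);                              // strict mode
console.log(splitLine(row, { trimOutsideQuotes: true }).length); // tolerant mode
```

In strict mode the quote after the space is treated as ordinary data, so the commas inside the address split the field. In tolerant mode the leading blank is skipped first, the quote is recognized, and the address stays in one field.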
I'm not sure we should implement something to fix hand-edited files.
Agreed, having this option would be nice.
Also running into this issue. It would be good if the library could successfully parse CSV files with spaces after commas, like alternatives do.
This is interesting, because it is not clear what to do in such cases. Papaparse simply makes a decision. The problem with
The question is, should
I think there is no right or wrong, you can go either way.
Using PapaParse version 5.3.0.
A space in front of the opening quotation mark causes a quoted field that contains the delimiter, such as an address with several comma-separated parts, to be parsed incorrectly.
random name, 12345, "address line 1, address line 2, code"
is picked up as 5 separate entries (results.data has an item with 5 entries), while
random name, 12345,"address line 1, address line 2, code"
works normally (results.data has an item with 3 entries).
This issue only occurs with double-quoted fields that contain multiple delimiter-separated entries.
Let me know if this isn't enough to go on, or if this has been fixed in the new 6.0 version.
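Until the parser handles this case, one possible workaround is to normalize the text before parsing, turning the first form of the row into the second. The sketch below is an assumption-laden illustration: `stripSpaceBeforeQuotes` is a hypothetical helper, not a PapaParse function, and it assumes a single-character delimiter and that a quote only opens a field when it is the first non-blank character of that field.

```javascript
// Hypothetical pre-processing helper, not part of PapaParse.
// Removes spaces and tabs that sit between a delimiter (or the start of a
// line) and an opening quote, so a strict parser sees the quote first.
// Assumes a single-character delimiter.
function stripSpaceBeforeQuotes(text, delimiter = ",", quote = '"') {
  let out = "";
  let inQuotes = false;
  let atFieldStart = true; // true until a non-blank character is seen in the field
  let pendingBlanks = "";  // blanks held back at the start of the current field

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];

    if (inQuotes) {
      // Inside a quoted field everything is copied unchanged.
      out += ch;
      if (ch === quote) {
        if (text[i + 1] === quote) { out += quote; i++; } // escaped quote ""
        else inQuotes = false;                            // closing quote
      }
      continue;
    }

    if (atFieldStart && (ch === " " || ch === "\t")) {
      pendingBlanks += ch; // hold back until we know what follows
      continue;
    }

    if (atFieldStart && ch === quote) {
      pendingBlanks = ""; // drop the blanks in front of an opening quote
      inQuotes = true;
    }

    // Blanks in front of unquoted data are written back unchanged.
    out += pendingBlanks + ch;
    pendingBlanks = "";
    atFieldStart = ch === delimiter || ch === "\n" || ch === "\r";
  }
  return out + pendingBlanks;
}

const fixed = stripSpaceBeforeQuotes(
  'random name, 12345, "address line 1, address line 2, code"'
);
console.log(fixed);
```

Only the blanks directly in front of an opening quote are removed. Leading whitespace on unquoted values (such as the space before 12345) is left alone and would still need trimming afterwards, for example in a transform step.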