Space in front of quotation doesn't recognise a field with multiple entries separated by common delimiter #941

Roberto-Circit · 2022-07-14T08:45:28Z

Using papaparse version 5.3.0.

A space in front of the opening quote of a field that itself contains the delimiter (such as an address with several comma-separated parts) causes the row to be parsed incorrectly:
random name, 12345, "address line 1, address line 2, code"
is picked up as 5 separate entries (results.data has an item with 5 entries), while

random name, 12345,"address line 1, address line 2, code"
works normally (results.data has an item with 3 entries).

This issue only occurs with double-quoted fields that contain multiple entries.
Let me know if this isn't enough to go on or if this has been fixed in the new 6.0 version.

pokoli · 2022-07-14T11:35:48Z

I'm not sure I understand. Which separator and quote character are you using?
What are the results for such fields?

fractalpixel · 2022-08-08T12:20:07Z

Can confirm this, ran into the exact same problem.

Example (first row has no problems, second row will be parsed as 4 columns):

"Cobra MK II","some wear, docking computer installed", 80 megacredits
"Milennium Falcon", "surface rust, light beam damage", 150 megacredits

The quotation character used is the default (double quotation mark ") and the separator is comma, but the problem does occur with other delimiters as well.

If the field starts with a space (before the quotation character), the quotation character is not recognized, and any delimiter characters inside the quoted value will be interpreted as delimiters. (The parsing code probably assumes that fields separated by delimiters will not have any preceding (or trailing) whitespace, and assumes that if the quote character doesn't immediately follow the delimiter, the field is not quoted).
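The assumed rule can be sketched with a toy row splitter. This is not Papa Parse's actual code, only an illustration of the guess above: the hypothetical `splitRowStrict` treats a field as quoted only when the quote character immediately follows the delimiter (or starts the row), and it supports single-character or multi-character delimiters on a single row.

```javascript
// Toy row splitter, NOT Papa Parse's implementation.
// Assumption: a field is quoted only if its very first character is the quote.
function splitRowStrict(row, delimiter = ",", quote = '"') {
  const fields = [];
  let i = 0;
  while (i <= row.length) {
    if (row[i] === quote) {
      // Quoted field: read up to the closing quote; "" is an escaped quote.
      let value = "";
      i++;
      while (i < row.length) {
        if (row[i] === quote && row[i + 1] === quote) {
          value += quote;
          i += 2;
        } else if (row[i] === quote) {
          i++;
          break;
        } else {
          value += row[i++];
        }
      }
      fields.push(value);
      // Skip ahead to the next delimiter (a real parser would flag stray text here).
      const next = row.indexOf(delimiter, i);
      i = next === -1 ? row.length + 1 : next + delimiter.length;
    } else {
      // Unquoted field: everything up to the next delimiter, quotes and spaces included.
      const next = row.indexOf(delimiter, i);
      const end = next === -1 ? row.length : next;
      fields.push(row.slice(i, end));
      i = next === -1 ? row.length + 1 : next + delimiter.length;
    }
  }
  return fields;
}

const withSpace = splitRowStrict('random name, 12345, "address line 1, address line 2, code"');
const withoutSpace = splitRowStrict('random name, 12345,"address line 1, address line 2, code"');
console.log(withSpace.length, withoutSpace.length);
```

Under this rule the space before the opening quote makes the whole field unquoted, so the commas inside it split the row, which matches the behaviour reported in this issue.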

CSV files with optional whitespace around delimiter characters and/or data values do exist. They are especially common among hand-edited CSV files and CSV files where columns are aligned with spaces for readability (in addition to using a delimiter character).

Using the transform configuration option to remove starting and trailing whitespace doesn't help, as the transform is run after the quotes are processed.

Setting the field delimiter to ", " (comma followed by a space) breaks in cases where there is no space after the comma (or multiple spaces, if someone tried to manually align columns in addition to using a delimiter character).

Maybe add an option to automatically remove whitespace around unquoted values and outside quoted string values. That option would fix this problem, and trimming is needed in any case for CSV files with optional whitespace (otherwise it has to be done with a transform function or when processing the data later). For example, the third column in the first row of my example is " 80 megacredits" when parsed, but the user probably wanted "80 megacredits" (with whitespace trimmed). Some CSV files might rely on whitespace around unquoted values, which is why this should probably be an option, but it could be on by default, as that seems to be the most common use case. (Whitespace inside quoted strings should of course always be preserved.)
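Until such an option exists, one possible workaround is to clean the text before parsing. The sketch below is an assumption, not part of Papa Parse: the hypothetical `stripOutsideQuotes` helper handles a single-character delimiter, treats only spaces and tabs as trimmable whitespace, and leaves everything inside quoted values untouched.

```javascript
// Hypothetical pre-processing helper, not a Papa Parse API.
// Removes spaces/tabs at the edges of each field but never inside quotes.
function stripOutsideQuotes(text, delimiter = ",", quote = '"') {
  let out = "";
  let inQuotes = false;   // currently inside a quoted value
  let fieldStart = true;  // no non-whitespace character seen yet in this field
  let pendingSpace = "";  // whitespace that is kept only if more content follows
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      out += ch;
      if (ch === quote) {
        if (text[i + 1] === quote) {
          out += text[++i];   // escaped quote ("") stays inside the value
        } else {
          inQuotes = false;   // closing quote
        }
      }
    } else if (ch === delimiter || ch === "\n" || ch === "\r") {
      // Field or record boundary: drop trailing whitespace, start a new field.
      out += ch;
      pendingSpace = "";
      fieldStart = true;
    } else if (ch === " " || ch === "\t") {
      // Leading whitespace is dropped; inner whitespace is held back for now.
      if (!fieldStart) pendingSpace += ch;
    } else {
      if (ch === quote && fieldStart) inQuotes = true;
      out += pendingSpace + ch;
      pendingSpace = "";
      fieldStart = false;
    }
  }
  return out;
}

const cleaned = stripOutsideQuotes('random name, 12345, "address line 1, address line 2, code"');
console.log(cleaned);
```

The cleaned text could then be passed to Papa.parse as usual. Note that this also trims unquoted values, so it is only suitable for files that do not rely on that whitespace.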

pokoli · 2022-08-08T13:14:34Z

I'm not sure we should implement something to fix hand-edited files.
If someone edits a file and breaks the format, it is normal that it is not parsed correctly.

Roberto-Circit · 2022-08-15T11:31:20Z

fractalpixel wrote: "Maybe add an option to automatically remove whitespace around unquoted values and outside quoted string values."

Agreed, having this option would be nice.

HarryPeach · 2023-03-24T15:48:47Z

Also running into this issue. It would be good if the library could successfully parse CSV files with spaces after commas, like alternatives do.

janisdd · 2024-09-28T17:37:04Z

HarryPeach wrote: "Also running into this issue, would be good if the library could successfully parse csv files with spaces after commas, like alternatives do"

This is interesting, because it is not clear what to do in such cases. Papaparse simply makes a decision.

The problem with random name, 12345, "address line 1, address line 2, code" can be simplified to

a, b, "c"

The question is: should the last field be parsed to just c, or to c with a leading space?

The quotes clearly indicate that there should be no leading space.
The space between the , and the " indicates that there should be a leading space (as in , b).

I think there is no right or wrong, you can go either way.
Papaparse chooses to ignore the quotes and treat them like a normal character rather than a special character.
(This is also the reason why the field in the example contains the leading space)
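The two readings can be put side by side in a small sketch. The `readField` function and its `lenient` flag are hypothetical names used only for illustration. They are not Papa Parse options, and the strict branch only mirrors the behaviour described above.

```javascript
// Illustration only: two ways to interpret one raw field such as ' "c"'.
function readField(raw, { lenient = false, quote = '"' } = {}) {
  // Lenient reading: ignore whitespace outside the quotes before checking for them.
  const candidate = lenient ? raw.trim() : raw;
  const quoted =
    candidate.length >= 2 &&
    candidate.startsWith(quote) &&
    candidate.endsWith(quote);
  // Unquoted: every character is data, including spaces and quote characters.
  if (!quoted) return raw;
  // Quoted: drop the surrounding quotes and unescape doubled quotes.
  return candidate.slice(1, -1).split(quote + quote).join(quote);
}

const strictReading = readField(' "c"');
const lenientReading = readField(' "c"', { lenient: true });
console.log(JSON.stringify(strictReading), JSON.stringify(lenientReading));
```

In the strict reading the quotes are ordinary characters and the leading space is kept. In the lenient reading the quotes win and the space is dropped, while an unquoted field such as " b" keeps its space either way.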

janisdd mentioned this issue Sep 28, 2024

Data is surrounded by Quotes #731

Open
