multi-byte separator characters get split into their byte-components (unicode, UTF-8)

When you use a multi-byte separator character like "·" (middle dot, U+00B7), it gets split: only the first byte is used as the separator, while the second byte is treated as part of the following column.

The middle dot is a normal single character in Unicode, but in UTF-8 it is represented by the 2-byte sequence
\xC2 \xB7
NPP displays it correctly when the language is set to "Normal text".
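
For reference, the encoding can be checked with a few lines of Python. This is only an illustration of the byte sequence described above and has nothing to do with CSVLint's own code; the variable names are made up for the example.

```python
# The middle dot is one Unicode code point but two bytes in UTF-8.
sep = "\u00b7"                      # "·", MIDDLE DOT
encoded = sep.encode("utf-8")

print(len(sep))                     # number of characters
print(encoded)                      # raw UTF-8 bytes

# Neither byte is valid UTF-8 on its own, so decoding them separately
# yields the replacement character instead of the middle dot.
first_alone = encoded[:1].decode("utf-8", errors="replace")
second_alone = encoded[1:].decode("utf-8", errors="replace")
print(first_alone == "\ufffd", second_alone == "\ufffd")
```

Because neither byte decodes on its own, anything that separates the two bytes will prevent the glyph from being rendered.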

If you select CSVLint and manually enter the middle dot as the column separator, NPP no longer displays the correct glyph and shows the binary replacement blocks for 'xC2' and 'xB7' instead.
If you look closely, you can see that the first byte \xC2 is displayed with the neutral background used for the separator character, while the second byte \xB7 is displayed in the same color as the following column.

This leads me to believe that CSVLint uses only the first byte as the separator and inserts the coloring 'codes' between the two bytes, breaking them apart so that NPP can no longer display the character correctly.
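
To illustrate what I suspect is happening, here is a minimal Python sketch. It is not taken from CSVLint's source; both function names are hypothetical and only contrast splitting on the first byte of the separator with splitting on the complete UTF-8 sequence.

```python
def split_on_first_byte(line: bytes, separator: str) -> list:
    """Suspected behavior (assumption): only the first byte of the separator is used."""
    first_byte = separator.encode("utf-8")[:1]
    return line.split(first_byte)


def split_on_full_separator(line: bytes, separator: str) -> list:
    """Expected behavior: the whole encoded separator is matched."""
    return line.split(separator.encode("utf-8"))


line = "a\u00b7b\u00b7c".encode("utf-8")    # "a·b·c" as UTF-8 bytes

broken = split_on_first_byte(line, "\u00b7")
correct = split_on_full_separator(line, "\u00b7")

print(broken)    # the stray \xb7 ends up at the start of the following columns
print(correct)   # clean columns
```

In the first result the leftover \xB7 byte is attached to the start of each following column, which matches the coloring I see in NPP.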



multi-byte separator characters get split into their byte-components (unicode, UTF-8) #76
