Is there a handy list of all characters considered (unicode) whitespace by NestedText? #56
I'm working on a parser, primarily from the language reference, and there are a handful of places where all Unicode whitespace characters need to be treated a certain way. I'm not sure where to find a definitive list of these. Some of the lists I found did not even seem to include the tab character, but I'm pretty sure NestedText puts it in that category. So is there a list of characters that fit the bill as far as NestedText is concerned?
Replies: 2 comments 3 replies
If I recall correctly, there are basically three places where the language reference mentions white space. One involves `:`, `-`, or `>` when followed by an ASCII space or a newline.
My Python implementation uses Python's str.strip() function to remove whitespace, and I am pretty sure it just removes anything that is marked as whitespace in the Unicode standard. The essential ones are space and tab. I suspect few people will notice if you do not support the Mongolian vowel separator.
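For a concrete list, one option is to ask Python itself. The sketch below is not part of NestedText. It assumes that str.strip() with no argument and str.isspace() agree on what counts as whitespace (my understanding of CPython, but not something stated in this thread), and the helper names python_whitespace and describe are made up for illustration. The result depends on the Unicode database bundled with your Python version, so treat it as a starting point rather than a definitive list.

```python
import sys
import unicodedata


def python_whitespace():
    """Return every character that str.isspace() reports as whitespace.

    Assumption: str.strip() with no argument strips this same set.
    """
    return [chr(cp) for cp in range(sys.maxunicode + 1) if chr(cp).isspace()]


def describe(ch):
    """Format one character as 'U+XXXX NAME'; control characters have no name."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed control>')}"


if __name__ == "__main__":
    # Print one line per whitespace character known to this Python build.
    for ch in python_whitespace():
        print(describe(ch))
```

Running it on the interpreter you are targeting, and comparing the output with the whitespace definition in your parser's own language, should show where the two disagree.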
Thank you for the list. I am in the process of trying to replace the official test suite. I have put together a proposed test suite and added tests to cover each of your cases. If you are interested, you can give this new test suite a try. I would be interested in your opinion. You can find the new test suite here: