Is there a handy list of all characters considered (unicode) whitespace by NestedText? #56

AndydeCleyre · 2025-03-13T22:24:03Z

I'm writing a parser, working primarily from the language reference, and there are a handful of places where all Unicode whitespace characters need to be treated a certain way. I'm not sure where to find a definitive list of these. Some of the lists I found did not even include the tab character, but I'm pretty sure NestedText counts it as whitespace.

So is there a list of characters that fit the bill as far as NestedText is concerned?

KenKundert (Maintainer) · 2025-03-14T00:50:30Z

If I recall correctly, there are basically three places where the language reference mentions white space.

1. It limits indentation to ASCII space characters only.
2. It limits the tags to :, -, or >, each followed by an ASCII space or a newline.
3. In inline lists and dictionaries, the whitespace can be any non-line-breaking whitespace character. That includes the ASCII space and tab characters and the Unicode whitespace characters, which are defined here
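The first two rules could be sketched in Python roughly as follows. This is a minimal illustration and not the reference implementation: split_indent and leading_tag are hypothetical helper names, the line is assumed to have already had its newline removed, blank lines are assumed to be filtered out before these helpers are called, and dictionary items where a key precedes the colon are not handled here.

```python
def split_indent(line):
    """Split a line into (indent_width, content).

    Only ASCII spaces count as indentation; any other whitespace
    character at the start of the content is reported as an error.
    """
    content = line.lstrip(" ")  # strips ASCII spaces only, not tabs
    if content[:1].isspace():
        raise ValueError("indentation must consist of ASCII spaces only")
    return len(line) - len(content), content


def leading_tag(content):
    """Return '-', '>' or ':' if content begins with that tag followed by
    an ASCII space or the end of the line; otherwise return None."""
    if content[:1] in ("-", ">", ":") and content[1:2] in (" ", ""):
        return content[0]
    return None
```

For example, leading_tag("-\titem") returns None under this sketch because a tab is not an ASCII space, so the dash is not treated as a list tag.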

My Python implementation uses Python's str.strip() function to remove whitespace, and I am pretty sure it just removes anything that is marked as whitespace in the Unicode standard.

The essential ones are space and tab. I suspect few people will notice if you do not support the Mongolian vowel separator.
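One way to get a concrete list is to ask Python itself which code points it treats as whitespace. The sketch below assumes that str.isspace() matches what str.strip() removes when called without arguments (true for CPython), and it assumes, for illustration only, that "line breaking" means the characters str.splitlines() splits on. The function names are hypothetical, and the exact result depends on the Unicode version bundled with the interpreter.

```python
import sys
import unicodedata

# Assumption: treat the characters that str.splitlines() splits on
# as the "line breaking" ones.
LINE_BREAKS = set("\n\r\v\f\x1c\x1d\x1e\x85\u2028\u2029")


def python_whitespace():
    """Every character for which str.isspace() is true."""
    return [chr(cp) for cp in range(sys.maxunicode + 1) if chr(cp).isspace()]


def non_line_breaking_whitespace():
    """Whitespace characters that are not in LINE_BREAKS."""
    return [ch for ch in python_whitespace() if ch not in LINE_BREAKS]


if __name__ == "__main__":
    # Print each code point with its Unicode name; control characters
    # have no name, so a placeholder is shown instead.
    for ch in non_line_breaking_whitespace():
        print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<control>')}")
```

A parser written in another language could hard-code the printed table rather than depend on Python at run time.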

1 reply

AndydeCleyre Mar 14, 2025
Author

Thanks! FWIW the items I was looking at:

- Dictionary item keys can't start with whitespace.
- Dictionary item keys must have whitespace trimmed from the tail.
- Inline dictionary items must ignore whitespace following the colon.
- Inline strings (and inline dictionary-friendly strings) must trim whitespace from both sides.
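These four cases might be sketched in Python like this. It is only an illustration under assumptions: trim_key, trim_inline_string and split_inline_item are hypothetical names, whitespace is taken to mean whatever str.strip() removes, and real inline parsing (nesting, delimiters, error reporting) is left out.

```python
def trim_key(raw_key):
    """Validate and trim a dictionary item key.

    Rejects keys that start with whitespace and trims whitespace
    from the tail.
    """
    if raw_key[:1].isspace():
        raise ValueError("key may not start with whitespace")
    return raw_key.rstrip()


def trim_inline_string(raw):
    """Trim whitespace from both sides of an inline string."""
    return raw.strip()


def split_inline_item(item):
    """Split 'key: value' text from an inline dictionary at the first colon.

    Whitespace after the colon is ignored because the value is trimmed
    on both sides, as is the key.
    """
    key, sep, value = item.partition(":")
    if not sep:
        raise ValueError("inline dictionary item needs a colon")
    return trim_inline_string(key), trim_inline_string(value)
```

Because str.strip() follows Python's notion of whitespace, characters such as the no-break space (U+00A0) and the ideographic space (U+3000) are trimmed along with space and tab.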

KenKundert (Maintainer) · 2025-03-19T05:07:25Z

Thank you for the list.

I am in the process of trying to replace the official test suite. I have put together a proposed test suite and added tests to cover each of your cases. If you are interested, you can give this new test suite a try. I would be interested in your opinion. You can find the new test suite here:

https://github.com/KenKundert/proposed_nt_tests

2 replies

AndydeCleyre Apr 3, 2025
Author

Thanks! Sorry, I've been working on a parser for an obscure language on and off, extremely slowly. I appreciate this but haven't gotten to a thorough testing stage yet. This will be very helpful when I get to it.

KenKundert Apr 5, 2025
Maintainer

I have moved the proposed tests to here.
