Add unicode symbols parser #213

necessarily-equal · 2025-02-07T06:40:41Z

Hey :)
I added a parser that matches characters from the Unicode symbol category. Here is the PR.

I'm using it in my own parser, but I could also see it being used to parse Haskell code, for example (Ctrl-F "symbol" in that document). Haskell's definition of a symbol differs a bit for ASCII characters, but it would still be possible to construct it:

auto const haskell_symbol_def = (bp::symb | bp::punct) - (bp::char_('"') | '\'' | '(' | ')' | ',' | ';' | '[' | ']' | '_' | '`' | '{' | '}');
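The set arithmetic in that expression can be checked on the ASCII range with a small standard-C++ sketch. This is an illustration under stated assumptions, not Boost.Parser code: ascii_symb and ascii_punct are hypothetical, hand-written stand-ins listing only the ASCII members of the Unicode S* (symbol) and P* (punctuation) categories, whereas bp::symb and bp::punct cover all of Unicode. For single-character parsers, the "-" in haskell_symbol_def acts as a set difference, which the sketch models with "&& !".

```cpp
#include <cassert>
#include <string_view>

// ASSUMPTION: ASCII-only stand-ins for bp::symb and bp::punct. The real
// parsers cover all of Unicode; these tables list only the ASCII members
// of the Unicode S* (symbol) and P* (punctuation) general categories.
constexpr bool ascii_symb(char c) {
    // Sc: $   Sk: ^ `   Sm: + < = > | ~
    return std::string_view("$+<=>^`|~").find(c) != std::string_view::npos;
}

constexpr bool ascii_punct(char c) {
    // Pc, Pd, Ps, Pe and Po members of printable ASCII.
    return std::string_view("!\"#%&'()*,-./:;?@[\\]_{}").find(c) !=
           std::string_view::npos;
}

// The characters subtracted in haskell_symbol_def.
constexpr bool haskell_excluded(char c) {
    return std::string_view("\"'(),;[]_`{}").find(c) != std::string_view::npos;
}

// (symb | punct) - (excluded), evaluated for a single character.
constexpr bool is_haskell_symbol(char c) {
    return (ascii_symb(c) || ascii_punct(c)) && !haskell_excluded(c);
}

// Counts how many ASCII characters the predicate accepts.
constexpr int count_ascii_haskell_symbols() {
    int n = 0;
    for (int c = 0; c < 128; ++c)
        if (is_haskell_symbol(static_cast<char>(c))) ++n;
    return n;
}

// Haskell 2010's ascSymbol has exactly 20 members:
// ! # $ % & * + . / < = > ? @ \ ^ | - ~ :
static_assert(count_ascii_haskell_symbols() == 20,
              "the difference should leave exactly ascSymbol");
```

Over ASCII the difference leaves exactly 20 characters, matching the ascSymbol set in the Haskell 2010 report; outside ASCII, the real parsers would additionally admit every non-ASCII Unicode symbol or punctuation character.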

In this PR I've updated the documentation by hand. Feel free to regenerate it properly to make sure it's correct. You may also want to reindent the list of characters if you care enough about aesthetics.

necessarily-equal · 2025-02-07T06:41:57Z

About the naming of the new parser: I've used "symb" to avoid a conflict with "symbols" (the lookup-table parser). I think that's reasonable, since "punct" is also abbreviated.

tzlaine · 2025-02-16T19:24:06Z

This looks really useful, and the code looks good as well. A couple of nits: 1) please don't do drive-by fixes like the typo correction in the same commit as a semantic change, and 2) please don't touch parser_reference.xml at all, as it is generated.

A bigger concern, though, is that there are no tests. If you can update the character set test with this new charset (and please include a test that char_set<symb_chars>::chars is sorted), and address the nits above, this will be good to merge.
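A sortedness check matters if lookups in such a table rely on binary search; that is an assumption here, since the review does not say how char_set performs lookups. The sketch below uses a hypothetical symb_chars_subset holding a handful of Unicode symbol code points, checks at compile time that the table is strictly sorted, and looks characters up with std::binary_search. It is not Boost.Parser's char_set or its actual test.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// HYPOTHETICAL stand-in for a table like symb_chars: a small, hand-picked
// subset of Unicode S* code points. The real table is far larger, and its
// layout in Boost.Parser may differ.
struct symb_chars_subset {
    static constexpr std::array<char32_t, 16> chars = {
        U'$', U'+', U'<', U'=', U'>', U'^', U'`', U'|', U'~',
        U'\u00A2',  // CENT SIGN (Sc)
        U'\u00A9',  // COPYRIGHT SIGN (So)
        U'\u00AC',  // NOT SIGN (Sm)
        U'\u00B1',  // PLUS-MINUS SIGN (Sm)
        U'\u00D7',  // MULTIPLICATION SIGN (Sm)
        U'\u00F7',  // DIVISION SIGN (Sm)
        U'\u2192',  // RIGHTWARDS ARROW (Sm)
    };
};

// True if Chars::chars is strictly increasing (sorted, no duplicates).
template <typename Chars>
constexpr bool is_strictly_sorted() {
    for (std::size_t i = 1; i < Chars::chars.size(); ++i)
        if (!(Chars::chars[i - 1] < Chars::chars[i])) return false;
    return true;
}

// Compile-time form of the requested "chars is sorted" test.
static_assert(is_strictly_sorted<symb_chars_subset>(),
              "symb_chars_subset::chars must be sorted");

// Membership via binary search, which is only correct on a sorted table.
template <typename Chars>
bool in_table(char32_t c) {
    return std::binary_search(Chars::chars.begin(), Chars::chars.end(), c);
}
```

If an entry were added out of order, the static_assert would fail at compile time instead of in_table silently missing characters at run time.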

necessarily-equal · 2025-02-17T12:40:17Z

> please don't do drive-by fixes like the typo correction in the same commit as a semantic change

I think this is already correct

> please don't touch parser_reference.xml at all, as it is generated.

Okay, removed that change

> A bigger concern, though, is that there are no tests. If you can update the character set test with this new charset

Done, hopefully that is enough

> (and please include a test that char_set<symb_chars>::chars is sorted), and address the nits above, this will be good to merge.

Added as well. That test is new; it didn't exist yet when I wrote the code :D

This should be good to go. Not sure why the Drone CI failed with a segfault last time. Since at that point nothing exercised symb yet, I assume the problem is somewhere else.

tzlaine · 2025-02-21T05:54:22Z

> > please don't do drive-by fixes like the typo correction in the same commit as a semantic change

> I think this is already correct

I went ahead and merged this PR, but I think you missed my point. The comment/doc spelling fix "clases" -> "classes" is definitely needed, but it should have gone in a separate commit. What if I discovered something wrong with this PR and had to revert it? I'd then also have to remember to go back and fix "clases" again.

Anyway, thanks for the contribution! It's much appreciated.

Add symb parser to handle unicode symbols

58bb956

necessarily-equal force-pushed the add-unicode-symbols-parser branch from cfea6db to d6d9844 February 7, 2025 06:51

Antoine Fontaine added 3 commits February 17, 2025 13:33

Add documentation for symb

d260b42

Add tests for symb

9069a8d

Fix typo in the documentation

4183274

necessarily-equal force-pushed the add-unicode-symbols-parser branch from d6d9844 to 4183274 February 17, 2025 12:37

tzlaine merged commit b253d9c into boostorg:develop Feb 21, 2025
22 of 23 checks passed
