Skip to content

Conversation

@necessarily-equal
Copy link
Contributor

@necessarily-equal necessarily-equal commented Feb 7, 2025

Hey :)
I added a parser to match characters from the unicode symbols category. Here is the PR

I'm using it for my own parser, but I could see it used to parse some Haskell code for example (ctrl-F symbol in that document). The haskell definition of a symbol is a bit different for ascii characters, but it would still be possible to construct it:

haskell_symbol_def = (bp::symb | bp::punct) - (bp::char_('"') | '\'' | '(' | ')' | ',' | ';' | '[' | ']' | '_' | '`' | '{' | '}');

Here in this PR I've updated the documentation manually. Feel free to regenerate it properly to ensure correctness. You may also want to reindent the list of characters if you care about aesthetics enough.

@necessarily-equal
Copy link
Contributor Author

About the naming of the new parser: I've used "symb" to avoid conflicting with "symbols" (lookup tables). I think that's reasonable since "punct" is also abbreviated.

@necessarily-equal necessarily-equal force-pushed the add-unicode-symbols-parser branch from cfea6db to d6d9844 Compare February 7, 2025 06:51
@tzlaine
Copy link
Collaborator

tzlaine commented Feb 16, 2025

This look really useful, and the code looks good as well. A couple of nits: 1) please don't do drive-by fixes like the typo correction in the same commit as a semantic change, and 2) please don't touch parser_reference.xml at all, as it is generated.

A bigger concern, though, is that there are no tests. If you can update the character set test with this new charset (and please include a test that char_set<symb_chars>::chars is sorted), and address the nits above, this will be good to merge.

@necessarily-equal necessarily-equal force-pushed the add-unicode-symbols-parser branch from d6d9844 to 4183274 Compare February 17, 2025 12:37
@necessarily-equal
Copy link
Contributor Author

  1. please don't do drive-by fixes like the typo correction in the same commit as a semantic change

I think this is already correct

  1. please don't touch parser_reference.xml at all, as it is generated.

Okay, removed that change

A bigger concern, though, is that there are no tests. If you can update the character set test with this new charset

Done, hopefully that is enough

(and please include a test that char_set<symb_chars>::chars is sorted), and address the nits above, this will be good to merge.

Added as well, this one is new, I didn't exist yet when I wrote the code :D

This should be good to go. Not sure why the drone CI did fail with a segfault last time. Since at the time I didn't have any entry point for symb I assume there is a problem somewhere else

@tzlaine tzlaine merged commit b253d9c into boostorg:develop Feb 21, 2025
22 of 23 checks passed
@tzlaine
Copy link
Collaborator

tzlaine commented Feb 21, 2025

  1. please don't do drive-by fixes like the typo correction in the same commit as a semantic change

I think this is already correct

I went ahead and merged this PR, but I think you missed my point. The comment/doc spelling fix "clases" -> "classes" is definitely needed, but it should have gone in a separate commit. What if I discovered something wrong with this PR and had to revert it? I'd then also have to remember to go back and fix "clases" again.

Anyway, thanks for the contribution! It's much appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants