Commit 4048175

grammar/case_rule: extend the testcase to check non-ASCII tokens
The logic of case/match lexing rules can be complex when working on source buffers encoded with variable-length charsets such as UTF-8. Extend this testcase so that the "backwards codepoint lookup" behavior is exercised with multi-byte codepoints.
1 parent dbad162 commit 4048175
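
To illustrate why a "backwards codepoint lookup" is harder on UTF-8 buffers than on ASCII ones, here is a minimal sketch of stepping back one codepoint in a byte buffer. It is not the lexer's actual implementation; `previous_codepoint` is a hypothetical helper written only for this illustration, and it assumes the buffer is valid UTF-8.

```python
from typing import Tuple


def previous_codepoint(buf: bytes, pos: int) -> Tuple[str, int]:
    """Return (codepoint, start) for the codepoint that ends right before
    byte offset `pos` in a valid UTF-8 buffer (hypothetical helper)."""
    if pos <= 0:
        raise ValueError("no codepoint before offset 0")
    start = pos - 1
    # UTF-8 continuation bytes look like 0b10xxxxxx: skip back over them
    # until we reach the leading byte of the codepoint.
    while start > 0 and (buf[start] & 0xC0) == 0x80:
        start -= 1
    return buf[start:pos].decode("utf-8"), start


# Same text as the new 'unicode-id-char' test input, encoded as UTF-8.
source = "\xe9'\U0001f642'".encode("utf-8")

# Walk the buffer backwards, one codepoint at a time.
pos = len(source)
seen = []
while pos > 0:
    char, pos = previous_codepoint(source, pos)
    seen.append(char)
```

With an ASCII-only input, stepping back is always a one-byte move; here the same walk has to cross a four-byte and a two-byte sequence, which is the situation the extended testcase is meant to exercise.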

3 files changed: +11 -1 lines changed

testsuite/tests/grammar/case_rule/expected_concrete_syntax.lkt

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ lexer foo_lexer {
 
 char
 dot <- "."
-id <- p"[a-zA-Z]+"
+id <- p"[a-zA-Zé🙂]+"
 tick <- "'"
 newline <- p"\n"
 

testsuite/tests/grammar/case_rule/main.py

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@
 ('simple-attr', "a'b"),
 ('char-dot', "'a'.b"),
 ('id-char', "a'b'"),
+('unicode-id-char', "\xe9'\U0001f642'"),
 ):
 print('== {} =='.format(label))
 u = ctx.get_from_buffer('{}.txt'.format(label), text)
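
For readers unfamiliar with the escapes in the new test input, the short sketch below lists each codepoint of the string together with its UTF-8 encoded length. It only uses the Python standard library; the `widths` name is introduced here for illustration.

```python
# The new test input: "\xe9" is é (U+00E9), "\U0001f642" is 🙂 (U+1F642).
text = "\xe9'\U0001f642'"

# For each codepoint: the character, its code point, and its UTF-8 length.
widths = [
    (ch, "U+{:04X}".format(ord(ch)), len(ch.encode("utf-8")))
    for ch in text
]

for ch, code, nbytes in widths:
    print("{!r}: {} ({} byte(s) in UTF-8)".format(ch, code, nbytes))
```

The input therefore mixes one-, two-, and four-byte sequences: four codepoints in total, but eight bytes once encoded as UTF-8.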

testsuite/tests/grammar/case_rule/test.out

Lines changed: 9 additions & 0 deletions
@@ -24,5 +24,14 @@ main.py: Running...
 <Token Tick "'" at 1:4-1:5>
 <Token Termination at 1:5-1:5>
 
+== unicode-id-char ==
+1:5-1:5: Expected Id, got Termination
+--
+<Token Id 'é' at 1:1-1:2>
+<Token Tick "'" at 1:2-1:3>
+<Token Id '🙂' at 1:3-1:4>
+<Token Tick "'" at 1:4-1:5>
+<Token Termination at 1:5-1:5>
+
 main.py: Done.
 Done
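
The source locations in the new expected output are consistent with columns being counted in codepoints rather than in bytes. The sketch below reproduces them under that assumption; `slocs` is a hypothetical helper written for this illustration and is not part of the testsuite.

```python
def slocs(tokens, width=len, line=1):
    """Compute "line:col-line:col" ranges for consecutive tokens on one line.

    `width` returns how many columns a token's text occupies; the default
    `len` counts codepoints (hypothetical helper, for illustration only).
    """
    result = []
    col = 1
    for text in tokens:
        end = col + width(text)
        result.append("{}:{}-{}:{}".format(line, col, line, end))
        col = end
    return result


tokens = ["\xe9", "'", "\U0001f642", "'"]

# Columns counted in codepoints: matches the ranges in test.out.
by_codepoint = slocs(tokens)

# Columns counted in UTF-8 bytes: what a byte-based computation would give.
by_byte = slocs(tokens, width=lambda s: len(s.encode("utf-8")))
```

A byte-based computation would place the final tick at 1:8-1:9 instead of 1:4-1:5, so the expected output also guards against column tracking that confuses bytes with codepoints.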
