Segmenter results for Thai sentence seem incorrect. #3208
riajain0412 asked this question in Q&A
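The breakpoints are in terms of UTF-8 indices, that is, byte offsets into the UTF-8 encoded string rather than character counts. Thai characters take 3 bytes each in UTF-8, so the offsets come out larger than the number of characters.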
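To make the byte-offset interpretation concrete, here is a minimal sketch using only the Rust standard library (no ICU4X calls). The helper names `byte_offsets_to_char_indices` and `segments` are hypothetical, and the sample string and offsets used with them are made up for illustration; they are not the sentence from the question. The helpers assume every offset falls on a UTF-8 character boundary, which should hold for breakpoints returned by a segmenter.

```rust
/// Convert UTF-8 byte offsets into code point (char) indices.
/// Hypothetical helper, not part of ICU4X.
/// Panics if an offset is out of range or not on a char boundary.
fn byte_offsets_to_char_indices(text: &str, byte_offsets: &[usize]) -> Vec<usize> {
    byte_offsets
        .iter()
        // Count the chars that precede each byte offset.
        .map(|&b| text[..b].chars().count())
        .collect()
}

/// Slice the text into the pieces delimited by consecutive byte offsets.
/// Rust string slicing is byte-based, so the offsets can be used directly.
fn segments<'a>(text: &'a str, byte_offsets: &[usize]) -> Vec<&'a str> {
    byte_offsets
        .windows(2)
        .map(|w| &text[w[0]..w[1]])
        .collect()
}
```

For example, with the 6-code-point string `"สวัสดี"` (18 bytes) and the made-up offsets `[0, 9, 18]`, the first helper yields `[0, 3, 6]` and the second yields `["สวั", "สดี"]`. By the same arithmetic, if every character in the original sentence is a 3-byte Thai code point, the reported offsets 0 9 21 39 51 60 66 would correspond to character indices 0 3 7 13 17 20 22.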
Okay. Thank you for your help.
I ran the code below to check the breakpoints for a Thai sentence:
It gave this result: 0 9 21 39 51 60 66.
However, the Thai sentence above has only 17-18 characters, so why is the ICU4X segmenter giving 39, 51, 60, etc. as breakpoints?
Is this the expected result? If yes, how should I interpret these indices?