Is it possible to extend Unicode support to the word boundary anchor (\b)?
For example, the Russian sentence below cannot be split on word boundaries, while the English one can:
"hello there this is a test".split(XRegExp('\\b', 'A'))
(11) ["hello", " ", "there", " ", "this", " ", "is", " ", "a", " ", "test"]
"Сняли не первый раз изначальную и конечную сумму и начальную не вернули !!!".split(XRegExp('\\b', 'A'))
["Сняли не первый раз изначальную и конечную сумму и начальную не вернули !!!"]
^ Note that the split has no effect on the Russian sentence.
The equivalent, desired behaviour in Ruby, for example:
irb(main):001:0> "hello there this is a test".split(/\b/)
[
"hello",
" ",
"there",
" ",
"this",
" ",
"is",
" ",
"a",
" ",
"test"
]
irb(main):002:0> "Сняли не первый раз изначальную и конечную сумму и начальную не вернули !!!".split(/\b/)
[
"Сняли",
" ",
"не",
" ",
"первый",
" ",
"раз",
" ",
"изначальную",
" ",
"и",
" ",
"конечную",
" ",
"сумму",
" ",
"и",
" ",
"начальную",
" ",
"не",
" ",
"вернули",
" !!!"
]
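For comparison, here is a minimal sketch of a possible workaround in plain JavaScript. It does not use any XRegExp API. It emulates a Unicode-aware \b with lookarounds and Unicode property escapes, which requires an engine that supports the `u` flag, `\p{...}`, and lookbehind (ES2018 or later). The names `WORD_CHAR`, `UNICODE_WORD_BOUNDARY`, and `splitOnWordBoundaries` are hypothetical, and the choice of letters, marks, numbers, and connector punctuation as "word characters" is an assumption meant to approximate the Ruby behaviour shown above, not a confirmed match for it.

```javascript
// Assumption: a "word character" is any letter, combining mark, number,
// or connector punctuation (such as "_"). This is an approximation.
const WORD_CHAR = '[\\p{L}\\p{M}\\p{N}\\p{Pc}]';

// A boundary is a position where a word character is on exactly one side.
const UNICODE_WORD_BOUNDARY = new RegExp(
  `(?<=${WORD_CHAR})(?!${WORD_CHAR})|(?<!${WORD_CHAR})(?=${WORD_CHAR})`,
  'u'
);

// Hypothetical helper: split a string at Unicode-aware word boundaries.
function splitOnWordBoundaries(text) {
  return text.split(UNICODE_WORD_BOUNDARY);
}

console.log(splitOnWordBoundaries('hello there'));
// → [ 'hello', ' ', 'there' ]
console.log(splitOnWordBoundaries('Сняли не вернули !!!'));
// → [ 'Сняли', ' ', 'не', ' ', 'вернули', ' !!!' ]
```

The lookaround form is used because the built-in \b in JavaScript is defined in terms of ASCII word characters, so Cyrillic letters never produce a boundary on their own.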