forked from Acceis/unisec
Description
The issue is that DerivedName.txt contains several code point ranges:
➜ cat data/DerivedName.txt | grep '\.\.'
3400..4DBF ; CJK UNIFIED IDEOGRAPH-*
4E00..9FFF ; CJK UNIFIED IDEOGRAPH-*
F900..FA6D ; CJK COMPATIBILITY IDEOGRAPH-*
FA70..FAD9 ; CJK COMPATIBILITY IDEOGRAPH-*
17000..187F7 ; TANGUT IDEOGRAPH-*
18B00..18CD5 ; KHITAN SMALL SCRIPT CHARACTER-*
18D00..18D08 ; TANGUT IDEOGRAPH-*
1B170..1B2FB ; NUSHU CHARACTER-*
20000..2A6DF ; CJK UNIFIED IDEOGRAPH-*
2A700..2B739 ; CJK UNIFIED IDEOGRAPH-*
2B740..2B81D ; CJK UNIFIED IDEOGRAPH-*
2B820..2CEA1 ; CJK UNIFIED IDEOGRAPH-*
2CEB0..2EBE0 ; CJK UNIFIED IDEOGRAPH-*
2EBF0..2EE5D ; CJK UNIFIED IDEOGRAPH-*
2F800..2FA1D ; CJK COMPATIBILITY IDEOGRAPH-*
30000..3134A ; CJK UNIFIED IDEOGRAPH-*
31350..323AF ; CJK UNIFIED IDEOGRAPH-*
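The current code converts the hexadecimal code point field to an integer with to_i(16). to_i stops at the first character that is not a valid hex digit, so the end of the range is silently ignored: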
irb(main):001:0> '2CEB0..2EBE0'.to_i(16)
=> 183984
irb(main):002:0> '2CEB0'.to_i(16)
=> 183984
As a result, each range is displayed as a single code point (its first one):
➜ unisec grep '' | grep 'NUSHU'
U+16FE1 𖿡 NUSHU ITERATION MARK
U+1B170 𛅰 NUSHU CHARACTER-*
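A minimal sketch of range-aware parsing, to illustrate the problem above. The helper names parse_code_points and parse_line are hypothetical and are not part of unisec; the sketch assumes each data line has the form "CODE ; NAME" or "START..END ; NAME" and does not handle comments or blank lines. It uses Integer(str, 16), which raises on trailing characters instead of ignoring them as to_i(16) does.

```ruby
# Hypothetical helper (not unisec's API): turn the code point field of a
# DerivedName.txt line into a Range of integers. A single code point
# becomes a one-element range.
def parse_code_points(field)
  first, last = field.strip.split('..', 2)
  # Integer(str, 16) raises ArgumentError on garbage such as "2CEB0..2EBE0",
  # whereas String#to_i(16) would silently return the first code point.
  start = Integer(first, 16)
  stop = last ? Integer(last, 16) : start
  (start..stop)
end

# Hypothetical helper: split a data line into [range, name].
def parse_line(line)
  field, name = line.split(';', 2).map(&:strip)
  [parse_code_points(field), name]
end

range, name = parse_line('1B170..1B2FB ; NUSHU CHARACTER-*')
puts format('U+%04X..U+%04X %s', range.first, range.last, name)
```

Keeping the parsed value as a Range leaves the choice between the solutions discussed in this issue open: the caller can print both ends, only the start, or iterate over every code point.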
Possible solutions:
- Parse the file properly and display ranges with a horizontal ellipsis
  - Pros: keeps a single command
  - Cons: adds code complexity; output is inconsistent (bad for piping to other commands)
- Add a sub-command named ranges
  - Pros: keeps the output of the grep command consistent
  - Cons: splits the feature across several commands
- Append the range end to the name, e.g. U+1B170 𛅰 NUSHU CHARACTER-* (up to U+1B2FB)
  - Pros: keeps a single command; the code point column stays consistent
  - Cons: the name column becomes unreliable (information appended to the name)
- Expand the name dynamically for every code point in the range
  - Pros: no inconsistency, no unreliable column
  - Cons: for matching results the output becomes very large and hard to read, for little added value
- Add a third field for comments
  - New behavior just for a few exceptions
Example of name expansion for the fourth option (expanding the name dynamically): http://www.unicode.org/charts/beta/nameslist/n_F900.html
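A rough sketch of what dynamic name expansion could look like. It assumes the trailing "*" in a range name stands for the code point in uppercase hexadecimal, as the names list page linked above suggests (e.g. CJK COMPATIBILITY IDEOGRAPH-F900). expand_name and expand_range are hypothetical helpers, not existing unisec methods.

```ruby
# Hypothetical helper: replace the trailing "*" of a range name pattern
# with the hexadecimal code point. Names without a "*" are returned as-is.
def expand_name(pattern, code_point)
  return pattern unless pattern.end_with?('*')

  pattern.sub(/\*\z/, format('%04X', code_point))
end

# Hypothetical helper: build [code_point, name] pairs for a whole range.
# Assumption: the range is small enough to materialize; large CJK ranges
# contain tens of thousands of code points.
def expand_range(range, pattern)
  range.map { |cp| [cp, expand_name(pattern, cp)] }
end

expand_range(0xF900..0xF902, 'CJK COMPATIBILITY IDEOGRAPH-*').each do |cp, name|
  puts format('U+%04X %s', cp, name)
end
```

This also shows the drawback noted for this option: every code point in a range produces its own output line, so a broad match can return a very long list.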