Ranges in DerivedNames for Rugrep

The issue is that there are several **ranges** in `DerivedName.txt`:

```
➜ cat data/DerivedName.txt | grep '\.\.'
3400..4DBF    ; CJK UNIFIED IDEOGRAPH-*
4E00..9FFF    ; CJK UNIFIED IDEOGRAPH-*
F900..FA6D    ; CJK COMPATIBILITY IDEOGRAPH-*
FA70..FAD9    ; CJK COMPATIBILITY IDEOGRAPH-*
17000..187F7  ; TANGUT IDEOGRAPH-*
18B00..18CD5  ; KHITAN SMALL SCRIPT CHARACTER-*
18D00..18D08  ; TANGUT IDEOGRAPH-*
1B170..1B2FB  ; NUSHU CHARACTER-*
20000..2A6DF  ; CJK UNIFIED IDEOGRAPH-*
2A700..2B739  ; CJK UNIFIED IDEOGRAPH-*
2B740..2B81D  ; CJK UNIFIED IDEOGRAPH-*
2B820..2CEA1  ; CJK UNIFIED IDEOGRAPH-*
2CEB0..2EBE0  ; CJK UNIFIED IDEOGRAPH-*
2EBF0..2EE5D  ; CJK UNIFIED IDEOGRAPH-*
2F800..2FA1D  ; CJK COMPATIBILITY IDEOGRAPH-*
30000..3134A  ; CJK UNIFIED IDEOGRAPH-*
31350..323AF  ; CJK UNIFIED IDEOGRAPH-*
```

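The code at the following line converts the hexadecimal code point string to an integer:

https://github.com/Acceis/unisec/blob/6ba37eaa22cefa1995dba8312d6cdbc4f1234904/lib/unisec/rugrep.rb#L41

`String#to_i` stops at the first character that is not a valid hex digit, so everything after the range start is silently ignored: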

```
irb(main):001:0> '2CEB0..2EBE0'.to_i(16)
=> 183984
irb(main):002:0> '2CEB0'.to_i(16)
=> 183984
```

So each range is displayed as a single code point (its first one):

```
➜ unisec grep '' | grep 'NUSHU'
U+16FE1 𖿡    NUSHU ITERATION MARK
U+1B170 𛅰    NUSHU CHARACTER-*
```
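
Whichever solution is picked, the range first has to survive parsing. A minimal sketch of a range-aware parse, assuming a hypothetical helper `parse_code_points` (it is not part of unisec) that receives the code point field of a `DerivedName.txt` line:

```ruby
# Hypothetical helper, not part of unisec: parse the code point field of a
# DerivedName.txt line ("1B170..1B2FB" or "16FE1") into a Range of integers.
def parse_code_points(field)
  first, last = field.strip.split('..', 2)
  # A single code point has no range end, so it becomes a one-element range.
  (first.to_i(16)..(last || first).to_i(16))
end

range = parse_code_points('1B170..1B2FB')
range.begin.to_s(16) # => "1b170"
range.end.to_s(16)   # => "1b2fb"
```

Returning a `Range` in both cases keeps the calling code uniform: a single code point is just a range whose start and end are equal.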

Possible solutions:

1. Parse the file better and display ranges with a horizontal ellipsis
    - Pros: keeps one command
    - Cons: adds code complexity, output is inconsistent (bad for piping to other commands)
2. Add a sub-command named `ranges`
    - Pros: keeps the output of the grep command consistent
    - Cons: splits the feature across several commands
3. Append the range end to the name, e.g. `U+1B170 𛅰    NUSHU CHARACTER-* (up to U+1B2FB)`
    - Pros: keeps one command, code point column is consistent
    - Cons: name column becomes unreliable (information appended to the name)
4. Expand the name dynamically
    - Pros: no inconsistency, no unreliable column
    - Cons: for matching results, the output becomes very large and unreadable for little added value
5. Add a third field for comments
    - Cons: new behavior just for a few exceptions

Example of name expansion for idea 4: http://www.unicode.org/charts/beta/nameslist/n_F900.html
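
Idea 4 could be sketched as follows. This assumes the `*` in a ranged name stands for the code point in uppercase hexadecimal (as the names list linked above suggests), and `expand_range` is a hypothetical helper, not part of unisec:

```ruby
# Hypothetical helper, not part of unisec: lazily expand a ranged
# DerivedName.txt entry into [code point, name] pairs.
def expand_range(first, last, pattern)
  return enum_for(:expand_range, first, last, pattern) unless block_given?

  (first..last).each do |cp|
    # Assumption: `*` is replaced by the code point in uppercase hex
    # (at least 4 digits).
    yield [cp, pattern.sub('*', format('%04X', cp))]
  end
end

expand_range(0x1B170, 0x1B2FB, 'NUSHU CHARACTER-*').first(2).map(&:last)
# => ["NUSHU CHARACTER-1B170", "NUSHU CHARACTER-1B171"]
```

Returning an enumerator avoids building the full list up front, but it does not remove the downside noted above: a broad match against the CJK ranges would still print tens of thousands of lines.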
