Investigate building -u-co-unihan collations without the und-u-co-private-unihan include #6918

@hsivonen

The und-u-co-private-unihan include exists for the alphabetical index feature, which ICU4X does not currently have and which might be broken for Han characters with the implicithan root anyway (see #2723 for why this might be).

Investigate building the -u-co-unihan collations without the und-u-co-private-unihan include. This would

  1. Make ko-u-co-unihan not have a tailoring trie at all (i.e. it would become script reordering only).
  2. Make zh-u-co-unihan have tailoring trie items only in the "small" trie range, provided #6856 (Investigate why collation tailorings for languages whose characters stay < U+1000 get highStart = 0xd800) is fixed. Han characters could then take the above-highStart branch if zh-u-co-unihan is shipped in the "small" mode by default, which would probably have acceptable perf considering how rarely zh-u-co-unihan is used.
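
To make the above-highStart branch in item 2 concrete, here is a minimal toy sketch in Rust. It is not the ICU4X `CodePointTrie` layout or API: `ToyTrie`, `NO_TAILORING`, and the flat `low_values` table are hypothetical names invented for illustration, and the sketch only assumes that a lookup at or above `high_start` returns a single shared value without consulting per-code-point data.

```rust
/// Placeholder for "this code point has no tailoring" (hypothetical constant).
const NO_TAILORING: u32 = 0;

/// Toy stand-in for a tailoring trie; a real trie uses index and data blocks.
struct ToyTrie {
    /// First code point of the trailing range that shares one value.
    high_start: u32,
    /// Flat per-code-point values below `high_start` (illustration only).
    low_values: Vec<u32>,
}

impl ToyTrie {
    /// Builds a trie with no tailored entries below `high_start`.
    fn new(high_start: u32) -> Self {
        ToyTrie {
            high_start,
            low_values: vec![NO_TAILORING; high_start as usize],
        }
    }

    /// True if the lookup for `c` skips the per-code-point data entirely.
    fn takes_high_branch(&self, c: char) -> bool {
        (c as u32) >= self.high_start
    }

    /// Looks up `c`, taking the cheap branch at or above `high_start`.
    fn get(&self, c: char) -> u32 {
        if self.takes_high_branch(c) {
            NO_TAILORING
        } else {
            self.low_values[c as u32 as usize]
        }
    }
}
```

In this toy model, a trie built with `ToyTrie::new(0x1000)` sends U+4E00 through the high branch, while one built with `ToyTrie::new(0xD800)` does not, even though both return the same value for it.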


Labels: A-data (Area: Data coverage or quality), C-collator (Component: Collation, normalization)
