
Review icu_properties and icu_casemap tries for perf-sensitive lookup by characters in the U+1000 to U+FFFF range #6885

@hsivonen


We should look at the access patterns of the tries in icu_properties and icu_casemap in real applications processing input with characters in the U+1000 to U+FFFF range (such as Chinese and Georgian text) to see which tries end up on the performance-sensitive lookup path for such characters.
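The U+1000 to U+FFFF range matters because that is where the two CodePointTrie types differ. Per ICU's CodePointTrie design, the fast type resolves every BMP code point (U+0000 to U+FFFF) with a single fast-index lookup, while the small type does that only for U+0000 to U+0FFF and sends everything above through the slower multi-stage path. The standalone sketch below models just that distinction. The enum and constant names are illustrative assumptions, not ICU4X's actual API.

```rust
/// The two CodePointTrie types (illustrative model, not the ICU4X type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TrieType {
    Fast,
    Small,
}

/// Highest code point resolved via the single fast-index lookup.
/// Assumed from ICU's design: fast type covers the whole BMP,
/// small type only U+0000..=U+0FFF.
const FAST_TYPE_FAST_INDEXING_MAX: u32 = 0xFFFF;
const SMALL_TYPE_FAST_INDEXING_MAX: u32 = 0x0FFF;

/// Returns true if `c` is looked up via the fast (single-index) path.
fn uses_fast_path(trie_type: TrieType, c: char) -> bool {
    let limit = match trie_type {
        TrieType::Fast => FAST_TYPE_FAST_INDEXING_MAX,
        TrieType::Small => SMALL_TYPE_FAST_INDEXING_MAX,
    };
    (c as u32) <= limit
}

fn main() {
    // Latin 'a', Georgian U+10D0, Han U+4E2D, and a supplementary emoji U+1F600.
    for c in ['a', '\u{10D0}', '\u{4E2D}', '\u{1F600}'] {
        let describe = |t| if uses_fast_path(t, c) { "fast path" } else { "slow path" };
        println!(
            "U+{:04X}: fast type -> {}, small type -> {}",
            c as u32,
            describe(TrieType::Fast),
            describe(TrieType::Small),
        );
    }
}
```

For Georgian and Han characters, only the fast type stays on the fast path, which is why the choice of trie type for each property matters for such input.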

For example, perhaps we'll find that JoiningType gets queried only if a character has already been determined to be RTL and, therefore, that the JoiningType trie should always use the small trie type.
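One way to check such an access pattern is to instrument the lookup and count how often it is reached for a given input. The sketch below is hypothetical throughout: `is_rtl` is a crude stand-in for a real Bidi_Class lookup (it treats U+0590 to U+08FF as RTL), and `CountingJoiningType` stands in for the JoiningType trie. It only shows that a lookup gated behind an RTL check is never reached for Han or Georgian text.

```rust
/// Hypothetical stand-in for a Bidi_Class lookup: treats U+0590..=U+08FF
/// (Hebrew, Arabic, Syriac, Thaana, NKo, ...) as RTL. Not real UCD data.
fn is_rtl(c: char) -> bool {
    matches!(c as u32, 0x0590..=0x08FF)
}

/// Hypothetical instrumented JoiningType lookup that counts its hits.
struct CountingJoiningType {
    hits: usize,
}

impl CountingJoiningType {
    fn get(&mut self, _c: char) -> u8 {
        self.hits += 1;
        0 // placeholder; a real trie lookup would return the JoiningType value
    }
}

/// A shaping-style pass that consults JoiningType only for RTL characters.
fn process(text: &str, jt: &mut CountingJoiningType) {
    for c in text.chars() {
        if is_rtl(c) {
            let _joining_type = jt.get(c);
        }
    }
}

fn main() {
    let mut jt = CountingJoiningType { hits: 0 };
    // Han U+4E2D U+6587 and Georgian U+10D0: nothing here is RTL.
    process("\u{4E2D}\u{6587}\u{10D0}", &mut jt);
    println!("JoiningType hits after Han/Georgian input: {}", jt.hits); // 0
    // Arabic letters seen, lam, alef, meem: all in the RTL stand-in range.
    process("\u{0633}\u{0644}\u{0627}\u{0645}", &mut jt);
    println!("JoiningType hits after Arabic input: {}", jt.hits); // 4
}
```

If real applications show this shape, the JoiningType trie's speed for U+1000 to U+FFFF would not matter for Chinese or Georgian text.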

Similarly, perhaps we'll find that the casemap tries get queried even for uncased characters, which puts them on the hot path even for Chinese text despite Han characters not having case. In that case, the tries should be promoted to the fast trie type by default, or at least by some flag that does not also promote the tries that have been determined to make sense to always keep small.
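The flag described above can be sketched as a per-trie placement combined with a build option. Everything here is hypothetical: `Placement`, `BuildOptions`, `promote_hot_tries`, and `choose_trie_type` are illustrative names, not existing ICU4X datagen options. The key property is that a trie classified as always-small stays small regardless of the flag or the default.

```rust
/// The two CodePointTrie types (illustrative model).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TrieType {
    Fast,
    Small,
}

/// Hypothetical outcome of reviewing a trie's access pattern.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Placement {
    /// Off the hot path for U+1000..=U+FFFF (e.g. JoiningType, if only queried for RTL).
    AlwaysSmall,
    /// Queried even for uncased or otherwise uninteresting characters (e.g. casemap, if confirmed).
    Hot,
    /// Not yet reviewed; follows the build default.
    Default,
}

/// Hypothetical build options.
struct BuildOptions {
    /// Promote only the tries classified as hot.
    promote_hot_tries: bool,
    /// Type used for tries that no rule overrides.
    default_type: TrieType,
}

fn choose_trie_type(placement: Placement, opts: &BuildOptions) -> TrieType {
    match placement {
        // Never promoted, whatever the flag or default says.
        Placement::AlwaysSmall => TrieType::Small,
        Placement::Hot if opts.promote_hot_tries => TrieType::Fast,
        Placement::Hot | Placement::Default => opts.default_type,
    }
}

fn main() {
    let opts = BuildOptions { promote_hot_tries: true, default_type: TrieType::Small };
    for p in [Placement::AlwaysSmall, Placement::Hot, Placement::Default] {
        println!("{:?} -> {:?}", p, choose_trie_type(p, &opts));
    }
}
```

Making `Hot` map to `TrieType::Fast` unconditionally would correspond to the "by default" variant instead of the flag variant.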

Metadata


Assignees

No one assigned

Labels

A-data: Area: Data coverage or quality
A-performance: Area: Performance (CPU, Memory)
C-unicode: Component: Props, sets, tries
