String API additions and SIMD optimizations #129

@mosra

A meta-issue tracking various ideas for SIMD-optimized string algorithms. We're making our own string APIs because the C string library is designed for null-terminated strings, which is quite useless when the major use case is working on slices of larger strings, such as in parsers. And (of course) the C++ counterparts are too bloated and either impose an implicit allocation or require too new a C++ standard.

SIMD in general

Construction

Searching

Comparison

  • Check pointer equality before calling into memcmp() in StringView::operator==(), which could save a lot, especially when comparing literals that the compiler might have deduplicated
    • Don't do that in String though
  • Would we gain anything by implementing memcmp() ourselves?
    • especially for SSO strings that have a fixed size, which could be a single (masked) instruction?
    • by not having to explicitly test for nullptr when size == 0 just to avoid hitting UB, because the standard is stupid and generally disallows passing nullptr to any string/memory function even if the size is zero?
  • Case-insensitive comparison -- http://www.phoronix.com/scan.php?page=news_item&px=Glibc-strcasecmp-AVX2-EVEX
    • Possibly useful for extension comparison in Any* plugins, OTOH there it's probably faster to normalize the extension first and then do 100 memcmp()s
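
A minimal sketch of the pointer-equality shortcut and the zero-size nullptr guard discussed above. The `View` struct and `equal()` function are hypothetical stand-ins for illustration, not the actual StringView implementation:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a non-owning string view (assumption, not the
// real StringView class).
struct View {
    const char* data;
    std::size_t size;
};

inline bool equal(View a, View b) {
    // Different sizes can never compare equal
    if(a.size != b.size) return false;
    // Same pointer and same size means the same bytes, so memcmp() can be
    // skipped. This also covers two nullptr views of zero size.
    if(a.data == b.data) return true;
    // Passing nullptr to memcmp() is UB even with a zero size, so handle
    // empty views before calling it
    if(a.size == 0) return true;
    return std::memcmp(a.data, b.data, a.size) == 0;
}
```

The shortcut only pays off if the pointers are likely to match in practice, for example when both views point at the same deduplicated literal. Otherwise it is one extra comparison before the memcmp() call.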

Unicode

Number-to-string

For Utility::Debug, Utility::format() etc. The core should be a direct overhead-less API working on builtin types (e.g. writing into a statically-sized char[]), with convenience wrappers above.

String-to-number

Because strto*() has insane usability issues.
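
For context, strto*() expects a null-terminated input, silently skips leading whitespace, and reports range errors through errno, none of which fits parsing a slice of a larger string. Below is a sketch of a slice-based alternative for unsigned decimal values. The name `parseUnsigned()` and its signature are hypothetical, chosen only for this example:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: parses the whole [data, data + size) range as an
// unsigned decimal number. Returns false on empty input, on any non-digit
// character and on overflow; `out` is written only on success.
inline bool parseUnsigned(const char* data, std::size_t size, std::uint32_t& out) {
    if(size == 0) return false;
    std::uint32_t value = 0;
    for(std::size_t i = 0; i != size; ++i) {
        const char c = data[i];
        if(c < '0' || c > '9') return false;
        const std::uint32_t digit = std::uint32_t(c - '0');
        // value*10 + digit would overflow if value > (max - digit)/10
        if(value > (UINT32_MAX - digit)/10) return false;
        value = value*10 + digit;
    }
    out = value;
    return true;
}
```

Because the size is passed explicitly, the input does not need to be copied or null-terminated first, and failure is reported through the return value instead of errno.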

General printing
