GiNZA >= 5.1 cannot process long (over 49149 bytes) texts #242

@TatsuyaShirakawa

GiNZA >= 5.1 cannot process texts longer than 49149 bytes. The root cause is a limitation in sudachi.rs.

According to the sudachi.rs code, the maximum input length in bytes is defined as u16::MAX / 4 * 3 (= 49149 with integer division), so if a given text is longer than this in bytes, sudachipy (sudachi.rs) raises an InputTooLong error.
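To make the arithmetic concrete, and as one possible workaround on the caller's side, here is a minimal sketch in plain Python. It does not call GiNZA or sudachipy at all. `utf8_len` and `split_under_limit` are hypothetical helper names of my own, and splitting on "。" is an assumption that will not suit every text.

```python
from typing import List

# Mirrors the expression quoted from sudachi.rs: u16::MAX / 4 * 3 (integer division).
U16_MAX = 2**16 - 1                 # 65535
MAX_INPUT_BYTES = U16_MAX // 4 * 3  # 16383 * 3 = 49149


def utf8_len(text: str) -> int:
    """Return the length of text in UTF-8 bytes (the limit counts bytes, not characters)."""
    return len(text.encode("utf-8"))


def split_under_limit(text: str, limit: int = MAX_INPUT_BYTES,
                      delimiter: str = "。") -> List[str]:
    """Hypothetical helper: greedily pack delimiter-terminated pieces
    into chunks of at most `limit` UTF-8 bytes each."""
    # Keep the delimiter attached to the piece it terminates.
    parts = text.split(delimiter)
    pieces = [p + delimiter for p in parts[:-1]]
    if parts[-1]:
        pieces.append(parts[-1])

    chunks: List[str] = []
    current, current_len = "", 0
    for piece in pieces:
        piece_len = utf8_len(piece)
        if piece_len > limit:
            # A single piece that is too long needs a different split rule.
            raise ValueError("a single piece exceeds the byte limit")
        if current_len + piece_len > limit:
            # Adding this piece would overflow, so close the current chunk.
            chunks.append(current)
            current, current_len = "", 0
        current += piece
        current_len += piece_len
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be passed to the pipeline separately. Note that most Japanese characters take 3 bytes in UTF-8, so the limit corresponds to roughly 16,000 such characters, and that analyses of separate chunks will not see context across chunk boundaries.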

Here are the related lines in the sudachi.rs repo, which might help.

(I asked the Sudachi developers about this limitation directly, and they told me that the maximum length (u16::MAX / 4 * 3) was chosen for performance reasons.)
