Bug with Surrogates + Segments in the UTF8JsonGenerator #1473

@vitorpamplona

Description

The UTF8JsonGenerator splits a string into segments without checking whether a split falls exactly between the high and low surrogate chars of a pair. When that happens, the generator escapes the two surrogates separately instead of combining them, even when the feature for combining surrogates is enabled.

Every place where a segment is split must check whether the segment's final character is the start of a surrogate pair (_isStartOfSurrogatePair) and, if it is, reduce the segment len by 1 so that the whole pair lands in the next segment.

int len = Math.min(_outputMaxContiguous, left);
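To make the proposal concrete, here is a minimal standalone sketch of the adjustment. It is not the actual UTF8JsonGenerator code: the class `SurrogateSafeSegments`, the method `split`, and the parameter `maxLen` (standing in for `_outputMaxContiguous`) are hypothetical names, and `Character.isHighSurrogate` is used in place of the generator's own `_isStartOfSurrogatePair` check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Jackson.
class SurrogateSafeSegments {

    // Splits text into segments of at most maxLen chars, never cutting
    // between the high and low halves of a surrogate pair.
    static List<String> split(String text, int maxLen) {
        if (maxLen < 2) {
            // A pair needs two chars, so a smaller limit could never hold one.
            throw new IllegalArgumentException("maxLen must be at least 2");
        }
        List<String> segments = new ArrayList<>();
        int offset = 0;
        int left = text.length();
        while (left > 0) {
            int len = Math.min(maxLen, left);
            // Only adjust when more input follows this segment: if the last
            // char is a high surrogate, push it into the next segment.
            if (len < left && Character.isHighSurrogate(text.charAt(offset + len - 1))) {
                --len;
            }
            segments.add(text.substring(offset, offset + len));
            offset += len;
            left -= len;
        }
        return segments;
    }
}
```

With a limit of 3, the input `"ab"` + U+1F600 + `"cd"` would otherwise be cut after the high surrogate. With the adjustment, the first segment is shortened to `"ab"` and the full pair starts the second segment.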

Does this make sense?
