MXParser fails to correctly parse the ampersand after the character from the #x10000-#x10FFFD Unicode range #7

@aliaksei-burlakou

Expected Behavior

An XML document that contains encoded characters from the #x10000-#x10FFFD Unicode range (such as 𐀀, U+10000, or 􏰍, U+10FC0D) should be parsed by MXParser correctly. Neither these characters nor any other valid characters should be corrupted, regardless of where they appear in the document.

Actual Behavior

If the XML document contains a character from the #x10000-#x10FFFD Unicode range anywhere before an encoded ampersand, MXParser erroneously appends a replacement character (�) after that ampersand during parsing.

Steps to reproduce

  • Java 17 (Amazon Corretto JDK, build 17.0.11+9-LTS), XStream v1.4.21
  • The encoded character &#1113101; (􏰍, U+10FC0D, UTF-8 bytes: F4 8F B0 8D) must be present somewhere in the XML document before the encoded ampersand (&amp;).
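
The representations of this character quoted above can be cross-checked with the JDK alone. The following is a minimal sketch; the class and method names are hypothetical and are not part of XStream or of the original report. It shows that the decimal reference 1113101 is U+10FC0D, that its UTF-8 encoding is F4 8F B0 8D, and that in a Java String it occupies two UTF-16 code units (the surrogate pair \uDBFF\uDC0D used in the test assertion).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of XStream.
class SupplementaryCodePointDemo {

    // Returns the UTF-8 encoding of a code point as space-separated hex bytes.
    static String utf8Hex(int codePoint) {
        byte[] bytes = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Returns the UTF-16 code units of a code point written as Java-style escapes.
    static String utf16Escapes(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(codePoint)) {
            sb.append(String.format("\\u%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int codePoint = 1113101; // decimal value from the character reference in the report

        System.out.println(String.format("U+%X", codePoint));             // U+10FC0D
        System.out.println(utf8Hex(codePoint));                           // F4 8F B0 8D
        System.out.println(utf16Escapes(codePoint));                      // surrogate pair DBFF, DC0D
        System.out.println(Character.isSupplementaryCodePoint(codePoint)); // true
        System.out.println(Character.charCount(codePoint));               // 2
    }
}
```

Because the character needs two `char` values, any parser that tracks positions in a `char` buffer has to account for both code units when it resolves such a reference.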

Simple code example:

  • RootTag class:
import com.thoughtworks.xstream.annotations.XStreamAlias;

@XStreamAlias("rootTag")
public class RootTag {
    @XStreamAlias("text")
    private TextTag text;

    public TextTag getText() {
        return text;
    }
}
  • TextTag class:
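import com.thoughtworks.xstream.annotations.XStreamAlias;
import com.thoughtworks.xstream.annotations.XStreamConverter;
import com.thoughtworks.xstream.converters.extended.ToAttributedValueConverter;

@XStreamConverter(value = ToAttributedValueConverter.class, strings = {"value"})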
@XStreamAlias("textTag")
public class TextTag {
    private String value;

    public String getValue() {
        return value;
    }
}
  • Test class with the simple XML input:
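import static org.junit.jupiter.api.Assertions.assertEquals; // assumes JUnit 5

import com.thoughtworks.xstream.XStream;
import com.thoughtworks.xstream.security.ExplicitTypePermission;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import org.junit.jupiter.api.Test;

class XStreamTest {

    @Test
    void testXStreamFailsToParseAmpersandAfterSupplementaryCharacter() throws Exception {
        String input = """
                <?xml version="1.0" encoding="UTF-8"?>
                <rootTag>
                    <text>Test: &amp; ampersand before, supplementary character &#1113101;, ampersand &amp; after</text>
                </rootTag>""";

        XStream xStream = new XStream();
        xStream.processAnnotations(RootTag.class);
        xStream.addPermission(new ExplicitTypePermission(new Class[]{RootTag.class}));

        try (InputStream is = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8))) {
            RootTag rootTag = (RootTag) xStream.fromXML(is);
            // The expected text must match the input wording; U+10FC0D is the surrogate pair below.
            assertEquals("Test: & ampersand before, supplementary character \uDBFF\uDC0D, ampersand & after",
                    rootTag.getText().getValue());
        }
    }
}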
  • Output:
Expected :Test: & ampersand before, supplementary character 􏰍, ampersand & after
Actual   :Test: & ampersand before, supplementary character 􏰍, ampersand &� after
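
For comparison, the same text element can be run through the DOM parser bundled with the JDK, which serves as a baseline that does not involve MXParser. This is only a sketch: the class name `JdkParserBaseline` and its members are hypothetical, and it uses nothing but standard `javax.xml.parsers` and `org.w3c.dom` calls. It does not reproduce or diagnose the bug itself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical baseline check using the JDK's built-in XML parser, not MXParser.
class JdkParserBaseline {

    // Same content as the reproduction input, written on one line.
    static final String INPUT =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<rootTag><text>Test: &amp; ampersand before, supplementary character &#1113101;,"
            + " ampersand &amp; after</text></rootTag>";

    // Parses the XML and returns the text content of the first <text> element.
    static String parseText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getElementsByTagName("text").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String text = parseText(INPUT);
        // Print each code point so that a stray character after the second ampersand would be visible.
        text.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```

Printing the code points instead of the raw string makes an extra character after the second ampersand easy to spot, even when the console cannot render U+10FC0D.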

NOTE

This issue was initially reported here: x-stream/xstream#368

Labels

bug (Something isn't working)
