Status: Open
Labels: bug (Something isn't working)
Description
Expected Behavior
An XML document containing encoded Unicode characters from the #x10000-#x10FFFD range (for example 𐀀, U+10000, or the U+10FC0D character used in the reproduction below) should be parsed by MXParser without any issues affecting these or any other valid characters, regardless of where they appear in the document.
Actual Behavior
MXParser erroneously appends a replacement character (�) after the ampersand during parsing if the XML document contains a character from the #x10000-#x10FFFD Unicode range somewhere before the ampersand in the XML.
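Characters in this range do not fit into a single Java `char`; a `String` stores each of them as a surrogate pair of two UTF-16 code units. The sketch below only illustrates that encoding for U+10FC0D, the character used in the reproduction, and cross-checks the UTF-8 bytes quoted in this report. The class and method names are hypothetical helpers, not part of MXParser or XStream.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of MXParser or XStream.
final class SupplementaryCodePointDemo {

    /** Returns the UTF-16 code units of a code point as space-separated 4-digit hex. */
    static String utf16Units(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    /** Returns the UTF-8 bytes of a code point as space-separated 2-digit hex. */
    static String utf8Bytes(int codePoint) {
        byte[] bytes = new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int codePoint = 0x10FC0D;
        System.out.println(utf16Units(codePoint)); // DBFF DC0D
        System.out.println(utf8Bytes(codePoint));  // F4 8F B0 8D
    }
}
```

The surrogate pair `DBFF DC0D` is the same pair that the test's expected string spells out as `\uDBFF\uDC0D`.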
Steps to reproduce
- Java 17 (Amazon Corretto JDK, build 17.0.11+9-LTS), XStream v1.4.21
- The &#x10FC0D; encoded character (􏰍, U+10FC0D, UTF-8 hex: F4 8F B0 8D) must be present somewhere in the XML document before the encoded ampersand (&amp;).
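- Simple code example:

  RootTag class:

      import com.thoughtworks.xstream.annotations.XStreamAlias;

      // Root element <rootTag> with a single <text> child.
      @XStreamAlias("rootTag")
      public class RootTag {
          @XStreamAlias("text")
          private TextTag text;

          public TextTag getText() {
              return text;
          }
      }

  TextTag class:

      import com.thoughtworks.xstream.annotations.XStreamAlias;
      import com.thoughtworks.xstream.annotations.XStreamConverter;
      import com.thoughtworks.xstream.converters.extended.ToAttributedValueConverter;

      // The element's text content is mapped to the "value" field.
      @XStreamConverter(value = ToAttributedValueConverter.class, strings = {"value"})
      @XStreamAlias("textTag")
      public class TextTag {
          private String value;

          public String getValue() {
              return value;
          }
      }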
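- Test class with the simple XML input:

      import static org.junit.jupiter.api.Assertions.assertEquals; // JUnit 5 assumed

      import com.thoughtworks.xstream.XStream;
      import com.thoughtworks.xstream.security.ExplicitTypePermission;
      import java.io.ByteArrayInputStream;
      import java.io.InputStream;
      import java.nio.charset.StandardCharsets;
      import org.junit.jupiter.api.Test;

      class XStreamTest {
          @Test
          void testXStreamFailsToParseAmpersandAfterSupplementaryCharacter() throws Exception {
              // The ampersands and the supplementary character are written as XML references.
              String input = """
                      <?xml version="1.0" encoding="UTF-8"?>
                      <rootTag>
                      <text>Test: &amp; ampersand before, supplementary character &#x10FC0D;, ampersand &amp; after</text>
                      </rootTag>""";
              XStream xStream = new XStream();
              xStream.processAnnotations(RootTag.class);
              xStream.addPermission(new ExplicitTypePermission(new Class[]{RootTag.class}));
              try (InputStream is = new ByteArrayInputStream(input.getBytes(StandardCharsets.UTF_8))) {
                  RootTag rootTag = (RootTag) xStream.fromXML(is);
                  // \uDBFF\uDC0D is the UTF-16 surrogate pair for U+10FC0D.
                  assertEquals("Test: & ampersand before, supplementary character \uDBFF\uDC0D, ampersand & after",
                          rootTag.getText().getValue());
              }
          }
      }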
- Output:

      Expected :Test: & ampersand before, supplementary character 􏰍, ampersand & after
      Actual   :Test: & ampersand before, supplementary character 􏰍, ampersand &� after
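To pin down exactly which UTF-16 code unit ends up after the ampersand, it can help to scan the parsed value for a literal U+FFFD or an unpaired surrogate, since both are typically displayed as �. The helper below is a minimal sketch with hypothetical names; it does not depend on MXParser and makes no claim about the parser's internals. The sample string in `main` is an assumed example of a stray low surrogate, not captured output.

```java
// Hypothetical diagnostic helper; not part of MXParser or XStream.
final class StrayCharFinder {

    /**
     * Returns the index of the first U+FFFD or unpaired surrogate in the string,
     * or -1 if the string contains neither.
     */
    static int firstSuspiciousIndex(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\uFFFD') {
                return i;
            }
            if (Character.isHighSurrogate(c)) {
                boolean paired = i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1));
                if (!paired) {
                    return i;
                }
                i++; // valid surrogate pair: skip its low half
            } else if (Character.isLowSurrogate(c)) {
                return i; // low surrogate without a preceding high surrogate
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed example: a lone low surrogate directly after the ampersand.
        String sample = "ampersand &\uDC0D after";
        int index = firstSuspiciousIndex(sample);
        System.out.println(index); // 11
        System.out.println(String.format("%04X", (int) sample.charAt(index))); // DC0D
    }
}
```

Calling `StrayCharFinder.firstSuspiciousIndex(rootTag.getText().getValue())` in the test and printing the code unit at that index would show whether the extra character is a real U+FFFD or a leftover surrogate half.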
NOTE
This issue was initially reported here: x-stream/xstream#368