XMLStreamReader.getAttributeValue(null, localName) does not ignore namespace URI #53

Closed
flappingeagle opened this issue Jun 15, 2018 · 6 comments

@flappingeagle

flappingeagle commented Jun 15, 2018

I use woodstox-5.1.0 and jackson-2.7.7.

The following quick XML example shows that getAttributeValue(..) does not work as expected when resolving an attribute of an XML element by its localName.

The xml used here is an example from: https://www.w3schools.com/xml/schema_schema.asp

        final String test =
                        "<note xmlns=\"https://www.w3schools.com\"\n" +
                        "xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"\n" +
                        "xsi:schemaLocation=\"https://www.w3schools.com note.xsd\">\n" +
                        "\n" +
                        "<to>Tove</to>\n" +
                        "<from>Jani</from>\n" +
                        "<heading>Reminder</heading>\n" +
                        "<body>Don't forget me this weekend!</body>\n" +
                        "</note> ";

        final InputStream is = new ByteArrayInputStream(test.getBytes(StandardCharsets.UTF_8));

        final XMLInputFactory staxInputFactory = XMLInputFactory.newInstance();
        final XMLStreamReader staxReader = staxInputFactory.createXMLStreamReader(is);

        staxReader.nextTag();

        // returns schemaLocation (correct!)
        final String attributeLocalName = staxReader.getAttributeLocalName(0);
        System.out.println(attributeLocalName);

        // returns value for schemaLocation (correct!)
        final String attributeValue = staxReader.getAttributeValue(0);
        System.out.println(attributeValue);

        // returns NULL (unexpected!)
        final String schemaLocation = staxReader.getAttributeValue(null, "schemaLocation");
        System.out.println(schemaLocation);
@cowtowncoder
Member

To me that seems to be working exactly as specified. You are asking for the value of an attribute with a namespace URI of null (which is taken to mean ""). There is no such attribute; your "schemaLocation" has a namespace URI of "http://www.w3.org/2001/XMLSchema-instance". The two are not the same.

Looking at the Stax javadoc, however, handling of the namespace URI is defined as... something that makes no sense, claiming no matching is to be done. That is not what the Stax specification said as far as I remember. However, since Oracle does not make their TCK freely available (one has to be a JCP member), I cannot verify what their compliance tests claim.
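
For readers following along: the point about the attribute's namespace URI can be illustrated with a lookup that passes that URI explicitly, which should not depend on how an implementation treats null. This is only a minimal sketch; the class name `ExplicitNamespaceLookup` and the helper `readSchemaLocation` are hypothetical names chosen for illustration, and the XML is a shortened form of the example above.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ExplicitNamespaceLookup {
    // Namespace URI bound to the "xsi" prefix in the example document.
    static final String XSI_NS = "http://www.w3.org/2001/XMLSchema-instance";

    /** Hypothetical helper: reads xsi:schemaLocation from the root element, or null if absent. */
    public static String readSchemaLocation(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                reader.nextTag(); // advance to the root START_ELEMENT
                // Namespace URI and local name are both given, so the match is unambiguous.
                return reader.getAttributeValue(XSI_NS, "schemaLocation");
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<note xmlns=\"https://www.w3schools.com\" "
                + "xmlns:xsi=\"" + XSI_NS + "\" "
                + "xsi:schemaLocation=\"https://www.w3schools.com note.xsd\"/>";
        System.out.println(readSchemaLocation(xml));
    }
}
```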

@flappingeagle
Author

flappingeagle commented Jun 18, 2018

My use case is:

I want to switch my Stax parser from "Xerces-J 2.7.1", which is bundled within the JRE, to Woodstox.

The Javadoc of the JRE for the function "XMLStreamReader.getAttributeValue(..)" is as follows
(see: https://docs.oracle.com/javase/7/docs/api/javax/xml/stream/XMLStreamReader.html)

getAttributeValue(String namespaceURI, String localName)
Returns the normalized attribute value of the attribute with the namespace and localName If the namespaceURI is null the namespace is not checked for equality

My code that worked with "Xerces-J 2.7.1" now does not work with Woodstox, because Xerces works like the above Javadoc and tolerates NULL as value and will not check the namespaceURI for equality in this case.

I could also change my code, but I find it strange that the Woodstox code works differently from the Javadoc.
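
One way the calling code could be changed so that it does not rely on how null is interpreted is to scan the attributes by index and compare local names only. The sketch below uses only index-based `XMLStreamReader` accessors; the class `AttributeByLocalName` and both helper methods are hypothetical names for illustration, not part of Stax or Woodstox.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class AttributeByLocalName {

    /**
     * Hypothetical helper: returns the value of the first attribute whose
     * local name matches, regardless of namespace, or null if none matches.
     * The reader must be positioned on a START_ELEMENT event.
     */
    public static String getAttributeValueIgnoringNamespace(XMLStreamReader reader, String localName) {
        for (int i = 0, n = reader.getAttributeCount(); i < n; i++) {
            if (localName.equals(reader.getAttributeLocalName(i))) {
                return reader.getAttributeValue(i);
            }
        }
        return null;
    }

    /** Convenience wrapper for the demo: looks up an attribute on the root element. */
    public static String lookupOnRoot(String xml, String localName) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                reader.nextTag(); // advance to the root START_ELEMENT
                return getAttributeValueIgnoringNamespace(reader, localName);
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<note xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" "
                + "xsi:schemaLocation=\"https://www.w3schools.com note.xsd\"/>";
        System.out.println(lookupOnRoot(xml, "schemaLocation"));
    }
}
```

If two attributes share a local name in different namespaces, this sketch returns whichever comes first in the reader's attribute order, so it is only suitable where that ambiguity cannot occur.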

@cowtowncoder
Member

@flappingeagle while I see what the javadoc says, I am not convinced that is how the Stax specification dictates it (although... knowing how sparsely it was documented, I can't be sure). But from an XML perspective that interpretation is pretty senseless, as namespace information is an intrinsic part of element and attribute names, and ignoring that is just plain Wrong.

Be that as it may, this is how Woodstox is designed to work, based on my understanding of the Stax specification. It is not a bug I would fix for a patch version, although if the specification really does dictate "just ignore namespace", I would consider it a bug to fix for the next minor version.
(On the plus side, it is a somewhat backwards-compatible change in most regards.)

Finally, the reference to Xerces seems a bit irrelevant, as Xerces implements the SAX and DOM interfaces, not Stax (unless I am mistaken? I don't think an older version like 2.7.1 does, at least).
Given this, I am not sure how the Stax javadoc would be relevant for Xerces, which implements a different API: method names may be the same (it's XML after all), but semantics not necessarily.

@flappingeagle
Author

flappingeagle commented Jun 18, 2018

Ok thanks.


Regarding Xerces:
I use Java 1.8.0 (Oracle). Out of the box, Java provides the following implementations of the Stax API:

XMLInputFactory.newInstance()
com.sun.xml.internal.stream.XMLInputFactoryImpl

xmlInputFactory.createXMLStreamReader(input)
com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl

As you can see, Xerces is contained within the "rt.jar" of the Oracle JRE.

On the console the xerces-version of the JRE can be resolved like this (returns Xerces-J 2.7.1 in my case):

  java com.sun.org.apache.xerces.internal.impl.Version

And because the XMLStreamReader is provided by Xerces, I concluded that Xerces does implement the Stax API.
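
For anyone wanting to check which Stax implementation is active on their own classpath, the class names listed above can be printed at runtime. This is a minimal sketch; the class `StaxImplementationInfo` and its `describe` method are hypothetical names, and the actual output depends on the JRE and on whether a library such as Woodstox is on the classpath.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxImplementationInfo {

    /** Returns "factoryClass -> readerClass" for the Stax implementation in use. */
    public static String describe() {
        try {
            // newInstance() picks the implementation via the standard lookup mechanism.
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader("<root/>"));
            try {
                return factory.getClass().getName() + " -> " + reader.getClass().getName();
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not create reader", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```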

Just for info.

@cowtowncoder
Member

@flappingeagle Ok, thank you for including that information: I was not aware of a Xerces-backed implementation (nor had I seen announcements). But the class names do seem to suggest that one exists.
I don't know if it might be based on the old(er) Sun/Xerces implementation ("sjsxp").
Anyway, that makes sense wrt your mention of Xerces. :)

But back to the original question... since the Javadoc does indeed state that null should mean "ignore namespace information", the current Woodstox behavior does sound like a deviation from the Stax specification.

I will try to see how easy it would be to fix.

Thank you for reporting this.

@cowtowncoder cowtowncoder added the active Issue being actively investigated label Aug 21, 2018
@cowtowncoder cowtowncoder changed the title from "XMLStreamReader.getAttributeValue(..) does not resolve correctly in some cases." to "XMLStreamReader.getAttributeValue(null, localName) does not ignore namespace URI" Aug 23, 2018
@cowtowncoder cowtowncoder added this to the 5.2.0 milestone Aug 23, 2018
@cowtowncoder cowtowncoder removed the active Issue being actively investigated label Aug 23, 2018
@cowtowncoder
Member

Fixed. Attribute value lookup for the null-namespace case is slightly less efficient, as it needs a linear scan (it can't use the hash), but the difference is unlikely to be measurable.
Will be included in 5.2.0, due out soon (will see if I can fix any other bugs).
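
To illustrate the trade-off mentioned above (hash lookup when the namespace is known, linear scan when it is null), here is a minimal, self-contained sketch. It is not Woodstox's actual code; the class `AttributeLookupSketch`, its fields, and the key format are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AttributeLookupSketch {
    // Each entry holds {namespaceURI, localName, value}; "" means "no namespace".
    private final List<String[]> attrs = new ArrayList<>();
    // Index keyed on namespace URI plus local name, for exact lookups.
    private final Map<String, Integer> index = new HashMap<>();

    public void add(String nsUri, String localName, String value) {
        String ns = (nsUri == null) ? "" : nsUri;
        index.put(key(ns, localName), attrs.size());
        attrs.add(new String[] { ns, localName, value });
    }

    public String getValue(String nsUri, String localName) {
        if (nsUri == null) {
            // Namespace is not checked: the hash key is incomplete, so scan linearly.
            for (String[] a : attrs) {
                if (a[1].equals(localName)) {
                    return a[2];
                }
            }
            return null;
        }
        // Namespace given: a single hash lookup on the full name.
        Integer i = index.get(key(nsUri, localName));
        return (i == null) ? null : attrs.get(i)[2];
    }

    private static String key(String ns, String localName) {
        return "{" + ns + "}" + localName;
    }

    public static void main(String[] args) {
        AttributeLookupSketch a = new AttributeLookupSketch();
        a.add("http://www.w3.org/2001/XMLSchema-instance", "schemaLocation",
                "https://www.w3schools.com note.xsd");
        System.out.println(a.getValue(null, "schemaLocation")); // linear scan path
        System.out.println(a.getValue("", "schemaLocation"));   // hash path, no match
    }
}
```

Since elements rarely carry more than a handful of attributes, the linear path in a sketch like this costs little, which is consistent with the remark that the difference is unlikely to be measurable.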
