Behavior of RDF4J IRI validation for IRIs containing square brackets #3777
Replies: 1 comment
Thanks for bringing this up @aschwarte10. You are correct that this IRI is invalid according to the RFC: square brackets are not allowed in the path of an IRI. The only way to carry such characters in an IRI would be to percent-encode them:

    String iri = ParsedIRI.create("http://example.com/m/a_[abc]").toString();
    System.out.println(iri);

result:

    http://example.com/m/a_%5Babc%5D

The general philosophy on IRI validation in RDF4J is that by default, we strictly adhere to the specs in all parsers. However, we allow bypassing these checks in several places, for two reasons:
Looking at your examples, I quite agree there is some inconsistency in behavior though. I think your first three examples are behaving correctly, and are consistent as well, but I'm not sure that the SPARQL query in your fourth example should be allowed: I would expect the SPARQL parser to protest. I traced it down and it turns out we use a non-validating ValueFactory in the SPARQL parser. I think that may be an oversight.

As for the discrepancy in processing the results of CONSTRUCT queries: the in-memory store does not need to do any parsing of the result statements, as they come straight from the store. In contrast, the SPARQLRepository retrieves the result data in serialized form, and uses an RDF parser to deserialize it client-side. We should perhaps look into configuring that result parser to be more lenient, for consistency's sake.
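To make the bracket rule concrete without pulling in RDF4J, here is a minimal sketch using only the JDK. It relies on java.net.URI as a stand-in validator, which is an assumption: it is not the check RDF4J performs and it follows the URI rules rather than the full IRI spec, but it rejects unencoded square brackets in the path in the same way. The class name BracketIriCheck and its helper methods are hypothetical, and the encoding helper is deliberately naive.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, for illustration only; not part of RDF4J.
class BracketIriCheck {

    // Returns true if java.net.URI accepts the string.
    // java.net.URI only allows '[' and ']' around an IPv6 host literal,
    // so an unencoded bracket in the path is rejected.
    static boolean isValidUri(String candidate) {
        try {
            new URI(candidate);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Naive percent-encoding of square brackets.
    // Assumption: the input has no IPv6 host literal, whose brackets
    // must stay unencoded.
    static String encodeBrackets(String candidate) {
        return candidate.replace("[", "%5B").replace("]", "%5D");
    }

    public static void main(String[] args) {
        String raw = "http://example.com/m/a_[abc]";
        String encoded = encodeBrackets(raw);
        System.out.println(raw + " valid: " + isValidUri(raw));
        System.out.println(encoded + " valid: " + isValidUri(encoded));
    }
}
```

In real code the ParsedIRI.create call shown earlier in this reply is the more appropriate tool, since it handles the encoding itself.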
Hi all,
we have been running into a case where a generated IRI contained square brackets. For the sake of the example, let's use
http://example.com/m/a_[abc]
We observed that RDF4J behaves differently for different operations and query types. Specifically, when going through RDF4J we get validation errors, which led us to look into the spec. Commercial databases also seem to diverge in how they handle such IRIs.
My reading of the IRI spec (https://www.ietf.org/rfc/rfc3987.txt) is that square brackets are part of the "reserved characters", which are not allowed in the path part of an IRI. This would mean that the RDF4J validation error is correct and expected.
I would like to get confirmation on this, and I also want to point out inconsistent behavior in RDF4J (see below). I can technically understand why this may happen, but from a user perspective it leads to confusion. Is this maybe something that should be fixed in RDF4J?
My test scenario:
(with memory store)
with a SPARQL repository
(assuming I was able to insert the statement with the broken IRI directly into the database)
From my test I see that different result parsers (specifically RDFParsers) are more strict than tuple result parsers. From a user perspective I would expect that if the CONSTRUCT query is failing, the SELECT query should do the same.
How do you see this? And is my understanding correct that we are talking about invalid IRIs in the first place (which are accepted currently by different databases)?
Thanks,
Andreas