Optionally preserve blank node IDs when skolemizing #3192
Replies: 6 comments
-
I should point out that in the case discussed on the user list, this improvement won't actually help: blank node id preservation is only meaningful when the serialized blank nodes have explicit identifiers in the document. Since the discussion thread was about a JSON-LD document that has implicit blank nodes, there are no blank node ids that we can preserve across multiple parser runs.
-
A perhaps more interesting idea is to look at a utility for canonicalizing and subsequently skolemizing blank nodes in a Model (so not during parsing, but by modifying an existing Model collection).
-
Or perhaps both?
-
Or what about a Model parser? This would allow for reusing the validation logic, the skolemization logic, and so on. The result would be a new Model (which obviously requires more memory).
-
I see what you're getting at, but I'm a little worried about muddying up what "parsing" means. I think I'd rather do this in the form of additional functionality at the level of the Model API (either as new utility functions in Models, or something else). In particular, the skolemization logic is actually something that I think we may want to approach holistically (that is, looking at the entire model at once) rather than as part of a streaming (statement by statement) approach. Doing so will allow us to assign skolem ids that are not just random, but based on the structure of the Model. This makes it possible to generate ids that are guaranteed to be identical for "the same" blank nodes, even when run multiple times (this is what I meant when I mentioned canonicalization). There's a very good technical paper about this idea by Aidan Hogan (see http://aidanhogan.com/docs/rdf-canonicalisation.pdf). Having such an algorithm in place isn't just nice for blank node preservation and/or skolemization, it will also enable us to do far more efficient graph isomorphism comparisons.
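To illustrate what "ids based on structure" could mean, here is a rough, self-contained sketch. It is not Hogan's algorithm and does not use the RDF4J Model API: triples are plain `String[]` arrays, blank nodes are any term starting with `_:`, and the class and method names are made up for this example. Each blank node is hashed from its neighborhood over a few rounds, never from its label, so two graphs that differ only in blank node labels produce the same skolem IRIs. Blank nodes that are structurally indistinguishable end up with the same hash here; a real canonicalization algorithm needs an extra step to tell those apart. The `/.well-known/genid/` path follows the suggestion in RDF 1.1 Concepts.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of RDF4J. Triples are {subject, predicate, object}.
final class StructuralSkolemizer {

    static boolean isBNode(String term) {
        return term.startsWith("_:");
    }

    static String sha256(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Computes a hash per blank node that depends only on graph structure, not on labels.
    static Map<String, String> hashBNodes(List<String[]> triples) {
        Map<String, String> hashes = new HashMap<>();
        for (String[] t : triples) {
            if (isBNode(t[0])) hashes.put(t[0], "init");
            if (isBNode(t[2])) hashes.put(t[2], "init");
        }
        // One refinement round per blank node lets information travel across the graph.
        for (int round = 0; round < hashes.size(); round++) {
            Map<String, List<String>> signatures = new HashMap<>();
            for (String bnode : hashes.keySet()) {
                signatures.put(bnode, new ArrayList<>());
            }
            for (String[] t : triples) {
                // Ground terms contribute themselves; blank nodes contribute their current hash.
                if (isBNode(t[0])) {
                    signatures.get(t[0]).add("out|" + t[1] + "|" + hashes.getOrDefault(t[2], t[2]));
                }
                if (isBNode(t[2])) {
                    signatures.get(t[2]).add("in|" + t[1] + "|" + hashes.getOrDefault(t[0], t[0]));
                }
            }
            Map<String, String> next = new HashMap<>();
            for (Map.Entry<String, List<String>> e : signatures.entrySet()) {
                Collections.sort(e.getValue()); // make the result independent of triple order
                next.put(e.getKey(), sha256(hashes.get(e.getKey()) + e.getValue()));
            }
            hashes = next;
        }
        return hashes;
    }

    static String toTerm(String term, Map<String, String> hashes, String origin) {
        return isBNode(term) ? origin + "/.well-known/genid/" + hashes.get(term) : term;
    }

    // Returns the triples as sorted strings with every blank node replaced by a skolem IRI.
    static Set<String> skolemize(List<String[]> triples, String origin) {
        Map<String, String> hashes = hashBNodes(triples);
        Set<String> result = new TreeSet<>();
        for (String[] t : triples) {
            result.add(toTerm(t[0], hashes, origin) + " " + t[1] + " " + toTerm(t[2], hashes, origin));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> graph = Arrays.asList(
                new String[] {"_:a", "ex:name", "\"Alice\""},
                new String[] {"_:a", "ex:knows", "_:b"},
                new String[] {"_:b", "ex:name", "\"Bob\""});
        skolemize(graph, "http://example.org").forEach(System.out::println);
    }
}
```

Because the hashes never look at the blank node labels, parsing the same implicit-bnode document twice (which yields different random labels each time) would still give the same skolemized result under this scheme.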
-
Getting back to the original suggested improvement that you created this issue for: currently, if PRESERVE_BNODE_IDS is set to true, the parser completely ignores the SKOLEMIZE_ORIGIN setting. It produces new blank nodes with the node id as found in the unparsed document (if present) and doesn't convert anything into IRIs. We could change that behavior, so that in cases where both settings are active, the parser does skolemize them, reusing the existing bnode id instead of generating a new random one. It's not a difficult fix and I doubt it has any significant performance impact on the parser. I'm not particularly enthusiastic about its usefulness myself, but I don't have any objection either.
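To make the proposed combination of the two settings concrete, here is a minimal sketch of the decision logic in plain Java. It is not the actual parser code: the class, the method, and its parameters are hypothetical stand-ins for the PRESERVE_BNODE_IDS and SKOLEMIZE_ORIGIN settings, and the `/.well-known/genid/` path is the one suggested by RDF 1.1 Concepts, assumed here for illustration.

```java
import java.util.UUID;

// Hypothetical sketch of the proposed behavior, not RDF4J parser code.
final class SkolemizeSketch {

    static final String GENID_PATH = "/.well-known/genid/";

    /**
     * @param origin      skolemization origin, or null when skolemization is off
     * @param documentId  blank node id as written in the document, or null if the node is implicit
     * @param preserveIds whether ids found in the document should be reused
     * @return a blank node label ("_:id") or a skolem IRI
     */
    static String resolve(String origin, String documentId, boolean preserveIds) {
        // Reuse the document's id only if preservation is on and the document actually has one.
        String id = (preserveIds && documentId != null)
                ? documentId
                : UUID.randomUUID().toString();
        if (origin == null) {
            return "_:" + id; // no skolemization: stay a blank node
        }
        return origin + GENID_PATH + id; // proposed: skolemize, reusing the chosen id
    }

    public static void main(String[] args) {
        System.out.println(resolve("http://example.org", "b1", true));
        System.out.println(resolve(null, "b1", true));
        System.out.println(resolve("http://example.org", null, true));
    }
}
```

Note the third call in `main`: when the document has no explicit id (as with implicit blank nodes in JSON-LD), there is nothing to preserve and the result is still random on every run.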
-
From the RDF4J users list (https://groups.google.com/g/rdf4j-users/c/RG9ZVq-NEXs/m/QECcOoquAgAJ):
Sometimes it can be useful to generate the same skolem IRIs, instead of creating unique IRIs on each and every run.