Optionally preserve blank node IDs when skolemizing #3192

barthanssens · 2020-12-22T23:14:25Z

barthanssens
Dec 22, 2020
Collaborator

From the RDF4J users list (https://groups.google.com/g/rdf4j-users/c/RG9ZVq-NEXs/m/QECcOoquAgAJ):

The parser generates a unique id for parsed blank nodes deliberately, to avoid clashes.
There is a configuration option in the parser to preserve blank node identifiers, but it will not work in combination with SKOLEMIZE_ORIGIN.

Sometimes it can be useful to generate the same skolem IRIs, instead of creating unique IRIs on each and every run.

abrokenjester · 2020-12-23T06:18:42Z

abrokenjester
Dec 23, 2020
Maintainer

I should point out that in the case discussed on the user list, this improvement won't actually help: blank node id preservation is only meaningful when the serialized blank nodes have identifiers in the document. Since the discussion thread was about a JSON-LD document with implicit blank nodes, there are no blank node ids that we could preserve across multiple parser runs.


abrokenjester · 2020-12-23T06:20:21Z

abrokenjester
Dec 23, 2020
Maintainer

A perhaps more interesting idea is to look at a utility for canonicalizing and subsequently skolemizing blank nodes in a Model (so not during parsing, but by modifying an existing Model collection).


barthanssens · 2020-12-23T09:51:58Z

barthanssens
Dec 23, 2020
Collaborator Author

Or perhaps both?


barthanssens · 2020-12-23T10:42:22Z

barthanssens
Dec 23, 2020
Collaborator Author

Or what about a Model parser? This would allow for reusing the validation and skolemization logic.

The result would be a new Model (which obviously requires more memory).


abrokenjester · 2020-12-23T21:47:59Z

abrokenjester
Dec 23, 2020
Maintainer

> Or what about a Model parser? This would allow for reusing validation, skolemization logic...

I see what you're getting at, but I'm a little worried about muddying up what "parsing" means. I think I'd rather do this in the form of additional functionality at the level of the Model API (either as new utility functions in Models, or something else).

In particular, the skolemization logic is something that I think we may want to approach holistically (that is, looking at the entire model at once) rather than as part of a streaming (statement by statement) approach. Doing so will allow us to assign skolem ids that are not just random, but based on the structure of the Model. This makes it possible to generate ids that are guaranteed to be identical for "the same" blank nodes, even when run multiple times (this is what I meant when I mentioned canonicalization).

There's a very good technical paper about this idea by Aidan Hogan (see http://aidanhogan.com/docs/rdf-canonicalisation.pdf). Having such an algorithm in place isn't just nice for blank node preservation and/or skolemization, it will also enable us to do far more efficient graph isomorphism comparisons.
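
To make the "ids based on structure" idea a bit more concrete, here is a rough sketch using only the JDK. It is not Hogan's algorithm and not RDF4J API: the class name, the representation of triples as `String[3]` arrays, the `_:` prefix for blank nodes, the number of refinement rounds and the truncated SHA-256 labels are all assumptions made for illustration. Each blank node is repeatedly relabelled with a hash of its neighbourhood, so the label depends on the statements around the node and not on the id it happened to have in the document. Blank nodes that this simple refinement cannot tell apart end up with the same label, which is one of the cases a full canonicalization algorithm has to handle separately.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CanonicalLabelSketch {

    // Assumption for this sketch: blank nodes are written as "_:id".
    static boolean isBlank(String term) {
        return term.startsWith("_:");
    }

    // Hex form of the first 8 bytes of a SHA-256 digest (truncated for readability).
    static String hash(String input) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (int i = 0; i < 8; i++) {
                hex.append(String.format("%02x", digest[i]));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // A blank node is described by its current label, any other term by itself.
    static String describe(String term, Map<String, String> labels) {
        return isBlank(term) ? "_:" + labels.get(term) : term;
    }

    /** Maps each original blank node id to a label derived from the graph structure. */
    public static Map<String, String> label(List<String[]> triples) {
        Map<String, String> labels = new TreeMap<>();
        for (String[] t : triples) {
            if (isBlank(t[0])) labels.put(t[0], "init");
            if (isBlank(t[2])) labels.put(t[2], "init");
        }
        // One round per blank node lets information travel across the whole graph.
        for (int round = 0; round < labels.size(); round++) {
            Map<String, String> next = new TreeMap<>();
            for (String node : labels.keySet()) {
                List<String> signature = new ArrayList<>();
                for (String[] t : triples) {
                    if (t[0].equals(node)) signature.add("out|" + t[1] + "|" + describe(t[2], labels));
                    if (t[2].equals(node)) signature.add("in|" + t[1] + "|" + describe(t[0], labels));
                }
                // Sorting removes any dependence on statement order.
                Collections.sort(signature);
                next.put(node, hash(labels.get(node) + "\n" + String.join("\n", signature)));
            }
            labels = next;
        }
        return labels;
    }

    public static void main(String[] args) {
        List<String[]> graph = Arrays.asList(
                new String[] { "_:a", "ex:name", "\"Alice\"" },
                new String[] { "_:a", "ex:knows", "_:b" },
                new String[] { "_:b", "ex:name", "\"Bob\"" });
        label(graph).forEach((id, label) -> System.out.println(id + " -> " + label));
    }
}
```

Parsing the same data with different blank node ids, or with the statements in a different order, should give the same labels for corresponding nodes, and those labels could then serve as the suffix of a skolem IRI.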


abrokenjester · 2020-12-23T22:07:04Z

abrokenjester
Dec 23, 2020
Maintainer

Getting back to the original improvement that you created this issue for: currently, if PRESERVE_BNODE_IDS is set to true, the parser completely ignores the SKOLEMIZE_ORIGIN setting. It produces blank nodes with the node id as found in the source document (if present) and doesn't convert anything into IRIs.

We could change that behavior, so that in cases where both settings are active, it does skolemize them, reusing the existing bnode id instead of generating a new random one. It's not a difficult fix and I doubt it has any significant performance impact on the parser. I'm not particularly enthusiastic about its usefulness myself, but I don't have any objection either.
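
A minimal sketch of what that combined behavior could look like, written against the JDK only and not against the actual parser code: `SkolemSketch` and `skolemize` are hypothetical names, and the `/.well-known/genid/` path is the one RDF 1.1 Concepts suggests for skolem IRIs. It also assumes that the blank node id from the document can be appended to an IRI as-is, without any escaping.

```java
import java.util.Optional;
import java.util.UUID;

public class SkolemSketch {

    // Path suggested by RDF 1.1 Concepts for skolem IRIs.
    private static final String GENID_PATH = "/.well-known/genid/";

    /**
     * Hypothetical helper (not RDF4J API): builds a skolem IRI for one blank node.
     * If preservation is enabled and the document supplied an id, that id is reused,
     * so repeated runs give the same IRI. Otherwise a random id is generated.
     */
    public static String skolemize(String origin, Optional<String> documentId, boolean preserveIds) {
        // Avoid a double slash when the origin already ends with one.
        String base = origin.endsWith("/") ? origin.substring(0, origin.length() - 1) : origin;
        String suffix = (preserveIds && documentId.isPresent())
                ? documentId.get()
                : UUID.randomUUID().toString();
        return base + GENID_PATH + suffix;
    }

    public static void main(String[] args) {
        // Both settings active and an id present in the document: stable result.
        System.out.println(skolemize("http://example.org", Optional.of("b1"), true));
        // No id in the document (implicit blank node): falls back to a random id.
        System.out.println(skolemize("http://example.org", Optional.empty(), true));
    }
}
```

The second call in `main` shows why this would not help for documents with implicit blank nodes: without an id in the document there is nothing to reuse, so the result still changes on every run.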

