Support complex wildcard language ranges #3190
Replies: 12 comments
-
@jeenbroekstra what do you think?
-
Can you elaborate with a few examples on what such complex wildcards allow you to do that is beyond what the current standard langMatches offers? If there's a good use case for it, I'm not against adding it. The logical place would be to add an extension to the |
-
"en-*" matches "en-GB" but not "en". "de-*-DE" would match "de-latn-DE" but not "de-DE-1996" but would match "de-Latn-DE-1996". (by "extended filtering" it would actually match de-DE-1996 since it understand the groups of the language tags....I still can't quite get my head around it). Neither of these are possible today. |
-
Another option is a sail-level config for enabling extended filtering. Essentially, it would allow users to set one of these: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Locale.FilteringMode.html
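To make the linked modes a bit more concrete, here is a small sketch that runs the same range through the JDK's own `Locale.filterTags` under different `Locale.FilteringMode` values. The wrapper class `FilteringModeSketch` and its `filter` method are hypothetical names for this example, and how such a mode would be wired into a sail config is left open.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical wrapper around the JDK filtering API, for illustration only.
final class FilteringModeSketch {

    /** Filters the tags against a single language range using the given mode. */
    static List<String> filter(String range, List<String> tags, Locale.FilteringMode mode) {
        List<Locale.LanguageRange> priorityList =
                Collections.singletonList(new Locale.LanguageRange(range));
        return Locale.filterTags(priorityList, tags, mode);
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("de", "de-DE", "de-Latn-DE", "de-DE-1996");

        // Extended filtering: the wildcard is honoured.
        System.out.println(filter("de-*-DE", tags, Locale.FilteringMode.EXTENDED_FILTERING));

        // Basic filtering after mapping the extended range to a basic one.
        System.out.println(filter("de-*-DE", tags, Locale.FilteringMode.MAP_EXTENDED_RANGES));

        // Basic filtering that refuses extended ranges outright.
        try {
            filter("de-*-DE", tags, Locale.FilteringMode.REJECT_EXTENDED_RANGES);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

According to the Javadoc, `MAP_EXTENDED_RANGES` drops the wildcard so that "de-*-DE" behaves like the basic range "de-DE", while `REJECT_EXTENDED_RANGES` throws an `IllegalArgumentException` for any extended range.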
-
Thanks for the examples and the link, I think it's a lot clearer now what we're after, and I can see the use of it. As for where to put this, I think defining it as a separate sail config option may be problematic. If you do it that way, you almost automatically force custom handling of the option by every sail implementation (while if we build it into the evaluation strategy, you get it working for free on pretty much every sail). Note, by the way, that the evaluation strategy is itself a sail config option, currently. I have been thinking though that I don't particularly like the current inheritance structure we use for evaluation strategies. I'd like it if we could make it a bit more modular so that individual extensions can be configured to be enabled or disabled, rather than the current "all extensions or nothing" approach.
-
@rdstn could you share some examples and expected behaviours with us?
-
Sure. The prime example I've been working with is related to SHACL. We have a string, for example a name, which must match at least one sign language and at least one language spoken in France. We also do not have processing for Cyrillic, so we must not match any Cyrl tags. And we dislike the GB variety of English, since American English is strongly encouraged in our enterprise. This would have to be something like this:
Those use qualified max count, which is not part of SHACL in RDF4J (yet), but I imagine that the corresponding SPARQL can be easily derived. This would suggest that the following data is invalid (breaks all constraints):
While the following data is valid:
The final shape example is highly contrived, but I can imagine instances where the first three shapes are of interest.
-
I would prefer that "en-*" match "en-GB" but not "en", while "en" would match both "en" and "en-GB". The algorithm in Jena works as follows.
-
Is there a way to match
If it's not feasible, I agree that matching |
-
With SHACL I guess it would be possible to do something like " |
-
@jeenbroekstra could we do this without adding any configuration? The algorithm that Jena uses is backwards compatible, so if we use the same algorithm we could introduce it in the next feature release.
-
I'm not sure what you mean by "without adding any configuration". If you mean just adding the feature to the ExtendedEvaluationStrategy, I'm fine with that.
-
SPARQL langMatches is only required to support basic language ranges, which rules out more complex wildcards. For instance, "en-*" is not allowed, although "en" works almost the same way.
Jena has chosen to support these more complex wildcard language ranges. However, it does not go as far as supporting extended language ranges, which go much further than just supporting wildcards.
This would move us away from the SPARQL 1.1 recommendation, but would align us more closely with Jena.
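For comparison, basic filtering as langMatches uses it today (RFC 4647, section 3.3.1) can be sketched as below. This is a minimal illustration under the assumption that ranges are compared as plain strings, not the actual RDF4J implementation; `BasicRangeSketch` and `matchesBasic` are hypothetical names.

```java
import java.util.Locale;

// Hypothetical helper showing basic language-range matching only.
final class BasicRangeSketch {

    /** Returns true if the tag matches the basic language range. */
    static boolean matchesBasic(String range, String tag) {
        if (tag.isEmpty()) {
            return false;           // literals without a language tag never match
        }
        if (range.equals("*")) {
            return true;            // "*" matches any non-empty tag
        }
        String r = range.toLowerCase(Locale.ROOT);
        String t = tag.toLowerCase(Locale.ROOT);
        // Either an exact match, or the range is a prefix that ends at a subtag boundary.
        return t.equals(r) || t.startsWith(r + "-");
    }

    public static void main(String[] args) {
        System.out.println(matchesBasic("en", "en"));      // true
        System.out.println(matchesBasic("en", "en-GB"));   // true
        System.out.println(matchesBasic("en", "eng"));     // false
        System.out.println(matchesBasic("en-*", "en-GB")); // false
    }
}
```

With this reading, "en-*" is treated as a literal prefix and therefore matches nothing useful, which is why a wildcard-aware algorithm is needed for the cases discussed in this thread.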