Polymorphism w type designator: make linkml-validate work by adding support in dataclasses + enable include_range_class_descendants for json schema validator #1257

kervel · 2023-01-21T14:09:51Z

New version of pr #1212 , rebased and using a source branch in the linkml repo. Should me merged only after #1224

"controversial" change: i set include_range_class_descendants to true for the schema's generated for jsonschemavalidator. it doesn't break any test in the test suite.

kervel · 2023-01-22T09:04:23Z

not yet finished. gen-json-schema will need some refactoring, i think we will need a function that will expands the slot range to a list of all possible classes/enums, for any combination of any_of and polymorphism classes. other generators could also benefit from this ?

the bug in the current PR version is this: i look up all class descendants of slot.range to generate json-schema anyOf which is not correct for enums (there the range is string or even None)

i wonder if there are still other edge cases (combinations of anyOf / allOf and ranges) that will be hard to support.

kervel · 2023-01-22T22:38:20Z

i just found out about include_range_class_descendants, and my test passes without my json schema changes (so i'll revert them) if i enable it.

But include_range_class_descendants is set to False for jsonschemavalidator, is there a reason for this ? in the validator there is no way to set this to true. On first sight a default of True makes more sense (since not including the class descendants makes a schema that is more strict than what linkml expresses), do we know why the default was set to False ?

codecov-commenter · 2023-01-23T08:17:14Z

Codecov Report

Patch coverage: 90.90% and project coverage change: +0.36% 🎉

Comparison is base (20da3a9) 79.95% compared to head (6c074d7) 80.31%.
Report is 25 commits behind head on main.

❗ Current head 6c074d7 differs from pull request most recent head 3411769. Consider uploading reports for the commit 3411769 to get more accurate results

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1257      +/-   ##
==========================================
+ Coverage   79.95%   80.31%   +0.36%     
==========================================
  Files          93       77      -16     
  Lines        9711     8906     -805     
  Branches     2382     2164     -218     
==========================================
- Hits         7764     7153     -611     
+ Misses       1502     1333     -169     
+ Partials      445      420      -25

Files Changed	Coverage Δ
linkml/validators/jsonschemavalidator.py	`52.94% <66.66%> (-6.89%)`	⬇️
linkml/generators/pythongen.py	`88.94% <93.10%> (+0.48%)`	⬆️
linkml/generators/jsonschemagen.py	`87.84% <100.00%> (-1.99%)`	⬇️

... and 80 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

linkml/generators/pythongen.py

kevinschaper · 2023-01-27T22:07:12Z

linkml/generators/pythongen.py

+                rlines.append(f"""
+        type_designator = "{aliased_slot_name}"
+        if not type_designator in kwargs:
+            return super().__new__(cls,*args,**kwargs)


We're a little nervous about whether we have enough tests that take advantage of designates_type to catch unintended side effects. One thought I had was that even though this is likely perfectly safe to just pass up to the super constructor without arguments, it might be a step safer to just return an empty string in the case that none of the subclasses has a designates_type slot.

(I think maybe on/before line 782 you might be able to use a list comprehension to count the number of descendants with designates_type slots, rather than the total number of descendants)

Hello! I'm not sure if i understood everything correctly..

For the list comprehension: don't all child classes of a class having a type designator have a type designator too ? Would there be a chance that the list comprehension would give less classes than the full class descendants list ?

Btw, for the pydantic generator i did the same thing (take full list of class descendants to build the Union[])..

Thanks!
Frank

In any case, i will add an extra test that verifies the correct working if we create an instance of a class having a type designator, without actually passing the value of the type designator (this used to work before, so it should still work).

the extra test uncovered some issues:

I only generate a new when the type designator is a class slot and not an induced slot. This is wrong and breaks if i just instantiate an object of a subclass: then the type designator is not initialized correctly.

When i have a class hierarchy A --> B --> C and i instantiate a B that is actually a C (because of its type designator value), it remains at B because the type designator was defined at A and so only A has a new

going to fix these

kevinschaper · 2023-01-31T02:40:56Z

I played with it interactively and it feels like magic!

hsolbrig · 2023-01-31T02:52:47Z

You don't need to emit that mapping table -- all of the URI and CURIE information is already available in the generated python as class_class_curie, class_class_uri and class_model_uri. Take a look at linkml/linkml-runtime#243 - using that, the __new__ function can use the cls._class_for_uri or cls._class_for_curie functions.

I think that the above approach allows subclasses to be added that are defined in different models and imported, which should be a bit more versitile?

I noticed that you were using the textual curies in the generated python. I think that there would be some interesting use cases for using external URI's (those declared in the model) vs. the model URI's or curies themselves.

The generated code could then be reduced to:

       type_designator = "category"
        if not type_designator in kwargs:
            return super().__new__(cls,*args,**kwargs)
        else:
            return super().__new__(cls._class_class_curie(kwargs[type_designator]),*args,**kwargs)

I'm not super familiar with the type_designator -- can it be a curie OR a uri? I can see a lot of uses for the latter!!

We still need to address the issue of a type_designator that isn't in the table. At the moment, the functions defined above return None if there is no match. We should probably detect this and emit a sensible error -- I'm sure the None from the return above would generate a really (really) opaque error message ;-)

cmungall · 2023-01-31T02:57:10Z

Type designator is Boolean, but the slot it applies to has a range applied like any other, so you can have string fields be type designators, curie field...

cmungall · 2023-01-31T03:02:15Z

See the slides from the last community meeting https://docs.google.com/presentation/d/1yIOHGZCNRmk6DYYZSk9Rl_JI1atjZ4Y-fbVydtlC1Ig/edit

kervel · 2023-01-31T11:00:07Z

Hello @hsolbrig, i tried to implement your changes:

elimination of the mapping table
raise ValueError when the type designator has a value we don't understand

they work and the tests pass but there is a problem in case the class has a class_uri defined. In that case , actually, a class has two uri's: the native one (generated with prefix and class name) and the one in class_uri. in owlGen the two uris are present and have a skos:exactmatch relation, which seems logical.

the specification on the range of the type designator slots done by @cmungall in #945 doesn't mention which one should be used, so for pydanticgen i used the "native" one. i'd like to do that here to so the json would be interoperable.

When i define a class_uri slot, the class_class_curie is empty (because there is no prefix for the extneral URL i use in my class_uri). This breaks my code now. So i would need a class_model_curie classvar. I tried to do this, but i'm not sure if i'm using the Namespaces object correctly.

I'm now just pushing commits to get your feedback. Once you think it is fine, i will rebase the commits again to have a series of 3 to 4 logical commits.

kervel · 2023-01-31T12:36:40Z

pipeline failure will be fixed by #1275 hopefully

hsolbrig · 2023-01-31T20:24:10Z

@kervel - I think that you've touched on something that we may want to discuss just a bit more. There is no reason that every URI should have a corresponding CURIE and, in fact, there is nothing besides convention that prevents a URI from being represented by more than one CURIE. As an example, even IF we assign the prefix, x: to http://example.org/ in the model, there is no reason that a block of data might use ex: or even ns1:.

My own opinion on the matter is that the lookup should always be done by URI, and either the caller or the implementation code needs to do the transformation before doing the lookup. If there isn't a LinkML method available already that converts a CURIE to a URI, I would propose that we need to get one into place. The CurieNamespaces in the generated python module are intended to address part of that problem, but we still need to be able to find the class associated with x:Organization from data that uses ex:Organization instead. To be honest, I'd been a bit hesitant to include curie's in the class information at all for this very reason. While there has been a push for canonical curie's in the bio space, the majority of the communities I'm familiar with deal strictly with uri's and make very little effort to render curie's globally unique -- x:, ex:, 'ns1:' and other variations all show up in a whole lot of test cases, randomly assigned to a variety of different base URI's.

Another option that we might consider is to deal with two type identifier thingiess: strings and URI's. If a string looks like a URI, treat it as such. If not, assume it is the target class name? We do have the default prefix at our disposal, so we could also prepend it, map it to a URI and go from there.

WRT model uri/curie vs class uri/curie: class uri will be the same as model uri if there is no class_uri assignment. If there is, we could either a) assume that class_uri is the correct one, as the developers did go to the trouble of saying as much or b) try class_uri first and, if no match, `model_uri.

I had sort of intended the _class_for_curie and _class_for_uri methods to be the entry points w/ class_for being the common function. I know - not obvious when they ALL start with '_', but I don't want to pollute the namespace.

kervel · 2023-01-31T22:24:29Z

Hello @hsolbrig,

I understand the logic of first transforming anything in the type designator value to an URI and then doing the lookup. If i understand correctly:

if the range is uriorcurie then specifying an uri or a curie should both work. Specifying a class name would not make sense in this case, and i would not expect it to work.
if the range is uri then specifying an uri would work, and we could disable the curie->uri translation. Specifying a class name makes no sense either.
if the range is a string, then the class name should be specified, and neither uri or curie would make sense here.

So the logic would be to always look at the uri classvar (except for the string range case, which would look at the class name case) and optionally try to translate curies to uri's. It would probably also be sensible to understand both native and non-native uri's/curies (so both class_class_uri and class_model_uri).

I would need some guidance here on doing the curie->uri translation, then i could give it a try.

That still leaves the opposite problem open: i just instantiate a Person() without specifying the type designator value, i would then expect the type designator value to be auto-initialized to the correct value. And then we need a single choice, and i think it makes sense to (in that case) always take the native uri or curie, because those are guaranteed to be always defined (there is always a default_prefix)

For the pydantic generator, i don't think we can completely do the same implementation strategy: the pydantic classes don't have any linkml(runtime) dependency (and that has its advantages too in some cases). That would mean that we would need to build a list of acceptable values for the type designator value for every class at generation time. Current implementation only has a single value, but we could also make it support for instance both the native/non-native and both uri and curie.

hsolbrig · 2023-02-01T00:46:30Z

Just toying around with "That still leaves the opposite problem open: i just instantiate a Person() without specifying the type designator value, i would then expect the type designator value to be auto-initialized to the correct value. And then we need a single choice, and i think it makes sense to (in that case) always take the native uri or curie, because those are guaranteed to be always defined (there is always a default_prefix)"

Wrt single choice, I'd argue for class_class_uri because it will always be there and, while defaulting to the same as the model, it can be overriden.

A proposed riff on your existing code:

@dataclass
class NamedThing(YAMLRoot):
    _inherited_slots: ClassVar[List[str]] = []

    class_class_uri: ClassVar[URIRef] = URIRef("http://example.org/NamedThing")
    class_class_curie: ClassVar[str] = None
    class_name: ClassVar[str] = "NamedThing"

    class_model_uri: ClassVar[URIRef] = URIRef("http://example.org/NamedThing")

    id: Union[str, NamedThingId] = None
    full_name: Optional[str] = None
    thingtype: URIorCURIE = dataclasses.field(init=False)

    @property
    def thingtype(self) -> URIorCURIE:
        return URIorCURIE(self.class_class_uri)

    @thingtype.setter
    def thingtype(self, v: Any) -> None:
        pass

    def __post_init__(self, *_: List[str], **kwargs: Dict[str, Any]):
        if self._is_empty(self.id):
            self.MissingRequiredField("id")
        if not isinstance(self.id, NamedThingId):
            self.id = NamedThingId(self.id)

        if self.full_name is not None and not isinstance(self.full_name, str):
            self.full_name = str(self.full_name)

        super().__post_init__(**kwargs)



    def __new__(cls, *args, **kwargs):
        uri = kwargs.pop("thingtype", cls.class_class_uri)
        target_cls = self._class_for_uri(uri)
        if target_cls is None:
            # Should this be a warning instead?  If so, what do we do with the new URI?
            raise ValueError(
                f"Wrong type designator value: class NamedThing has no subclass with class_class_uri='{uri}'")
        return super().__new__(target_cls, *args, **kwargs)

I popped "thingtype" from the kwargs because we don't want to try to be setting it. That said, we might get more clever if we're going to look up by name or curie.

Just pfutzing around...

when I run:

p = Person(id="Fred")
print(repr(p))
p.thingtype = 'abc'

I get

Person(id='Fred', full_name=None, thingtype='http://testbreaker/not-the-uri-you-expect', height=None)

Which, despite the "not-the-uri-you-expect", I think that it may be the one we wanted.

The setter is there because I can't convince the dataclass innards from trying to set it. It does leave us with an unfortunate side effect in that p.thingtype = 'abc' is a no-op. I suppose we could get slightly more clever and throw
an error if the input argument isn't none, but I'm curious whether this is still more or less on track.

hsolbrig · 2023-02-01T19:55:49Z

Just to double check -- by "native", we mean the URI that is assigned to the class by combining the default namespace of the model with the class name. I'll use "assigned" to refer to a second URI that can be assigned to a class by the modeler.

The purpose of the native URI's was to assign a default URI to every class, slot and type in the model. This allowed someone to develop an RDF based model without having to know much about RDF. In the python dataclass based Class, this URI is referred to as class_model_uri. The class_uri (and slot_uri and type_uri - apologies for the less than consistent naming) were intended for use by modelers who had RDF integration in mind -- they could override the default URI and assign an external one instead. The class_class_uri property carries the native URI if a class_uri hasn't been specified, otherwise it would be the override.

Switching to type_definition because it is already taking full advantage of this, we have stated that RDF will refer to a LinkML integer as xsd:integer, even though the LinkML model treats it as `linkml:integer'

class Integer(int):
    """ An integer """
    type_class_uri = XSD.integer
    type_class_curie = "xsd:integer"
    type_name = "integer"
    type_model_uri = LINKML.Integer

While the Class, SubSetDefinition assumes the native LinkML URI:

@dataclass
class SubsetDefinition(Element):
    """
    the name and description of a subset
    """
    _inherited_slots: ClassVar[List[str]] = []

    class_class_uri: ClassVar[URIRef] = LINKML.SubsetDefinition
    class_class_curie: ClassVar[str] = "linkml:SubsetDefinition"
    class_name: ClassVar[str] = "subset_definition"
    class_model_uri: ClassVar[URIRef] = LINKML.SubsetDefinition

One of the goals of this assignment was to address the sort of model used in the HL7 FHIR Observation model, where the LOINC code (a URI) could be used to identify the particular model instance.

The reason that I'm pushing back on using the "native" URI is that it prevents this second situation -- if you have a class instance that is derived from and external model that uses LOINC, you will need to maintain an external map between the LOINC URI's and corresponding native model uri's, even though that map is already intrinsically available.

Thinking about the Observation model, however, it occurs to me that what we may want to think about doing is to associate the type designator with an enumeration and, perhaps, also consider semi-automatic generation of classes.

kervel · 2023-02-01T20:32:12Z

Hello,

I think i'm in a similar case: i'm trying to create a linkml model that uses parts of RSM (https://rsm.uic.org/doc/rsm/rsm-1-2/). There, the classes and slots have URI's but they are generated by (i think) enterprise architect, and not human readable. For instance, this is the uri for the "latitude" slot: http://rsm.uic.org/RSM12#EAID_ABEB1DF5_8277_452c_9B67_6786E4FA9620.
Setting them as class_uri and slot_uri makes the owl generator emit an exactmatch between the native and the assigned uri.

The json serialisation format (and the python classes) are things i want to use internally, and for external interop i could convert to json-ld or rdf. So i would get a slight edge by using the native uri's in my case because the json would be more human readable. However, since i am in control over the json format, switching is not a dealbreaker for me, and i totally understand your arguments, and i'm happy to switch (i will then also switch the pydantic implementation to be consistent).

I then think the remaining thing is to have some (maybe generated) class that can convert between uri's and curies for all known namespaces. On the slack channel there was a reminder not to create new uri/curie converters and using https://github.com/cthoyt/curies for that task. It is already a dependency of linkml-runtime. Would it be an idea to generate the creation of a converter with all known namespaces ? a Converter has no function yet to convert an uriorcurie (it needs to be either an uri or a curie) so we would need to look for the presence of :/ which is a bit ugly.

the generated code would then look like so:

# Namespaces
LINKML = CurieNamespace('linkml', 'https://w3id.org/linkml/')
SHEX = CurieNamespace('shex', 'http://www.w3.org/ns/shex#')
X = CurieNamespace('x', 'http://example.org/')
XSD = CurieNamespace('xsd', 'http://www.w3.org/2001/XMLSchema#')
DEFAULT_ = X
# converter (we would probably need to extend this class to provide a uri_from_uriorcurie function
NamespaceConverter = curies.Converter.from_prefix_map({
     LINKML.prefix: LINKML,
     SHEX.prefix: SHEX,
     X.prefix: X,
     XSD.prefix, XSD
})

so the logic would then be:

for the "instantiate a blank object" case --> we use class_class_uri or class_class_curie (with a fallback to class_class_uri if there is no curie
for the "load json" case --> we take whatever is the input, convert it to an uri and then try to find a class that has either a matching class_class_uri or a matching class_model_uri (this way we accept all know uri's/curies for a class as type designators)

would that make sense ?

hsolbrig · 2023-02-02T00:08:14Z

Actually, CurieNamespace was intended to handle exactly that problem already. We never completed it. Here is a proposed solution:

from typing import Optional, Union, Dict

from rdflib import Namespace, URIRef


class CurieNamespace(Namespace):
    catalog: Dict[str, "CurieNamespace"] = dict()

    @classmethod
    def to_curie(cls, uri: Union[str, URIRef]) -> str:
        uri = str(uri)
        candidate_ns = ""
        for prefix, ns in cls.catalog.items():
            if uri.startswith(ns) and len(ns) > len(candidate_ns):
                candidate_ns = ns
        if candidate_ns:
            return candidate_ns.curie(uri[len(candidate_ns):])
        return None

    @classmethod
    def to_uri(cls, curie: str) -> Optional[URIRef]:
        prefix, localname = curie.split(':', 1)
        ns = CurieNamespace.catalog.get(prefix, None)
        return ns[localname] if ns else None

    def __new__(cls, prefix: str, ns: Union[str, bytes, URIRef]) -> "CurieNamespace":
        rt = Namespace.__new__(cls, str(ns) if not isinstance(ns, bytes) else ns)
        rt.prefix = prefix
        CurieNamespace.catalog[prefix] = rt
        return rt

    def curie(self, reference: Optional[str] = '') -> str:
        return self.prefix + ':' + reference

As a test,

print(CurieNamespace.to_curie(XSD.foo))
xsd:foo

print(CurieNamespace.to_uri("linkml:stuff"))
https://w3id.org/linkml/stuff

If this looks ok, I'll put in a PR on linkml-runtime for this tomorrow. There are some outstanding issues we would want to consider to make this robust. It looks like we might want to leverage the curiemap you had above a bit more directly?

As far as issues, one of the interesting things is, as catalog is a class variable, we end up accumulating every namespace defined in a particular module. As an example, using that polymorphism test model that you created, we get:

pprint(CurieNamespace.catalog)

{'': Namespace('http://example.org/'),
 'NCIT': Namespace('http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#'),
 'OIO': Namespace('http://www.geneontology.org/formats/oboInOwl#'),
 'bibo': Namespace('http://purl.org/ontology/bibo/'),
 'dcterms': Namespace('http://purl.org/dc/terms/'),
 'linkml': Namespace('https://w3id.org/linkml/'),
 'oslc': Namespace('http://open-services.net/ns/core#'),
 'owl': Namespace('http://www.w3.org/2002/07/owl#'),
 'pav': Namespace('http://purl.org/pav/'),
 'prov': Namespace('http://www.w3.org/ns/prov#'),
 'qb': Namespace('http://purl.org/linked-data/cube#'),
 'qudt': Namespace('http://qudt.org/schema/qudt/'),
 'rdf': Namespace('http://www.w3.org/1999/02/22-rdf-syntax-ns#'),
 'rdfs': Namespace('http://www.w3.org/2000/01/rdf-schema#'),
 'schema': Namespace('http://schema.org/'),
 'sh': Namespace('https://w3id.org/shacl/'),
 'shex': Namespace('http://www.w3.org/ns/shex#'),
 'skos': Namespace('http://www.w3.org/2004/02/skos/core#'),
 'skosxl': Namespace('http://www.w3.org/2008/05/skos-xl#'),
 'swrl': Namespace('http://www.w3.org/2003/11/swrl#'),
 'tccm': Namespace('https://ontologies.r.us/tccm/'),
 'vann': Namespace('https://vocab.org/vann/'),
 'xsd': Namespace('http://www.w3.org/2001/XMLSchema#')}

I'd recommend that we detect collisions (x:, ex: and : being prime examples) and decide what to do about them.

hsolbrig · 2023-02-02T00:49:39Z

for the "instantiate a blank object" case --> we use class_class_uri or class_class_curie (with a fallback to class_class_uri if there is no curie

for the "load json" case --> we take whatever is the input, convert it to an uri and then try to find a class that has either a matching class_class_uri or a matching class_model_uri (this way we accept all know uri's/curies for a class as type designators)

Looks good to me. I'd prefer trying to stick to uri's when it comes to data representation. If you start putting curies into databases, you're sort of obligated to include the accompanying prefix maps and, when it comes to comparisons uri <--> uri comparisons always work. curie <--> curie comparisons are dependent on the maps being available and permanent.

kervel · 2023-02-02T09:01:08Z

hello @hsolbrig ,

About the "curies in databases" remark: i guess this translates to "do curies make sense as type designator or not". And there i think both @cmungall and @sierra-moxon argued for yes (see [WIP] Pydanticgen: support polymorphism using type designators #1180 (comment) and the description of Add support for designates_type in all generators #945)
Didn't know about curienamespace, but as far as i can judge it would be fine for what we need. I'll try to follow up your PR with a new version of this one, and with a new PR for pydantic. Not certain about the prefixmap collisions, we could start by introducing a testcase that purposely introduces a collision. I don't know the other generators, the problem might be bigger than just the python generator then. Is requiring the prefixes to be globally unique in case of nested (imported) linkml schema's not limiting flexibility ? On the other hand, if we don't scope the prefixes, then we'd better have an explicit error for collisions.

hsolbrig · 2023-02-02T20:29:58Z

I've proposed a CurieNamespace change (linkml/linkml-runtime#244). As noted in the PR, it doesn't use the curies package, because I couldn't figure out how to incrementally update the curies vs. in one big batch. Any solution which passes test_curienamespace.py should be good.

kervel · 2023-03-23T12:28:57Z

test will fail until we merge linkml/linkml-runtime#251

there is also a docgen test that fails on my system, but i'm running with linkml-runtime from the main branch with PR 251 applied, maybe that's the cause ?

cmungall · 2023-03-25T01:49:13Z

tests/test_issues/test_validate_polymorphism.py

+from .test_pydantic_polymorphism import data_str, gen_test_schema
+import json
+
+class PydanticPolymorphismTestCase(TestEnvironmentTestCase):


is this mis-named? this seems to be testing PythonGenerator (i.e dataclass generation)

it was originally meant to test the whole linkml-validate flow for polymorphism (which includes python generator but also json schema validation), but since the json schema validation is now a separate PR, it makes sense to rename the file. will do.

cmungall

I'm a bit confused by the test cases

test_pydantic_polymorphism changes look OK but seem unrelated to what is now the focus of the PR, which is pythongen
the 2nd test does seem to test pythongen but has pydantic in the name

kervel · 2023-03-25T09:47:02Z

I'm a bit confused by the test cases

test_pydantic_polymorphism changes look OK but seem unrelated to what is now the focus of the PR, which is pythongen

the 2nd test does seem to test pythongen but has pydantic in the name

I tried to make pydantic and python tests use the same data ... i made this more clear now by having them import from a common module instead of the one importing the other. As a bonus i added a type hierarchy test just like we had in pydantic ... which fails now :( so more work needed.

…ion of range of type designators in #945)

Because the pydantic and the python tests now have similar test cases, i refactored the pydantic test so that they can use common test schema's and data.

capsulecorplab · 2023-05-02T22:25:11Z

Any progress on this effort? Looking forward to this feature being added to the linkml toolchain

kervel · 2023-05-13T14:20:19Z

i'm stuck on this one: linkml/linkml-runtime#251 .. once i get the feedback i dont think there is much work left

cmungall · 2023-09-12T16:58:17Z

linkml/generators/pythongen.py

@@ -162,7 +169,7 @@ def gen_schema(self) -> str:
 from linkml_runtime.utils.formatutils import camelcase, underscore, sfx
 {handlerimport}
 from rdflib import Namespace, URIRef
-from linkml_runtime.utils.curienamespace import CurieNamespace
+from linkml_runtime.utils.curienamespace import CurieNamespace, CurieNamespaceCatalog


We think this PR should be decoupled from changes in linkml-runtime

cmungall · 2023-09-12T17:09:22Z

tests/test_issues/test_pydantic_polymorphism.py

    assert len([x for x in output_subset if "x:Person" in x]) == 1

    gen = PydanticGenerator(schema_str.replace("uriorcurie", "uri"))
    output = gen.serialize()
    output_subset = [line for line in output.splitlines() if "thingtype" in line]
    assert len(output_subset) > 0
    assert len([x for x in output_subset if "http://example.org/Person" in x]) == 1
+


something odd is going on with the diff - in the intervening time between creation of this PR and now we refactored to use pytest. This is showing up here like unittest except unindented...

@kervel

This PR incorporates work done in #1257 by @kervel

Silvanoc · 2023-09-20T15:32:14Z

Sorry for jumping into the conversation this late, but this MR is touching a topics that I'm interested on.

@hsolbrig wrote

My own opinion on the matter is that the lookup should always be done by URI, and either the caller or the implementation code needs to do the transformation before doing the lookup. If there isn't a LinkML method available already that converts a CURIE to a URI, I would propose that we need to get one into place. The CurieNamespaces in the generated python module are intended to address part of that problem, but we still need to be able to find the class associated with x:Organization from data that uses ex:Organization instead. To be honest, I'd been a bit hesitant to include curie's in the class information at all for this very reason. While there has been a push for canonical curie's in the bio space, the majority of the communities I'm familiar with deal strictly with uri's and make very little effort to render curie's globally unique -- x:, ex:, 'ns1:' and other variations all show up in a whole lot of test cases, randomly assigned to a variety of different base URI's.

As stated in this comment to issue #258, I find the usage of CURIEs/URIs in LinkML needs some improvements:

I agree with @hsolbrig that LinkML should use internally only URIs. CURIEs should be only syntactic sugar for the users that LinkML would always convert into URIs. It would imply that two CURIEs with different prefixes, but the same suffix, might end up being the same (like x:Organization and ex:Organization in the above example).
As long as LinkML doesn't use SafeCURIEs, there is some space for ambiguity between URIs and CURIEs. As an alternative JSON-LD Compact IRIs significantly reduce the room for ambiguities, because only listed prefixes are interpreted as CURIEs, otherwise URIs are expected.

WRT designates_type, I don't see how a string can be used to validate, unless it's clear that it can only be a class name (would it then be expected to be a CURIE with the default prefix?). Otherwise continuing the above logic I would say that a CURIE (even when relying on the default prefix) should only be valid if the resulting URI identifies a valid class.

kervel · 2023-09-26T09:49:51Z

see #1617

kervel marked this pull request as draft January 21, 2023 15:24

kervel force-pushed the feature/poly_dataclasses branch from 9bf4319 to 4b99a79 Compare January 22, 2023 21:20

kervel force-pushed the feature/poly_dataclasses branch from 4b99a79 to 8e56f2a Compare January 23, 2023 07:52

kervel force-pushed the feature/poly_dataclasses branch from 8e56f2a to 484a392 Compare January 23, 2023 08:46

kervel marked this pull request as ready for review January 23, 2023 09:39

sierra-moxon requested review from sierra-moxon and kevinschaper and removed request for kevinschaper and sierra-moxon January 24, 2023 21:49

cmungall requested a review from hsolbrig January 26, 2023 18:58

kevinschaper reviewed Jan 26, 2023

View reviewed changes

linkml/generators/pythongen.py Outdated Show resolved Hide resolved

kervel force-pushed the feature/poly_dataclasses branch from 484a392 to 935b8a6 Compare January 26, 2023 20:41

kevinschaper reviewed Jan 27, 2023

View reviewed changes

kervel force-pushed the feature/poly_dataclasses branch from 935b8a6 to 6c074d7 Compare January 30, 2023 10:19

kevinschaper approved these changes Jan 31, 2023

View reviewed changes

kervel mentioned this pull request Feb 1, 2023

Pydantic gen: switch to external uri/curie by default, still accept native uri/curie. #1275

Merged

kervel mentioned this pull request Mar 16, 2023

jsonschemagen: use anyOf rather than oneOf for range class descendants #1342

Merged

kervel force-pushed the feature/poly_dataclasses branch from be5c32f to 65c81b8 Compare March 23, 2023 09:09

kervel mentioned this pull request Mar 23, 2023

CurieNamespace catalog that is not a singleton and that uses curies linkml/linkml-runtime#251

Open

kervel force-pushed the feature/poly_dataclasses branch from 65c81b8 to 968497f Compare March 23, 2023 12:26

cmungall reviewed Mar 25, 2023

View reviewed changes

kervel added 2 commits March 28, 2023 10:40

allow polymorphism for dataclasses using type designators (specificat…

bb12a14

…ion of range of type designators in #945)

Add a test for python dataclasses polymorphism.

8715683

Because the pydantic and the python tests now have similar test cases, i refactored the pydantic test so that they can use common test schema's and data.

kervel force-pushed the feature/poly_dataclasses branch from 0dd6e94 to 8715683 Compare March 28, 2023 08:41

Merge branch 'main' into feature/poly_dataclasses

edfa4eb

Merge branch 'main' into feature/poly_dataclasses

3411769

cmungall reviewed Sep 12, 2023

View reviewed changes

cmungall added a commit that referenced this pull request Sep 12, 2023

Adding type_designators to pythongen. Fixes #1608

7759db0

This PR incorporates work done in #1257 by @kervel

cmungall mentioned this pull request Sep 12, 2023

allow polymorphism for dataclasses using type designators #1617

Merged

kervel closed this Sep 26, 2023

pkalita-lbl deleted the feature/poly_dataclasses branch December 22, 2023 17:36

Polymorphism w type designator: make linkml-validate work by adding support in dataclasses + enable include_range_class_descendants for json schema validator #1257

Polymorphism w type designator: make linkml-validate work by adding support in dataclasses + enable include_range_class_descendants for json schema validator #1257

Uh oh!

Conversation

kervel commented Jan 21, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kervel commented Jan 22, 2023

Uh oh!

kervel commented Jan 22, 2023

Uh oh!

codecov-commenter commented Jan 23, 2023 • edited by codecov bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kevinschaper commented Jan 31, 2023

Uh oh!

hsolbrig commented Jan 31, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cmungall commented Jan 31, 2023

Uh oh!

cmungall commented Jan 31, 2023

Uh oh!

kervel commented Jan 31, 2023

Uh oh!

kervel commented Jan 31, 2023

Uh oh!

hsolbrig commented Jan 31, 2023

Uh oh!

kervel commented Jan 31, 2023

Uh oh!

hsolbrig commented Feb 1, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hsolbrig commented Feb 1, 2023

Uh oh!

kervel commented Feb 1, 2023

Uh oh!

hsolbrig commented Feb 2, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hsolbrig commented Feb 2, 2023

Uh oh!

kervel commented Feb 2, 2023

Uh oh!

hsolbrig commented Feb 2, 2023

Uh oh!

kervel commented Mar 23, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

cmungall left a comment

Choose a reason for hiding this comment

Uh oh!

kervel commented Mar 25, 2023

Uh oh!

capsulecorplab commented May 2, 2023

Uh oh!

kervel commented May 13, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Silvanoc commented Sep 20, 2023

Uh oh!

kervel commented Sep 26, 2023

Uh oh!

kervel commented Jan 21, 2023 •

edited

Loading

codecov-commenter commented Jan 23, 2023 •

edited by codecov bot

Loading

hsolbrig commented Jan 31, 2023 •

edited

Loading

hsolbrig commented Feb 1, 2023 •

edited

Loading

hsolbrig commented Feb 2, 2023 •

edited

Loading