Skip to content

.Net: [MEVD] Updated Nullable handling in VectorStoreRecordModelBuilder #11543

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

dmytrostruk
Copy link
Member

Motivation and Context

Currently VectorStoreRecordModelBuilder handles the case when SupportedDataPropertyTypes has a non-nullable value type (e.g. bool), but actual model contains nullable value type (bool?). This allows to simplify SupportedDataPropertyTypes configuration by specifying only non-nullable version of a type.

This PR contains changes for the following cases that are not supported at the moment:

  1. SupportedDataPropertyTypes = bool?, PropertyType = bool? - this one is simple, if nullable type is listed as supported, it should be possible to have it in record model.
  2. SupportedDataPropertyTypes = bool?, PropertyType = bool - this one is opposite to the default case. If nullable type is listed as supported, we should allow non-nullable type in record model.
  3. SupportedEnumerableDataPropertyElementTypes = bool?, PropertyType = List<bool> - this one is the same as previous case, but for enumerable types.
  4. SupportedEnumerableDataPropertyElementTypes = bool, PropertyType = List<bool?> - this one is the same as default case, but for enumerable types.

Also added a VectorData.UnitTests project in order to be able to test abstraction logic without integration tests.

Contribution Checklist

@dmytrostruk dmytrostruk self-assigned this Apr 14, 2025
@dmytrostruk dmytrostruk requested a review from a team as a code owner April 14, 2025 16:05
@markwallace-microsoft markwallace-microsoft added .NET Issue or Pull requests regarding .NET code kernel Issues or pull requests impacting the core kernel labels Apr 14, 2025
@roji
Copy link
Member

roji commented Apr 14, 2025

@dmytrostruk to help me understand, what exact scenarios aren't working before this PR? The current logic "unwraps" Nullable<T> before validating; so when the model contains bool?, that should pass validation since SupportedDataPropertyTypes has bool.

Also added a VectorData.UnitTests project in order to be able to test abstraction logic without integration tests.

That's great, it's indeed good to start having test coverage for the model builder.

@dmytrostruk
Copy link
Member Author

@dmytrostruk to help me understand, what exact scenarios aren't working before this PR?

@roji Sure, I described the scenarios in PR description, but I will provide more details here.

What currently works:

  • If SupportedDataPropertyTypes has bool, we allow bool or bool? in the record model.
  • If SupportedEnumerableDataPropertyElementTypes has bool, we allow {Collection}<bool> in the record model.

What doesn't work:

  • If SupportedDataPropertyTypes has bool?, and the record model has bool? - the exception is thrown that the type is not supported. It is happening because in the logic we unwrap Nullable and use underlying type (bool), while SupportedDataPropertyTypes only mentions bool? and not bool.
  • If SupportedDataPropertyTypes has bool?, and the record model has bool - the exception is thrown for the same reason as previous scenario. In my opinion, if we support bool -> bool?, then we also need to support bool? -> bool.
  • If SupportedEnumerableDataPropertyElementTypes has bool, and the record model has {Collection}<bool?> - the exception is thrown. So, it works with combination bool -> bool?, but not with bool -> List<bool?>.
  • If SupportedEnumerableDataPropertyElementTypes has bool?, and the record model has {Collection}<bool> - the exception is thrown. Same case as previous one, just inverted.

Hope that provides more clarity. I was thinking if we actually should support all of these cases but taking into account that we are already doing some Nullable unwrapping and support some cases, I think we should support either all of the combinations or none of them. Otherwise, from the API point of view, it's not clear why bool in SupportedDataPropertyTypes also includes nullable version of this type, while bool? doesn't work neither with bool nor bool?. The same is related to enumerable types.

Please let me know if that makes sense. Thanks!

@roji
Copy link
Member

roji commented Apr 14, 2025

If SupportedDataPropertyTypes has bool?

Why would we need to allow that? In other words, why not just require connectors to specify non-nullable types in SupportedDataPropertyTypes, and then unwrap internally wherever needed? That was the intention in any case - SupportedDataPropertyTypes is just a way for connectors to specify to the model builder what types are allowed etc.

If SupportedEnumerableDataPropertyElementTypes has bool, and the record model has {Collection}<bool?> - the exception is thrown. So, it works with combination bool -> bool?, but not with bool -> List<bool?>.

I see. Though this may be something we'd need to think about a bit more: if a connector supports string?, that doesn't in itself mean it supports List<string?> (which would mean that it supports null within a list of strings). I suspect that support here would be quite variable across connectors...

@dmytrostruk
Copy link
Member Author

Why would we need to allow that? In other words, why not just require connectors to specify non-nullable types in SupportedDataPropertyTypes, and then unwrap internally wherever needed?

I think that the only way how to require connectors to specify non-nullable types in SupportedDataPropertyTypes is to throw an exception when nullable type is specified, but it's an error in runtime. Not sure if there is a way how to check that at compile time (maybe some analyzer). In all cases, I can update the PR to whatever behavior we agree, but currently it allows you to specify nullable type in SupportedDataPropertyTypes and it's not working. The reason why I created this PR because I observed this behavior while I was working with Weaviate connector, so I assume connector developers may face this inconsistency in the future as well.

if a connector supports string?, that doesn't in itself mean it supports List<string?> (which would mean that it supports null within a list of strings). I suspect that support here would be quite variable across connectors...

Today, Weaviate connector supports List<T> as well as List<Nullable<T>>. Taking into account that bool -> List<bool?> case doesn't work, the only way how to make it work is to specify both bool and bool? in SupportedEnumerableDataPropertyElementTypes property (which doesn't look like something we want connector developers to do).

Maybe it would be better to remove nullable unwrapping and force developers to specify non-nullable and nullable types explicitly and perform validation strongly against specified types. A little bit more code will be required in connectors, but it's more flexible approach (you can control nullable/non-nullable single and enumerable properties separately) and also it will be clear which types are supported.

@roji
Copy link
Member

roji commented Apr 15, 2025

I think that the only way how to require connectors to specify non-nullable types in SupportedDataPropertyTypes is to throw an exception when nullable type is specified, but it's an error in runtime. Not sure if there is a way how to check that at compile time (maybe some analyzer). In all cases, I can update the PR to whatever behavior we agree, but currently it allows you to specify nullable type in SupportedDataPropertyTypes and it's not working. The reason why I created this PR because I observed this behavior while I was working with Weaviate connector, so I assume connector developers may face this inconsistency in the future as well.

Sure, I agree we can improve this - I think it makes sense to simply throw for Nullable<>. Just to put this in context: we're discussing "internal" validation of the connector, nothing more - this isn't a user-facing feature. So a simple runtime exception to let the connector maintainer know they've done something wrong seems more than sufficient - the most basic test would immediately pick this up. I definitely don't think anything here warrants complexity (or an analyzer).

Maybe it would be better to remove nullable unwrapping and force developers to specify non-nullable and nullable types explicitly and perform validation strongly against specified types. A little bit more code will be required in connectors, but it's more flexible approach (you can control nullable/non-nullable single and enumerable properties separately) and also it will be clear which types are supported.

The reason I added unwrapping in the first place, is that nullability generally isn't part of the type in the database. In other words, either the database supports nulls or it doesn't, and if does, this doesn't vary on a type-by-type basis (e.g. the database supports null for string fields but not for ints). In that sense, I think null is part of the type on the .NET side but not on the DB side.

I'm less sure what happens with collections. Not every type that's supported is supported as an element in a collection (that's why we have SupportedEnumerableDataPropertyElementTypes), and I'm not sure if it's safe to assume that if nullable int is supported as a scalar, it's necessarily also supported as an array element. For example, I can imagine a list-of-strings type (like Redis tags) not supporting nullable strings (though a scalar string is probably nullable).

Another option here (which is what I think we should seriously consider): rather than accepting a list of types, connectors can simply have their own subclass of the ModelBuilder (some already do), and simply expose a hook for property type validation which they'd override. In other words, instead of allowing connectors to pass in a list of Types, just allow them to write validation in arbitrary code (possibly providing a bit of support around detecting lists etc.). This would free us from having to think about all these questions.

In any case... Just pointing out that none of this is user-facing, and we can do any improvements here later without any breaking changes.

@dmytrostruk
Copy link
Member Author

Thanks!

In any case... Just pointing out that none of this is user-facing, and we can do any improvements here later without any breaking changes.

Taking this into account, is it okay if we temporarily mention nullable types in Weaviate SupportedEnumerableDataPropertyElementTypes to make it work for enumerable nullable types?

@roji
Copy link
Member

roji commented Apr 15, 2025

Taking this into account, is it okay if we temporarily mention nullable types in Weaviate SupportedEnumerableDataPropertyElementTypes to make it work for enumerable nullable types?

Absolutely! That sounds like a good bugfix.

BTW I do think we need to think a bit about validation and how the connector and the model builder interact around this... That's one of the reasons everything is flagged as [Experimental] in that area. But yeah, I'd rather we did this a bit later when we're under less pressure.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kernel Issues or pull requests impacting the core kernel .NET Issue or Pull requests regarding .NET code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants