.Net MEVD: Port the Pinecone connector to use Pinecone.Client #10788

adamsitnik · 2025-03-04T16:29:23Z

This PR ports the Pinecone connector to use Pinecone.Client, adds a lot of integration tests that use local emulator.

It does not implement the new LINQ filter, so it just contributes to #10415.

It also provides a lot of conformance tests, so it contributes to #10194

…cal emulator behind the scenes

…nc and DeleteCollectionAsync and Upsert*Async

…o pineconeOfficial # Conflicts: # dotnet/SK-dotnet.sln # dotnet/src/Connectors/Connectors.Memory.Pinecone/PineconeVectorStoreRecordCollection.cs

- Collection conformance tests: CRD for the collection itself - Record conformance tests: CRUD for single record - implement and fix them for Redis, PostgreSQL and SQL Server as well - VectorizedSearchAsync conformance tests

roji

Really nice to see this @adamsitnik, thanks ❤️

I have various thoughts on how to make the integration tests a bit better, but nothing is really important there. At some point I'll probably just make a pass and reorganize a bit - we shouldn't get delayed by it too much.

roji · 2025-03-11T20:30:52Z

dotnet/src/Connectors/Connectors.Memory.Pinecone/PineconeVectorStoreRecordCollection.cs

-        this._index ??= await this._pineconeClient.GetIndex(indexName, cancellationToken).ConfigureAwait(false);
+        if (this._indexClient is null)
+        {
+#if NON_PUBLISH


Rather than a different build, shouldn't we just allow uesrs to set the host on PineconeVectorStoreOptions, and use that here? It'd be null by default (so good for production), but some users may want to actually use the emulator themselves with the connector, and would set PineconeVectorStoreOptions.Host for that?

I think that the best solution would be to fix it on the Pinecone side. Let me fork their code and get back to you.

roji · 2025-03-11T20:31:39Z

dotnet/src/Connectors/Connectors.Memory.Pinecone/PineconeVectorStoreRecordCollection.cs

+            {
+                await this.CreateCollectionAsync(cancellationToken).ConfigureAwait(false);
+            }
+            catch (VectorStoreOperationException ex) when (ex.InnerException is PineconeApiException apiEx && apiEx.InnerException is ConflictError)


I know you're not a fan of pattern matching but:

Suggested change

catch (VectorStoreOperationException ex) when (ex.InnerException is PineconeApiException apiEx && apiEx.InnerException is ConflictError)

catch (VectorStoreOperationException ex) when (ex.InnerException is PineconeApiException { InnerException: ConflictError })

roji · 2025-03-11T20:34:04Z

dotnet/src/Connectors/Connectors.Memory.Pinecone/PineconeVectorStoreRecordCollection.cs

-        // Map the results.
+        Sdk.QueryRequest request = new()
+        {
+            TopK = (uint)(options.Top + options.Skip),


Is this correct? Should it really be the sum of Top and Skip?

The API does not provide a possibility to skip, so we ask for Top + Skip many records, and then manually skip Skip-many records. I know it's not optimal, but the API does not expose anything for skipping.

public record QueryRequest { [JsonPropertyName("namespace")] public string? Namespace { get; set; } [JsonPropertyName("topK")] public required uint TopK { get; set; } [JsonPropertyName("filter")] public Metadata? Filter { get; set; } [JsonPropertyName("includeValues")] public bool? IncludeValues { get; set; } [JsonPropertyName("includeMetadata")] public bool? IncludeMetadata { get; set; } [JsonPropertyName("queries")] public IEnumerable<QueryVector>? Queries { get; set; } [JsonPropertyName("vector")] public ReadOnlyMemory<float>? Vector { get; set; } [JsonPropertyName("sparseVector")] public SparseValues? SparseVector { get; set; } [JsonPropertyName("id")] public string? Id { get; set; }

Source: https://github.com/pinecone-io/pinecone-dotnet-client/blob/f2f39c9b87ac7c1f939a0c03dfa18ba60a66ad0c/src/Pinecone/Index/Requests/QueryRequest.cs#L9

If this isn't supported, then it might be better to simply throw if Skip is non-zero... Skip and Top are typically meant for pagination, where the whole point is to not bring back a ton of data from the database, but to read a window of result each time. This sort of client-side implementation means that as the window moves forward, we each time continue to fetch all the records up to the window, which can be a lot of data... So this introduces an undiscoverable performance pit of failure...

Given that users can always just implement this themselves (just do Skip + TopK themselves and pass the result via Top), I'd err on forcing users to do this extra explicitly, so they're 100% aware of the big fetch (and the code expresses it). I get that users would typically rather have everything "just work", but at the end of the day our main job here is to expose database capabilities, rather than attempt to fill in gaps in the database... FWIW I suspect that Pinecone will add a Skip feature sooner or later - I saw many databases go through this phase where they didn't have it and then added it.

roji · 2025-03-11T20:35:34Z

dotnet/src/Connectors/Connectors.Memory.Redis/RedisJsonVectorStoreRecordCollection.cs

+        {
+            await this.RunOperationAsync("FT.DROPINDEX", () => this._database.FT().DropIndexAsync(this._collectionName)).ConfigureAwait(false);
+        }
+        catch (VectorStoreOperationException ex) when (ex.InnerException is RedisServerException innerEx && innerEx.Message.Contains("Unknown Index name"))


Just confirming that there's no better way to identify this than shady string contains?

roji · 2025-03-11T20:42:10Z

dotnet/src/VectorDataIntegrationTests/VectorDataIntegrationTests/CRUD/BatchConformanceTests.cs

@@ -66,4 +66,65 @@ await this.ExecuteAsync(async collection =>
            Assert.Equal("keys", ex.ParamName);
        });
    }
+
+    [ConditionalFact]
+    public Task CanInsertUpdateAndDelete_WithVectors()


We can align naming conventions later, but I've gone with snake_case for the tests (see the filter tests for example); with the long sentences that test methods sometimes get to, that seems good for readability.

Note that this differs from the convention in this repo, but the vector data tests will be moving out to Microsoft.Extensions anyway later.

roji · 2025-03-11T20:51:32Z

dotnet/src/VectorDataIntegrationTests/VectorDataIntegrationTests/CRUD/ConformanceTestsBase.cs

@@ -14,13 +16,16 @@ public abstract class ConformanceTestsBase<TKey, TRecord>(VectorStoreFixture fix

    protected virtual VectorStoreRecordDefinition? GetRecordDefinition() => null;

-    protected async Task ExecuteAsync(Func<IVectorStoreRecordCollection<TKey, TRecord>, Task> test)
+    protected async Task ExecuteAsync(Func<IVectorStoreRecordCollection<TKey, TRecord>, Task> test, bool createCollection = true)


It seems this creates and drops a collection in many cases where that's not actually required by the test, e.g. everything in BatchConformanceTests. Unless there's a good reason, we should try to only create/drop collections when that's necessary - it's a slow process, and in some data stores there are some asynchronicity issues (where the collection doesn't yet appear although the call returned etc.). I know it might cause a bit more headaches around test isolation (leftover data inserted from one test may be visible by another), but if we're careful that should be doable.

Basically all tests which just work on data (search, CRUD...) should ideally work on one collection that gets created once at the beginning, with a single schema and test data - that makes things very fast and easy (I wrote VectorStoreCollectionFixture to provide common infra to make that simple).

What do you think @adamsitnik?

So far I was lucky to work with DBs where creating the collections is very fast, so I've not run into the problem you are trying to solve.

I know it might cause a bit more headaches around test isolation (leftover data inserted from one test may be visible by another)

We could mitigate that by generating more random Keys (as of now all of them start from seed equal 1 and just increment it).

There are few things about VectorStoreCollectionFixture that I don't like:

it requires me to create a derived type for every test class (I know that I can reuse the fixture types)

when looking at test, I don't see what records exactly are being used (I need to go to VectorStoreCollectionFixture.BuildTestData to check it)

it pre-inserts some records on my behalf, which is something I would like to avoid when testing CRUD

I'll take a look and see what I can do about it.

it requires me to create a derived type for every test class (I know that I can reuse the fixture types)

You mean the concrete fixture derived type? Don't we need some derived fixture type in any case?

it pre-inserts some records on my behalf, which is something I would like to avoid when testing CRUD

Oh this should be completely optional, only if you need it (if it's not completely optional, we should change things so that it is).

roji · 2025-03-11T20:53:59Z

dotnet/src/VectorDataIntegrationTests/VectorDataIntegrationTests/CRUD/RecordConformanceTests.cs

+    : ConformanceTestsBase<TKey, SimpleModel<TKey>>(fixture) where TKey : notnull
+{
+    [ConditionalFact]
+    public async Task ReadingAndDeletingNonExistingRecordDoesNotThrow()


I'd prefer a bit more separate where each test exercises one thing, if possible (this one exercises both Get and Delete, the one below even more things).

roji · 2025-03-11T20:55:09Z

...src/VectorDataIntegrationTests/VectorDataIntegrationTests/CRUD/CollectionConformanceTests.cs

+
+namespace VectorDataSpecificationTests.CRUD;
+
+public class CollectionConformanceTests<TKey>(VectorStoreFixture fixture)


nit: collection management generally isn't considered CRUD (the latter usually refers to operations on the rows/records inside).

It's a resurrected #10788 built on top of #10944 which I hope will get merged first. fixes #10415 --------- Co-authored-by: westey <164392973+westey-m@users.noreply.github.com>

adamsitnik added 5 commits February 27, 2025 19:08

introduce a new project for integration tests of Pinecone that use lo…

1d2765b

…cal emulator behind the scenes

get the test to work

9f03e45

port ListCollectionsAsync, CollectionExistsAsync, CreateCollectionAsy…

acd2feb

…nc and DeleteCollectionAsync and Upsert*Async

port the mappers and search filter logic

4a37c60

polishing after reading the code again

cef3358

markwallace-microsoft added .NET Issue or Pull requests regarding .NET code kernel Issues or pull requests impacting the core kernel memory labels Mar 4, 2025

adamsitnik had a problem deploying to integration March 4, 2025 16:29 — with GitHub Actions Failure

adamsitnik added 2 commits March 10, 2025 11:04

Merge remote-tracking branch 'upstream/feature-vector-data-preb1' int…

343958f

…o pineconeOfficial # Conflicts: # dotnet/SK-dotnet.sln # dotnet/src/Connectors/Connectors.Memory.Pinecone/PineconeVectorStoreRecordCollection.cs

port the unit test project, make sure all of them pass

9cd429c

adamsitnik had a problem deploying to integration March 10, 2025 10:15 — with GitHub Actions Failure

adamsitnik added 2 commits March 10, 2025 17:10

remove all usages of Pinecone.NET

6918cfa

tests:

9911ee5

- Collection conformance tests: CRD for the collection itself - Record conformance tests: CRUD for single record - implement and fix them for Redis, PostgreSQL and SQL Server as well - VectorizedSearchAsync conformance tests

adamsitnik temporarily deployed to integration March 10, 2025 18:09 — with GitHub Actions Inactive

adamsitnik marked this pull request as ready for review March 10, 2025 18:40

adamsitnik requested a review from a team as a code owner March 10, 2025 18:40

adamsitnik mentioned this pull request Mar 11, 2025

.Net: Behavior of collection deletion #10882

Closed

adamsitnik requested review from roji and westey-m March 11, 2025 12:00

westey-m deleted the branch microsoft:feature-vector-data-preb1 March 11, 2025 18:26

westey-m closed this Mar 11, 2025

roji reviewed Mar 11, 2025

View reviewed changes

adamsitnik mentioned this pull request Mar 13, 2025

.Net MEVD: Port the Pinecone connector to use Pinecone.Client #10952

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

.Net MEVD: Port the Pinecone connector to use Pinecone.Client #10788

.Net MEVD: Port the Pinecone connector to use Pinecone.Client #10788

Uh oh!

adamsitnik commented Mar 4, 2025 •

edited

Loading

Uh oh!

roji left a comment

Uh oh!

roji Mar 11, 2025

Uh oh!

adamsitnik Mar 12, 2025

Uh oh!

roji Mar 11, 2025

Uh oh!

roji Mar 11, 2025

Uh oh!

adamsitnik Mar 12, 2025

Uh oh!

roji Mar 13, 2025

Uh oh!

roji Mar 11, 2025

Uh oh!

roji Mar 11, 2025

Uh oh!

roji Mar 11, 2025

Uh oh!

adamsitnik Mar 12, 2025

Uh oh!

roji Mar 13, 2025

Uh oh!

roji Mar 11, 2025

Uh oh!

roji Mar 11, 2025

Uh oh!

Uh oh!

	catch (VectorStoreOperationException ex) when (ex.InnerException is PineconeApiException apiEx && apiEx.InnerException is ConflictError)
	catch (VectorStoreOperationException ex) when (ex.InnerException is PineconeApiException { InnerException: ConflictError })


		namespace VectorDataSpecificationTests.CRUD;

		public class CollectionConformanceTests<TKey>(VectorStoreFixture fixture)

.Net MEVD: Port the Pinecone connector to use Pinecone.Client #10788

.Net MEVD: Port the Pinecone connector to use Pinecone.Client #10788

Uh oh!

Conversation

adamsitnik commented Mar 4, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

roji left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

adamsitnik commented Mar 4, 2025 •

edited

Loading