Enhancing Elasticsearch vector store implementation #592

l-trotta · 2024-04-16T13:17:36Z

This PR provides a more performant search function for the Elasticsearch vector store and removes the bean autoconfiguration.

In depth

Autoconfiguration

The current implementation of afterPropertiesSet() automatically creates a new index with a set of properties that only works with OpenAI, or any other model that works with vectors with a dimension of 1536; users adopting other models would currently have to manually delete the index. By default, Elasticsearch automatically creates the correct index settings when it receives the first PUT request for vectors, so in our opinion there's no need for such autoconfiguration.
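For anyone who still wants an explicit mapping, here is a minimal sketch of one whose `dims` follows the embedding model instead of being fixed at 1536. The class and method names are hypothetical and the field name `embedding` is an assumption; `type`, `dims`, `index` and `similarity` are the documented `dense_vector` mapping parameters. This is not the code in this PR.

```java
final class DenseVectorMappingSketch {

    // Builds an index mapping body for a single dense_vector field.
    // "embedding" is an assumed field name; dims must equal the size of the
    // vectors produced by the embedding model in use.
    static String mapping(int dims, String similarity) {
        if (dims <= 0) {
            throw new IllegalArgumentException("dims must be positive");
        }
        return "{\n"
            + "  \"mappings\": {\n"
            + "    \"properties\": {\n"
            + "      \"embedding\": {\n"
            + "        \"type\": \"dense_vector\",\n"
            + "        \"dims\": " + dims + ",\n"
            + "        \"index\": true,\n"
            + "        \"similarity\": \"" + similarity + "\"\n"
            + "      }\n"
            + "    }\n"
            + "  }\n"
            + "}";
    }

    public static void main(String[] args) {
        // 768 stands in for a hypothetical model that does not emit 1536 dimensions.
        System.out.println(mapping(768, "cosine"));
    }
}
```

The resulting string could be sent as the body of an index creation request; the point is only that `dims` is a parameter rather than a constant.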

Search

The function used now is script_score, which can be slow for large data samples since it does a brute force comparison with all vectors using the similarity function. This PR replaces it with the approximate knn search, more performant because it only scans the closest neighbours. The similarity functions available for knn can be easily configured in the index mapping by setting the correct name.
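To illustrate why an exact scoring pass grows linearly with the number of stored vectors, here is a minimal plain-Java sketch that scores every vector against the query before picking the top k. It is not Elasticsearch code, and the class and method names are hypothetical. Approximate kNN avoids this full scan by traversing an HNSW graph and visiting only a bounded number of candidates.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class BruteForceScoringSketch {

    // Cosine similarity of two equal-length, non-zero vectors.
    static double cosine(float[] a, float[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Scores ALL stored vectors (the brute-force part), then returns the
    // positions of the k highest-scoring ones, best first.
    static List<Integer> topK(float[][] stored, float[] query, int k) {
        double[] scores = new double[stored.length];
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < stored.length; i++) {
            scores[i] = cosine(stored[i], query);
            ids.add(i);
        }
        ids.sort(Comparator.comparingDouble((Integer i) -> scores[i]).reversed());
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }
}
```

Every query pays for one similarity computation per stored vector, which is the cost the switch to knn is meant to remove.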

We would also like to contribute to the documentation; should that be done in a separate PR?

Thank you @JM-Lab for the original implementation. Would you like to review these changes?

(Disclosure: I work for Elastic)

JM-Lab · 2024-04-17T15:38:25Z

Hi @l-trotta,

Thanks for the update. I liked how script_score was quick to implement due to available examples, but I'm pleased we're switching to KNN for better performance.

I find the automatic creation of index mappings convenient; however, for my projects, I typically need to define these mappings in advance. Could we consider adding an option to provide index mappings directly in the constructor?

Additionally, could you check if the code at ElasticsearchAiSearchFilterExpressionConverter.java needs any improvements?

I'm also looking forward to the official documentation for the Elasticsearch vector store of the Spring AI project, which I think @tzolov will address.

l-trotta · 2024-04-22T16:44:01Z

Hi @JM-Lab,

Since the index mapping can be added with createIndexMapping(), and in most cases the automatic configuration is enough, we chose not to add the mapping to the constructor, which keeps the initial configuration simpler.

The filter converter looked fine to me!
Thanks again for your work.

tzolov · 2024-04-30T07:02:35Z

@l-trotta, thank you for the improvements.
Could you please rebase and update your PR after the #633 merge?

l-trotta · 2024-05-06T15:31:19Z

Updated! I'd like to explain why I removed the dense-vector-indexing property: quoting the documentation,

If true, you can search this field using the kNN search API. Defaults to true.

And since this implementation uses kNN search, setting this to false would make the data unsearchable.

ezimuel · 2024-06-10T07:16:48Z

@tzolov what's the status of this PR? Thanks.

tzolov · 2024-06-16T08:51:18Z

@l-trotta thanks for your contribution and thank you for your patience.

While reviewing the PR I noticed that you have removed the capability to override the similarityFunction at runtime.
Is there any particular reason to reduce this flexibility? Either way, it renders ElasticsearchVectorStoreAutoConfigurationIT almost useless.
If there are no strong reasons to keep this immutable, perhaps we should re-enable the withSimilarityFunction setter?

tzolov

Thanks @l-trotta,
It looks good, but there are a few things that need clarification/fixing. Please check my comments.
Looking forward to merging this improvement.

tzolov · 2024-06-16T09:27:09Z

spring-ai-spring-boot-autoconfigure/pom.xml

@@ -281,6 +282,12 @@
 			<optional>true</optional>
 		</dependency>

+		<dependency>


What is the reason for adding elasticsearch-java explicitly here, given that it is already defined in the vector-store dependencies?

tzolov · 2024-06-16T09:29:12Z

...csearch-store/src/main/java/org/springframework/ai/vectorstore/ElasticsearchVectorStore.java

-		this.similarityFunction = COSINE_SIMILARITY_FUNCTION;
-	}
-
-	public ElasticsearchVectorStore withSimilarityFunction(String similarityFunction) {


Is it required to remove the ability to change the similarity function at runtime?
This by the way breaks the ElasticsearchVectorStoreAutoConfigurationIT tests.

l-trotta · 2024-06-17T09:53:12Z

@tzolov Thank you for the review, I'll explain my changes:

Removal of the withSimilarityFunction setter: in Elasticsearch the similarity function is defined as a mapping property of the index and cannot be changed. Trying to update the property in the mapping results in the following error:

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": """Mapper for [embedding] conflicts with existing mapper:
    Cannot update parameter [similarity] from [dot_product] to [cosine]"""
      }
    ],
    "type": "illegal_argument_exception",
    "reason": """Mapper for [embedding] conflicts with existing mapper:
    Cannot update parameter [similarity] from [dot_product] to [cosine]"""
  },
  "status": 400
}

So the only way to change the mapping is to create another index and, in this case, a new vector store instance.
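As a rough sketch of that design choice: a store object can hold the similarity as a constructor-fixed value tied to one index, so that switching similarity means targeting a new index with a new instance. The class and method names below are hypothetical and are not the ElasticsearchVectorStore API.

```java
final class IndexBoundStoreSketch {

    private final String indexName;
    private final String similarity;

    // Similarity is set once, together with the index it belongs to.
    IndexBoundStoreSketch(String indexName, String similarity) {
        this.indexName = indexName;
        this.similarity = similarity;
    }

    String indexName() {
        return indexName;
    }

    String similarity() {
        return similarity;
    }

    // There is deliberately no setter. A different similarity requires a
    // different index, hence a new instance bound to that index.
    IndexBoundStoreSketch forNewIndex(String newIndexName, String newSimilarity) {
        if (newIndexName.equals(indexName)) {
            throw new IllegalArgumentException(
                "similarity of an existing index cannot be updated; choose a new index name");
        }
        return new IndexBoundStoreSketch(newIndexName, newSimilarity);
    }
}
```

Rejecting the same index name up front mirrors the mapper conflict above, but fails before any request is sent.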

Having elasticsearch-java explicitly defined in autoconfiguration: while testing the autoconfiguration I noticed it was importing an older version of the client (8.10.4), which breaks the current implementation of the vector store because the knn query was only recently fixed, in version 8.13.3. I wanted to make sure that this would be the version used, but maybe it's an issue with my local environment? If everything works without it, it can be removed.

I didn't notice that ElasticsearchVectorStoreAutoConfigurationIT was broken, sorry about that. I have updated it.

tzolov · 2024-06-17T13:05:43Z

Thanks for the clarification @l-trotta !

It seems the latest Spring Boot 3.3.x (the one we use for our auto-configs) is already at Elasticsearch Client 8.13.4.
So I guess we are fine without the forced dependencies?

Perhaps you can mention this requirement in the docs (in a separate PR), e.g. anyone using a pre-3.3 Boot version has to add the Elasticsearch 8.13.4 dependency to their POM?

tzolov · 2024-06-17T13:21:38Z

rebased, squashed and merged at 78f3797

tzolov · 2024-06-17T13:22:58Z

@l-trotta thank you for the great work!
Just merged the PR.
Please check if the dependencies work as expected and consider adding a NOTE in the documentation as suggested above.

l-trotta · 2024-06-17T13:32:38Z

@tzolov thank you! I'll test with an older Boot version; if there are version problems I'll open another PR for the documentation.

l-trotta · 2024-06-18T14:59:14Z

Is there any estimate of when these changes will be released?

ezimuel · 2024-08-07T10:11:07Z

@tzolov do you know when this PR will be publicly released? We would like to prepare some materials to promote this integration for Elasticsearch. Thanks!

tzolov added the enhancement and vector store labels Apr 20, 2024

tzolov self-assigned this Apr 26, 2024

tzolov added this to the 1.0.0-M1 milestone Apr 26, 2024

l-trotta force-pushed the main branch from bf71a49 to fe12aae May 6, 2024 15:24

tzolov modified the milestones: 1.0.0-M1, 1.0.0-M2 May 28, 2024

l-trotta added 9 commits May 31, 2024 16:50

knn instead of script_score, removed initialization

86650ac

only using normalized similarities, adjusted unit test

8180189

import clean

a73509b

making l2norm's distances consistent with others

0ab9981

refactor unit test

3e31e00

rebase

af5d8c1

format

5acdfec

dependency version, docs

db32740

rebase

8e08cb0

l-trotta force-pushed the main branch from fe12aae to 8e08cb0 May 31, 2024 15:19

tzolov requested changes Jun 16, 2024

autoconfigure test fix

ca9fc88

tzolov closed this Jun 17, 2024

l-trotta mentioned this pull request Sep 5, 2024

Elasticsearch vector store - Wrong error reported when missing index #1316

Closed
