
Commit 408337e

meistermeier authored and tzolov committed
Support filter expressions in Neo4j vector store.
- Move Neo4j filter converters from core to neoj4 vector store project.
- Update neo4j adoc.

Resolves #318
1 parent ca0d10d commit 408337e

7 files changed: +439 -69 lines changed

spring-ai-docs/src/main/antora/modules/ROOT/pages/api/embeddings.adoc

Lines changed: 1 addition & 1 deletion

@@ -146,7 +146,7 @@ public class Embedding implements ModelResult<List<Double>> {
 }
 ----

-== Available Implementations
+== Available Implementations [[available-implementations]]

 Internally the various `EmbeddingClient` implementations use different low-level libraries and APIs to perform the embedding tasks. The following are some of the available implementations of the `EmbeddingClient` implementations:

spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs/neo4j.adoc

Lines changed: 103 additions & 52 deletions

@@ -2,77 +2,53 @@

 This section walks you through setting up `Neo4jVectorStore` to store document embeddings and perform similarity searches.

-== What is Neo4j?
+link:https://neo4j.com[Neo4j] is an open-source NoSQL graph database.
+It is a fully transactional database (ACID) that stores data structured as graphs consisting of nodes, connected by relationships.
+Inspired by the structure of the real world, it allows for high query performance on complex data while remaining intuitive and simple for the developer.

-link:https://neo4j.com[Neo4j] is an open-source NoSQL graph database. It is a fully transactional database (ACID) that stores data structured as graphs consisting of nodes, connected by relationships. Inspired by the structure of the real world, it allows for high query performance on complex data while remaining intuitive and simple for the developer.
+The link:https://neo4j.com/docs/cypher-manual/current/indexes-for-vector-search/[Neo4j's Vector Search] allows users to query vector embeddings from large datasets. An embedding is a numerical representation of a data object, such as text, image, audio, or document.
+Embeddings can be stored on _Node_ properties and can be queried with the `db.index.vector.queryNodes()` function.
+Those indexes are powered by Lucene using a Hierarchical Navigable Small World Graph (HNSW) to perform a k approximate nearest neighbors (k-ANN) query over the vector fields.

-== What is Neo4j Vector Search?
+== Prerequisites

-link:https://neo4j.com/docs/cypher-manual/current/indexes-for-vector-search/[Neo4j's Vector Search] got introduced in Neo4j 5.11 and was considered GA with the release of version 5.13. Embeddings can be stored on _Node_ properties and can be queried with the `db.index.vector.queryNodes()` function. Those indexes are powered by Lucene using a Hierarchical Navigable Small World Graph (HNSW) to perform a k approximate nearest neighbors (k-ANN) query over the vector fields.
-
-=== Prerequisites
-
-1. OpenAI Account: Create an account at link:https://platform.openai.com/signup[OpenAI Signup] and generate the token at link:https://platform.openai.com/account/api-keys[API Keys].
-
-2. A running Neo4j (5.13+) instance
-a. link:https://hub.docker.com/_/neo4j[Docker] image _neo4j:5.15_
-b. link:https://neo4j.com/download/[Neo4j Desktop]
-c. link:https://neo4j.com/cloud/aura-free/[Neo4j Aura]
-d. link:https://neo4j.com/deployment-center/[Neo4j Server] instance
+* You would need an xref:api/embeddings.adoc#available-implementations[EmbeddingClient] to generate the embeddings stored in the `Neo4jVectorStore`.
+* A running Neo4j (5.13+) instance. Following options are available:
+** link:https://hub.docker.com/_/neo4j[Docker] image `neo4j:5.16.0`
+** link:https://neo4j.com/download/[Neo4j Desktop]
+** link:https://neo4j.com/cloud/aura-free/[Neo4j Aura]
+** link:https://neo4j.com/deployment-center/[Neo4j Server] instance

 == Configuration

 To connect to Neo4j and use the `Neo4jVectorStore`, you need to provide (e.g. via environment variables) access details for your instance.

-Additionally, you will need to provide your OpenAI API Key. Set it as an environment variable like so:
-
-[source,bash]
-----
-export SPRING_AI_OPENAI_API_KEY='Your_OpenAI_API_Key'
-----
-
-== Repository
-
-To acquire Spring AI artifacts, declare the Spring Snapshot repository:
-
-[source,xml]
-----
-<repository>
-    <id>spring-snapshots</id>
-    <name>Spring Snapshots</name>
-    <url>https://repo.spring.io/snapshot</url>
-    <releases>
-        <enabled>false</enabled>
-    </releases>
-</repository>
-----
+TIP: Additionally, you will need a configured xref:api/embeddings.adoc#available-implementations[EmbeddingClient].

 == Dependencies

-Add these dependencies to your project:
-
-* OpenAI: Required for calculating embeddings.
+Add the Neo4j Vector Store dependency to your project:

 [source,xml]
 ----
 <dependency>
     <groupId>org.springframework.ai</groupId>
-    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
+    <artifactId>spring-ai-neo4j-store</artifactId>
     <version>0.8.0-SNAPSHOT</version>
 </dependency>
 ----

-* Neo4j Vector Store
+or to your Gradle `build.gradle` build file.

-[source,xml]
+[source,groovy]
 ----
-<dependency>
-    <groupId>org.springframework.ai</groupId>
-    <artifactId>spring-ai-neo4j-store</artifactId>
-    <version>0.8.0-SNAPSHOT</version>
-</dependency>
+dependencies {
+    implementation 'org.springframework.ai:spring-ai-neo4j-store:0.8.0-SNAPSHOT'
+}
 ----

+TIP: Refer to the xref:getting-started.adoc#_dependency_management[Dependency Management] section to add Milestone and/or Snapshot Repositories to your build file.
+
 == Sample Code

 To configure `Neo4jVectorStore` in your application, you can use the following setup:
@@ -92,18 +68,19 @@ You'll need a `VectorStore` to store the embeddings. You can use the `Neo4jVecto

 [source,java]
 ----
+@Bean
+public EmbeddingClient embeddingClient() {
+    // Can be any other EmbeddingClient implementation.
+    return new OpenAiEmbeddingClient(new OpenAiApi(System.getenv("SPRING_AI_OPENAI_API_KEY")));
+}
+
 @Bean
 public Driver driver() {
     return GraphDatabase.driver(System.getenv("SPRING_NEO4J_URI"),
             AuthTokens.basic(System.getenv("SPRING_NEO4J_AUTHENTICATION_USERNAME"),
                     System.getenv("SPRING_NEO4J_AUTHENTICATION_PASSWORD")));
 }

-@Bean
-public EmbeddingClient embeddingClient() {
-    return new OpenAiEmbeddingClient(new OpenAiApi(System.getenv("SPRING_AI_OPENAI_API_KEY")));
-}
-
 @Bean
 public VectorStore vectorStore(Driver driver, EmbeddingClient embeddingClient) {
     return new Neo4jVectorStore(driver, embeddingClient,
@@ -112,3 +89,77 @@ public VectorStore vectorStore(Driver driver, EmbeddingClient embeddingClient) {
 ----

 The `Neo4jVectorStore` is now ready to be used in your application. You can use it to store embeddings and perform similarity searches.
+
+== Metadata filtering
+
+You can leverage the generic, portable xref:api/vectordbs.adoc#metadata-filters[metadata filters] with Neo4j store as well.
+
+For example, you can use either the text expression language:
+
+[source,java]
+----
+vectorStore.similaritySearch(
+    SearchRequest.defaults()
+        .withQuery("The World")
+        .withTopK(TOP_K)
+        .withSimilarityThreshold(SIMILARITY_THRESHOLD)
+        .withFilterExpression("author in ['john', 'jill'] && 'article_type' == 'blog'"));
+----
+
+or programmatically using the `Filter.Expression` DSL:
+
+[source,java]
+----
+FilterExpressionBuilder b = new FilterExpressionBuilder();
+
+vectorStore.similaritySearch(SearchRequest.defaults()
+    .withQuery("The World")
+    .withTopK(TOP_K)
+    .withSimilarityThreshold(SIMILARITY_THRESHOLD)
+    .withFilterExpression(b.and(
+        b.in("john", "jill"),
+        b.eq("article_type", "blog")).build()));
+----
+
+NOTE: Those (portable) filter expressions get automatically converted into the proprietary Neo4j `WHERE` link:https://neo4j.com/developer/cypher/filtering-query-results/[filter expressions].
+
+For example, this portable filter expression:
+
+```sql
+author in ['john', 'jill'] && 'article_type' == 'blog'
+```
+
+is converted into the proprietary Neo4j filter format:
+
+```
+node.`metadata.author` IN ["john","jill"] AND node.`metadata.'article_type'` = "blog"
+```
+
+== Auto-configuration
+
+Spring AI provides Spring Boot auto-configuration for the Neo4j Vector Sore.
+To enable it add the following dependency to your project's Maven `pom.xml` file:
+
+[source, xml]
+----
+<dependency>
+    <groupId>org.springframework.ai</groupId>
+    <artifactId>spring-ai-neo4j-store-spring-boot-starter</artifactId>
+    <version>0.8.0-SNAPSHOT</version>
+</dependency>
+----
+
+or to your Gradle `build.gradle` build file.
+
+[source,groovy]
+----
+dependencies {
+    implementation 'org.springframework.ai:spring-ai-neo4j-store-spring-boot-starter:0.8.0-SNAPSHOT'
+}
+----
+
+TIP: Additionally, you will need a configured `EmbeddingClient` bean. Refer to the xref:api/embeddings.adoc#available-implementations[EmbeddingClient] section for more information.
+
+TIP: Refer to the xref:getting-started.adoc#_dependency_management[Dependency Management] section to add Milestone and/or Snapshot Repositories to your build file.
+
+Now you can auto-wire the `Neo4jVectorStore` as a vector store in your application.

spring-ai-spring-boot-autoconfigure/src/main/java/org/springframework/ai/autoconfigure/vectorstore/neo4j/Neo4jVectorStoreProperties.java

Lines changed: 5 additions & 5 deletions

@@ -40,39 +40,39 @@ public class Neo4jVectorStoreProperties {
     private String indexName = Neo4jVectorStore.DEFAULT_INDEX_NAME;

     public String getDatabaseName() {
-        return databaseName;
+        return this.databaseName;
     }

     public void setDatabaseName(String databaseName) {
         this.databaseName = databaseName;
     }

     public int getEmbeddingDimension() {
-        return embeddingDimension;
+        return this.embeddingDimension;
     }

     public void setEmbeddingDimension(int embeddingDimension) {
         this.embeddingDimension = embeddingDimension;
     }

     public Neo4jVectorStore.Neo4jDistanceType getDistanceType() {
-        return distanceType;
+        return this.distanceType;
     }

     public void setDistanceType(Neo4jVectorStore.Neo4jDistanceType distanceType) {
         this.distanceType = distanceType;
     }

     public String getLabel() {
-        return label;
+        return this.label;
     }

     public void setLabel(String label) {
         this.label = label;
     }

     public String getEmbeddingProperty() {
-        return embeddingProperty;
+        return this.embeddingProperty;
     }

     public void setEmbeddingProperty(String embeddingProperty) {

vector-stores/spring-ai-neo4j-store/src/main/java/org/springframework/ai/vectorstore/Neo4jVectorStore.java

Lines changed: 15 additions & 11 deletions

@@ -22,6 +22,7 @@
 import org.neo4j.driver.Values;
 import org.springframework.ai.document.Document;
 import org.springframework.ai.embedding.EmbeddingClient;
+import org.springframework.ai.vectorstore.filter.Neo4jVectorFilterExpressionConverter;
 import org.springframework.beans.factory.InitializingBean;
 import org.springframework.util.Assert;

@@ -222,6 +223,8 @@ public Neo4jVectorStoreConfig build() {

     public static final String DEFAULT_EMBEDDING_PROPERTY = "embedding";

+    private final Neo4jVectorFilterExpressionConverter filterExpressionConverter = new Neo4jVectorFilterExpressionConverter();
+
     private final Driver driver;

     private final EmbeddingClient embeddingClient;
@@ -277,24 +280,25 @@ public Optional<Boolean> delete(List<String> idList) {

     @Override
     public List<Document> similaritySearch(SearchRequest request) {
-        if (request.getFilterExpression() != null) {
-            throw new UnsupportedOperationException(
-                    "The [" + this.getClass() + "] doesn't support metadata filtering!");
-        }
-
         Assert.isTrue(request.getTopK() > 0, "The number of documents to returned must be greater than zero");
         Assert.isTrue(request.getSimilarityThreshold() >= 0 && request.getSimilarityThreshold() <= 1,
                 "The similarity score is bounded between 0 and 1; least to most similar respectively.");

         var embedding = Values.value(toFloatArray(this.embeddingClient.embed(request.getQuery())));
         try (var session = this.driver.session(this.config.sessionConfig)) {
+            StringBuilder condition = new StringBuilder("score >= $threshold");
+            if (request.hasFilterExpression()) {
+                condition.append(" AND ")
+                    .append(this.filterExpressionConverter.convertExpression(request.getFilterExpression()));
+            }
+            String query = """
+                    CALL db.index.vector.queryNodes($indexName, $numberOfNearestNeighbours, $embeddingValue)
+                    YIELD node, score
+                    WHERE %s
+                    RETURN node, score""".formatted(condition);
+
             return session
-                .run("""
-                        CALL db.index.vector.queryNodes($indexName, $numberOfNearestNeighbours, $embeddingValue)
-                        YIELD node, score
-                        WHERE score >= $threshold
-                        RETURN node, score
-                        """,
+                .run(query,
                     Map.of("indexName", this.config.indexName, "numberOfNearestNeighbours", request.getTopK(),
                             "embeddingValue", embedding, "threshold", request.getSimilarityThreshold()))
                 .list(Neo4jVectorStore::recordToDocument);
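
The query assembly introduced in `similaritySearch` can be sketched in isolation. This is a minimal illustration, not code from the commit: the class `QuerySketch`, the method `buildQuery`, and the plain `String` parameter standing in for the output of `Neo4jVectorFilterExpressionConverter` are all assumptions. It shows that the threshold condition is always present and that a converted filter, when there is one, is appended with `AND` before the condition is substituted into the Cypher text.

```java
public class QuerySketch {

    // Hypothetical helper: 'convertedFilter' stands in for the string a filter
    // converter would return; null or blank means no filter was requested.
    static String buildQuery(String convertedFilter) {
        // The similarity threshold is always part of the WHERE clause.
        StringBuilder condition = new StringBuilder("score >= $threshold");
        if (convertedFilter != null && !convertedFilter.isBlank()) {
            // The filter narrows the result further, so it is joined with AND.
            condition.append(" AND ").append(convertedFilter);
        }
        // %s is replaced by the assembled condition; the $name tokens are
        // Cypher parameters and are left untouched by formatted().
        return """
                CALL db.index.vector.queryNodes($indexName, $numberOfNearestNeighbours, $embeddingValue)
                YIELD node, score
                WHERE %s
                RETURN node, score""".formatted(condition);
    }

    public static void main(String[] args) {
        System.out.println(buildQuery(null));
        System.out.println();
        System.out.println(buildQuery("node.`metadata.author` IN [\"john\",\"jill\"]"));
    }
}
```

In this sketch, as in the diff, the index name, neighbour count, embedding, and threshold remain named query parameters, while the filter condition becomes part of the query text itself.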
