Commit 811d048

Clarify the PGVector store documentation
1 parent 6122b2a commit 811d048

2 files changed

+124
-71
lines changed


spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs/pgvector.adoc

Lines changed: 122 additions & 69 deletions
Original file line number | Diff line number | Diff line change
@@ -2,19 +2,17 @@
22

33
This section walks you through setting up the PGvector `VectorStore` to store document embeddings and perform similarity searches.
44

5-
== What is PGvector?
6-
75
link:https://github.com/pgvector/pgvector[PGvector] is an open-source extension for PostgreSQL that enables storing and searching over machine learning-generated embeddings. It provides different capabilities that let users identify both exact and approximate nearest neighbors. It is designed to work seamlessly with other PostgreSQL features, including indexing and querying.
86

9-
=== Prerequisites
7+
== Prerequisites
108

11-
1. OpenAI Account: Create an account at link:https://platform.openai.com/signup[OpenAI Signup] and generate the token at link:https://platform.openai.com/account/api-keys[API Keys].
9+
First, you need access to a PostgreSQL instance with the `vector`, `hstore`, and `uuid-ossp` extensions enabled.
1210

13-
2. Access to PostgreSQL instance with the following configurations
11+
TIP: The <<appendix_a,setup local Postgres/PGVector>> appendix shows how to set up a DB locally with a Docker container.
1412

15-
The <<appendix_a,setup local Postgres/PGVector>> appendix shows how to set up a DB locally with a Docker container.
13+
On startup, the `PgVectorStore` will attempt to install the required database extensions and create the required `vector_store` table with an index.
1614

17-
On startup, the `PgVectorStore` will attempt to install the required database extensions and create the required `vector_store` table with an index. Optionally, you can do this manually like so:
15+
Optionally, you can do this manually like so:
1816

1917
[sql]
2018
----
@@ -26,60 +24,41 @@ CREATE TABLE IF NOT EXISTS vector_store (
2624
id uuid DEFAULT uuid_generate_v4() PRIMARY KEY,
2725
content text,
2826
metadata json,
29-
embedding vector(1536)
27+
embedding vector(1536) -- 1536 is the default embedding dimension
3028
);
3129
3230
CREATE INDEX ON vector_store USING HNSW (embedding vector_cosine_ops);
3331
----
3432

35-
== Configuration
36-
37-
To set up `PgVectorStore`, you need to provide (via `application.yaml`) configurations to your PostgreSQL database.
38-
39-
Additionally, you'll need to provide your OpenAI API Key. Set it as an environment variable like so:
40-
41-
[source,bash]
42-
----
43-
export SPRING_AI_OPENAI_API_KEY='Your_OpenAI_API_Key'
44-
----
45-
46-
== Repository
47-
48-
To acquire Spring AI artifacts, declare the Spring Snapshot repository:
33+
TIP: Replace `1536` with the actual embedding dimension if your embedding model uses a different one.
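If the table is created from application code rather than by hand, the dimension can be passed in as a parameter. The following is a minimal sketch only: `VectorStoreDdl` and `createTableSql` are hypothetical names that are not part of Spring AI, and the statement simply mirrors the manual SQL shown above.

```java
final class VectorStoreDdl {

    private VectorStoreDdl() {
    }

    /**
     * Builds the CREATE TABLE statement for the given embedding dimension.
     * Hypothetical helper for illustration; it is not a Spring AI API.
     */
    static String createTableSql(int dimensions) {
        if (dimensions <= 0) {
            throw new IllegalArgumentException("dimensions must be positive: " + dimensions);
        }
        // Same columns as the manual setup above; only the vector size varies.
        return """
                CREATE TABLE IF NOT EXISTS vector_store (
                    id uuid DEFAULT uuid_generate_v4() PRIMARY KEY,
                    content text,
                    metadata json,
                    embedding vector(%d)
                );
                """.formatted(dimensions);
    }
}
```

For example, `VectorStoreDdl.createTableSql(768)` yields a statement whose last column is `embedding vector(768)`.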
4934

50-
[source,xml]
51-
----
52-
<repository>
53-
<id>spring-snapshots</id>
54-
<name>Spring Snapshots</name>
55-
<url>https://repo.spring.io/snapshot</url>
56-
<releases>
57-
<enabled>false</enabled>
58-
</releases>
59-
</repository>
60-
----
35+
Next, if required, obtain an API key for the xref:api/embeddings.adoc#available-implementations[EmbeddingClient] that generates the embeddings stored by the `PgVectorStore`.
6136

6237
== Dependencies
6338

64-
Add these dependencies to your project:
65-
66-
* PostgreSQL connection and `JdbcTemplate` auto-configuration.
39+
Then add the PgVectorStore boot starter dependency to your project:
6740

6841
[source,xml]
6942
----
7043
<dependency>
71-
<groupId>org.springframework.boot</groupId>
72-
<artifactId>spring-boot-starter-jdbc</artifactId>
44+
<groupId>org.springframework.ai</groupId>
45+
<artifactId>spring-ai-pgvector-store-spring-boot-starter</artifactId>
7346
</dependency>
47+
----
7448

75-
<dependency>
76-
<groupId>org.postgresql</groupId>
77-
<artifactId>postgresql</artifactId>
78-
<scope>runtime</scope>
79-
</dependency>
49+
or to your Gradle `build.gradle` build file.
50+
51+
[source,groovy]
8052
----
53+
dependencies {
54+
implementation 'org.springframework.ai:spring-ai-pgvector-store-spring-boot-starter'
55+
}
56+
----
57+
58+
The vector store also requires an `EmbeddingClient` instance to calculate embeddings for the documents.
59+
You can pick one of the available xref:api/embeddings.adoc#available-implementations[EmbeddingClient Implementations].
8160

82-
* OpenAI: Required for calculating embeddings.
61+
For example, to use the xref:api/embeddings/openai-embeddings.adoc[OpenAI EmbeddingClient], add the following dependency to your project:
8362

8463
[source,xml]
8564
----
@@ -89,23 +68,20 @@ Add these dependencies to your project:
8968
</dependency>
9069
----
9170

92-
* PGvector
71+
or to your Gradle `build.gradle` build file.
9372

94-
[source,xml]
73+
[source,groovy]
9574
----
96-
<dependency>
97-
<groupId>org.springframework.ai</groupId>
98-
<artifactId>spring-ai-pgvector-store</artifactId>
99-
</dependency>
75+
dependencies {
76+
implementation 'org.springframework.ai:spring-ai-openai-spring-boot-starter'
77+
}
10078
----
10179

10280
TIP: Refer to the xref:getting-started.adoc#dependency-management[Dependency Management] section to add the Spring AI BOM to your build file.
81+
Refer to the xref:getting-started.adoc#repositories[Repositories] section to add Milestone and/or Snapshot Repositories to your build file.
10382

104-
== Sample Code
105-
106-
To configure `PgVectorStore` in your application, you can use the following setup:
107-
108-
Add to `application.yml` (using your DB credentials):
83+
To connect to and configure the `PgVectorStore`, you need to provide access details for your instance.
84+
A simple configuration can be provided via Spring Boot's `application.yml`:
10985

11086
[yml]
11187
----
@@ -114,9 +90,63 @@ spring:
11490
url: jdbc:postgresql://localhost:5432/postgres
11591
username: postgres
11692
password: postgres
93+
ai:
94+
vectorstore:
95+
pgvector:
96+
index-type: HNSW
97+
distance-type: COSINE_DISTANCE
98+
dimensions: 1536
11799
----
118100

119-
Integrate with OpenAI's embeddings by adding the Spring Boot OpenAI Starter to your project. This provides you with an implementation of the Embeddings client:
101+
TIP: Check the list of xref:#pgvector-properties[configuration parameters] to learn about the default values and configuration options.
102+
103+
Now you can autowire the PgVector store in your application and use it:
104+
105+
[source,java]
106+
----
107+
@Autowired VectorStore vectorStore;
108+
109+
// ...
110+
111+
List<Document> documents = List.of(
112+
new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("meta1", "meta1")),
113+
new Document("The World is Big and Salvation Lurks Around the Corner"),
114+
new Document("You walk forward facing the past and you turn back toward the future.", Map.of("meta2", "meta2")));
115+
116+
// Add the documents to PGVector
117+
vectorStore.add(documents);
118+
119+
// Retrieve documents similar to a query
120+
List<Document> results = vectorStore.similaritySearch(SearchRequest.query("Spring").withTopK(5));
121+
----
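To make the semantics of a top-K similarity search concrete, here is an in-memory sketch of an exact (brute-force) nearest-neighbor search using cosine similarity. It is an illustration under stated assumptions, not how `PgVectorStore` is implemented: the real store delegates the search to PostgreSQL, and the names `ExactTopKSearch`, `Entry`, and `Scored` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ExactTopKSearch {

    /** Hypothetical stand-in for a stored document row: text plus its embedding. */
    record Entry(String content, double[] embedding) {}

    /** A search hit together with its similarity to the query. */
    record Scored(String content, double similarity) {}

    private ExactTopKSearch() {
    }

    /** Scores every entry, drops those below the threshold, returns the best topK. */
    static List<Scored> search(List<Entry> store, double[] query, int topK, double threshold) {
        List<Scored> scored = new ArrayList<>();
        for (Entry entry : store) {
            double similarity = cosineSimilarity(entry.embedding(), query);
            if (similarity >= threshold) {
                scored.add(new Scored(entry.content(), similarity));
            }
        }
        // Highest similarity first.
        scored.sort(Comparator.comparingDouble(Scored::similarity).reversed());
        return List.copyOf(scored.subList(0, Math.min(topK, scored.size())));
    }

    /** Cosine similarity; assumes equal-length, non-zero vectors. */
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

An index such as HNSW trades this exhaustive scan for an approximate search that visits far fewer rows.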
122+
123+
== Manual Configuration
124+
125+
Instead of using the Spring Boot auto-configuration, you can manually configure the `PgVectorStore`.
126+
For this you need to add the PostgreSQL connection and `JdbcTemplate` auto-configuration dependencies to your project:
127+
128+
[source,xml]
129+
----
130+
<dependency>
131+
<groupId>org.springframework.boot</groupId>
132+
<artifactId>spring-boot-starter-jdbc</artifactId>
133+
</dependency>
134+
135+
<dependency>
136+
<groupId>org.postgresql</groupId>
137+
<artifactId>postgresql</artifactId>
138+
<scope>runtime</scope>
139+
</dependency>
140+
141+
<dependency>
142+
<groupId>org.springframework.ai</groupId>
143+
<artifactId>spring-ai-pgvector-store</artifactId>
144+
</dependency>
145+
----
146+
147+
TIP: Refer to the xref:getting-started.adoc#dependency-management[Dependency Management] section to add the Spring AI BOM to your build file.
148+
149+
To configure PgVector in your application, you can use the following setup:
120150

121151
[source,java]
122152
----
@@ -126,31 +156,54 @@ public VectorStore vectorStore(JdbcTemplate jdbcTemplate, EmbeddingClient embedd
126156
}
127157
----
128158

129-
In your main code, create some documents:
159+
== Metadata filtering
130160

131-
[source,java]
132-
----
133-
List<Document> documents = List.of(
134-
new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("meta1", "meta1")),
135-
new Document("The World is Big and Salvation Lurks Around the Corner"),
136-
new Document("You walk forward facing the past and you turn back toward the future.", Map.of("meta2", "meta2")));
137-
----
161+
You can leverage the generic, portable link:https://docs.spring.io/spring-ai/reference/api/vectordbs.html#_metadata_filters[metadata filters] with the PgVector store.
138162

139-
Add the documents to your vector store:
163+
For example, you can use either the text expression language:
140164

141165
[source,java]
142166
----
143-
vectorStore.add(List.of(document));
167+
vectorStore.similaritySearch(
168+
SearchRequest.defaults()
169+
.withQuery("The World")
170+
.withTopK(TOP_K)
171+
.withSimilarityThreshold(SIMILARITY_THRESHOLD)
172+
.withFilterExpression("author in ['john', 'jill'] && article_type == 'blog'"));
144173
----
145174

146-
And finally, retrieve documents similar to a query:
175+
or programmatically using the `Filter.Expression` DSL:
147176

148177
[source,java]
149178
----
150-
List<Document> results = vectorStore.similaritySearch(SearchRequest.query("Spring").withTopK(5));
179+
FilterExpressionBuilder b = new FilterExpressionBuilder();
180+
181+
vectorStore.similaritySearch(SearchRequest.defaults()
182+
.withQuery("The World")
183+
.withTopK(TOP_K)
184+
.withSimilarityThreshold(SIMILARITY_THRESHOLD)
185+
.withFilterExpression(b.and(
186+
b.in("author", "john", "jill"),
187+
b.eq("article_type", "blog")).build()));
151188
----
152189

153-
If all goes well, you should retrieve the document containing the text "Spring AI rocks!!".
190+
NOTE: These filter expressions are converted into the equivalent PgVector filters.
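The meaning of the filter `author in ['john', 'jill'] && article_type == 'blog'` can be sketched as a plain predicate over a document's metadata map. This is only an illustration of the filter semantics, assuming metadata is a `Map<String, Object>`; `MetadataFilterSketch` and its methods are hypothetical and do not reflect how the store translates filters for PostgreSQL.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Predicate;

final class MetadataFilterSketch {

    private MetadataFilterSketch() {
    }

    /** Matches when the metadata value for key equals one of the given values. */
    static Predicate<Map<String, Object>> in(String key, Object... values) {
        List<Object> allowed = Arrays.asList(values);
        return metadata -> metadata.containsKey(key) && allowed.contains(metadata.get(key));
    }

    /** Matches when the metadata value for key equals the given value. */
    static Predicate<Map<String, Object>> eq(String key, Object value) {
        return metadata -> metadata.containsKey(key) && Objects.equals(value, metadata.get(key));
    }

    /** The example filter from the text, built from the two helpers above. */
    static Predicate<Map<String, Object>> blogByJohnOrJill() {
        return in("author", "john", "jill").and(eq("article_type", "blog"));
    }
}
```

A document with metadata `{author=john, article_type=blog}` passes this predicate, while one with `{author=bob, article_type=blog}` or with no `article_type` key does not.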
191+
192+
[[pgvector-properties]]
193+
== PgVectorStore properties
194+
195+
You can use the following properties in your Spring Boot configuration to customize the PGVector vector store.
196+
197+
[cols="2,5,1"]
198+
|===
199+
|Property| Description | Default value
200+
201+
|`spring.ai.vectorstore.pgvector.index-type`| Nearest neighbor search index type. Options are `NONE` - exact nearest neighbor search, `IVFFlat` - index divides vectors into lists, and then searches a subset of those lists that are closest to the query vector. It has faster build times and uses less memory than HNSW, but has lower query performance (in terms of speed-recall tradeoff). `HNSW` - creates a multilayer graph. It has slower build times and uses more memory than IVFFlat, but has better query performance (in terms of speed-recall tradeoff). There’s no training step like IVFFlat, so the index can be created without any data in the table.| HNSW
202+
|`spring.ai.vectorstore.pgvector.distance-type`| Search distance type. Defaults to `COSINE_DISTANCE`. If the vectors are normalized to length 1, you can use `EUCLIDEAN_DISTANCE` or `NEGATIVE_INNER_PRODUCT` for best performance.| COSINE_DISTANCE
203+
|`spring.ai.vectorstore.pgvector.dimensions`| Embeddings dimension. If not specified explicitly, the `PgVectorStore` retrieves the dimensions from the provided `EmbeddingClient`. The dimensions are applied to the embedding column on table creation. If you change the dimensions, you have to re-create the `vector_store` table as well. | -
204+
|`spring.ai.vectorstore.pgvector.remove-existing-vector-store-table`| Deletes the existing `vector_store` table on startup. | false
205+
|===
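The `distance-type` note about normalized vectors can be checked numerically. The sketch below, with hypothetical names (`VectorDistances` is not a Spring AI or PGvector class), computes the three distances in plain Java. For unit-length vectors, cosine distance equals one plus the negative inner product, and the squared Euclidean distance equals twice the cosine distance, so all three produce the same ranking.

```java
final class VectorDistances {

    private VectorDistances() {
    }

    /** Cosine distance: 1 - (a . b) / (|a| * |b|). Assumes non-zero vectors. */
    static double cosineDistance(double[] a, double[] b) {
        return 1.0 - dot(a, b) / (norm(a) * norm(b));
    }

    /** Euclidean (L2) distance between a and b. */
    static double euclideanDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /** Negative inner product: smaller values mean more similar vectors. */
    static double negativeInnerProduct(double[] a, double[] b) {
        return -dot(a, b);
    }

    /** Returns a copy of v scaled to length 1. Assumes v is non-zero. */
    static double[] normalize(double[] v) {
        double length = norm(v);
        double[] unit = new double[v.length];
        for (int i = 0; i < v.length; i++) {
            unit[i] = v[i] / length;
        }
        return unit;
    }

    private static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    private static double norm(double[] v) {
        return Math.sqrt(dot(v, v));
    }
}
```

Because the inner product skips the two norm computations, it is the cheapest of the three when the embeddings are already normalized.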
206+
154207

155208
== Run Postgres & PGVector DB locally
156209

spring-ai-spring-boot-autoconfigure/src/main/java/org/springframework/ai/autoconfigure/vectorstore/pgvector/PgVectorStoreAutoConfiguration.java

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -17,9 +17,9 @@
1717
package org.springframework.ai.autoconfigure.vectorstore.pgvector;
1818

1919
import javax.sql.DataSource;
20+
2021
import org.springframework.ai.embedding.EmbeddingClient;
2122
import org.springframework.ai.vectorstore.PgVectorStore;
22-
import org.springframework.ai.vectorstore.VectorStore;
2323
import org.springframework.boot.autoconfigure.AutoConfiguration;
2424
import org.springframework.boot.autoconfigure.condition.ConditionalOnClass;
2525
import org.springframework.boot.autoconfigure.condition.ConditionalOnMissingBean;
@@ -38,7 +38,7 @@ public class PgVectorStoreAutoConfiguration {
3838

3939
@Bean
4040
@ConditionalOnMissingBean
41-
public VectorStore vectorStore(JdbcTemplate jdbcTemplate, EmbeddingClient embeddingClient,
41+
public PgVectorStore vectorStore(JdbcTemplate jdbcTemplate, EmbeddingClient embeddingClient,
4242
PgVectorStoreProperties properties) {
4343

4444
return new PgVectorStore(jdbcTemplate, embeddingClient, properties.getDimensions(),
