@@ -6,7 +6,7 @@ A vector databases is a specialized type of database that plays an essential rol
6
6
In vector databases, queries differ from traditional relational databases.
7
7
Instead of exact matches, they perform similarity searches.
8
8
When given a vector as a query, a vector database returns vectors that are "`similar`" to the query vector.
9
-
Further details on how this similarity is calculated at a high-level is provided in a <<vectordbs-similarity,later section>>.
9
+
Further details on how this similarity is calculated at a high level are provided in xref:api/vectordbs/understand-vectordbs.adoc#vectordbs-similarity[Vector Similarity].
10
10
11
11
Vector databases are used to integrate your data with AI models.
12
12
The first step in their usage is to load your data into a vector database.
@@ -49,6 +49,7 @@ public class SearchRequest {
49
49
private Filter.Expression filterExpression;
50
50
51
51
public static SearchRequest query(String query) { return new SearchRequest(query); }
@@ -78,24 +79,23 @@ The `similaritySearch` methods in the interface allow for retrieving documents s
78
79
* `k`: An integer that specifies the maximum number of similar documents to return. This is often referred to as a 'top K' search, or 'K nearest neighbors' (KNN).
79
80
* `threshold`: A double value ranging from 0 to 1, where values closer to 1 indicate higher similarity. For example, if you set a threshold of 0.75, only documents with a similarity above this value are returned.
80
81
* `Filter.Expression`: A class used for passing a fluent DSL (Domain-Specific Language) expression that functions similarly to a 'where' clause in SQL, but it applies exclusively to the metadata key-value pairs of a `Document`.
81
-
* `filterExpression`: An external DSL based on ANTLR4 that accepts filter expressions as strings. For example, with metadata keys like country, year, and `isActive`, you could use an expression such as
82
-
``` java
83
-
country == 'UK' && year >= 2020 && isActive == true.
84
-
```
82
+
* `filterExpression`: An external DSL based on ANTLR4 that accepts filter expressions as strings. For example, with metadata keys like `country`, `year`, and `isActive`, you could use an expression such as: `country == 'UK' && year >= 2020 && isActive == true`.
83
+
84
+
Find more information on the `Filter.Expression` in the <<metadata-filters>> section.
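
As a rough illustration of how `k` and `threshold` interact, the following sketch filters a list of already-scored candidates and keeps the best `k`. It is not the implementation used by any `VectorStore`. The `Scored` record and the `select` method are hypothetical names introduced only for this example, and treating the threshold as inclusive is an assumption.

```java
import java.util.Comparator;
import java.util.List;

final class TopKThresholdExample {

    // Hypothetical holder for a document id and its similarity score (not a Spring AI type).
    record Scored(String id, double similarity) {}

    // Keeps candidates whose similarity reaches the threshold (inclusive here, by assumption),
    // orders them from most to least similar, and returns at most k of them.
    static List<Scored> select(List<Scored> candidates, int k, double threshold) {
        return candidates.stream()
                .filter(s -> s.similarity() >= threshold)
                .sorted(Comparator.comparingDouble(Scored::similarity).reversed())
                .limit(k)
                .toList();
    }

    public static void main(String[] args) {
        List<Scored> candidates = List.of(
                new Scored("a", 0.90),
                new Scored("b", 0.50),
                new Scored("c", 0.80),
                new Scored("d", 0.76));
        // With k = 2 and threshold = 0.75, "b" is filtered out and "d" is cut by the limit.
        System.out.println(select(candidates, 2, 0.75));
    }
}
```

A real vector store typically applies these limits inside the database query rather than in application code, so this sketch only conveys the semantics of the two parameters.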
85
85
86
86
== Available Implementations
87
87
88
88
These are the available implementations of the `VectorStore` interface:
89
89
90
-
* Azure Vector Search [`AzureVectorStore`]: The https://learn.microsoft.com/en-us/azure/search/vector-search-overview[Azure] vector store
91
-
* Chroma [`ChromaVectorStore`]: The https://www.trychroma.com/[Chroma] vector store
92
-
* Milvus [`MilvusVectorStore`]: The https://milvus.io/[Milvus] vector store
93
-
* Neo4j [`Neo4jVectorStore`]: The https://neo4j.com/[Neo4j] vector store
94
-
* PgVector [`PgVectorStore`]: The https://github.com/pgvector/pgvector[PostgreSQL/PGVector] vector store
95
-
* Pinecone: https://www.pinecone.io/[PineCone] vector store
96
-
* Redis [`RedisVectorStore`]: The https://redis.io/[Redis] vector store
97
-
* Simple Vector Store [`SimpleVectorStore`]: A simple implementation of persistent vector storage, good for educational purposes
98
-
* Weaviate [`WeaviateVectorStore`] The https://weaviate.io/[Weaviate] vector store
90
+
* xref:api/vectordbs/azure.adoc[Azure Vector Search] - The https://learn.microsoft.com/en-us/azure/search/vector-search-overview[Azure] vector store.
91
+
* xref:api/vectordbs/chroma.adoc[ChromaVectorStore] - The https://www.trychroma.com/[Chroma] vector store.
92
+
* xref:api/vectordbs/milvus.adoc[MilvusVectorStore] - The https://milvus.io/[Milvus] vector store.
93
+
* xref:api/vectordbs/neo4j.adoc[Neo4jVectorStore] - The https://neo4j.com/[Neo4j] vector store.
94
+
* xref:api/vectordbs/pgvector.adoc[PgVectorStore] - The https://github.com/pgvector/pgvector[PostgreSQL/PGVector] vector store.
* xref:api/vectordbs/redis.adoc[RedisVectorStore] - The https://redis.io/[Redis] vector store.
97
+
* xref:api/vectordbs/weaviate.adoc[WeaviateVectorStore] - The https://weaviate.io/[Weaviate] vector store.
98
+
* link:https://github.com/spring-projects/spring-ai/blob/main/spring-ai-core/src/main/java/org/springframework/ai/vectorstore/SimpleVectorStore.java[SimpleVectorStore] - A simple implementation of persistent vector storage, good for educational purposes.
99
99
100
100
More implementations may be supported in future releases.
101
101
@@ -137,7 +137,7 @@ Later, when a user question is passed into the AI model, a similarity search is
137
137
138
138
Additional options can be passed into the `similaritySearch` method to define how many documents to retrieve and the similarity threshold for the search.
139
139
140
-
== Metadata Filters
140
+
== Metadata Filters [[metadata-filters]]
141
141
142
142
This section describes various filters that you can use against the results of a query.
For example, the following image depicts a two-dimensional vector stem:[\vec{a}] in the cartesian coordinate system pictured as an arrow.
214
-
215
-
image::vector_2d_coordinates.png[]
216
-
217
-
The head of the vector stem:[\vec{a}] is at the point stem:[(a_1, a_2)].
218
-
The *x* coordinate value is stem:[a_1] and the *y* coordinate value is stem:[a_2]. The coordinates are also referred to as the components of the vector.
219
-
220
-
[[vectordbs-similarity]]
221
-
== Similarity
222
-
223
-
Several mathematical formulas can be used to determine if two vectors are similar.
224
-
225
-
One of the most intuitive to visualize and understand is cosine similarity.
226
-
227
-
Consider the following images that show three sets of graphs:
228
-
229
-
image::vector_similarity.png[]
230
-
231
-
The vectors stem:[\vec{A}] and stem:[\vec{B}] are considered similar, when they are pointing close to each other, as in the first diagram.
232
-
The vectors are considered unrelated when pointing perpendicular to each other and opposite when they point away from each other.
233
-
234
-
The angle between them, stem:[\theta], is a good measure of their similarity.
235
-
How can the angle stem:[\theta] be computed?
236
-
237
-
We are all familiar with the https://en.wikipedia.org/wiki/Pythagorean_theorem#History[Pythagorean Theorem].
238
-
239
-
image:pythagorean-triangle.png[]
240
-
241
-
What about when the angle between *a* and *b* is not 90 degrees?
242
-
243
-
Enter the https://en.wikipedia.org/wiki/Law_of_cosines[Law of cosines].
244
-
245
-
246
-
.Law of Cosines
247
-
****
248
-
stem:[a^2 + b^2 - 2ab\cos\theta = c^2]
249
-
****
250
-
251
-
The following image shows this approach as a vector diagram:
252
-
253
-
image:lawofcosines.png[]
254
-
255
-
256
-
The magnitude of this vector is defined in terms of its components as:
https://towardsdatascience.com/cosine-similarity-how-does-it-measure-the-similarity-maths-behind-and-usage-in-python-50ad30aad7db[Expanding this out] gives us the formula for https://en.wikipedia.org/wiki/Cosine_similarity[Cosine Similarity].
This formula works for dimensions higher than 2 or 3, though it is hard to visualize. However, https://projector.tensorflow.org/[it can be visualized to some extent].
296
-
It is common for vectors in AI/ML applications to have hundreds or even thousands of dimensions.
297
-
298
-
The similarity function in higher dimensions using the components of the vector is shown below.
299
-
It expands the two-dimensional definitions of Magnitude and Dot Product given previously to *N* dimensions by using https://en.wikipedia.org/wiki/Summation[Summation mathematical syntax].
image::vector_2d_coordinates.png[width=150, role = "right"]
5
+
6
+
Vectors have dimensionality and a direction.
7
+
For example, the following image depicts a two-dimensional vector stem:[\vec{a}] in the Cartesian coordinate system pictured as an arrow.
8
+
9
+
The head of the vector stem:[\vec{a}] is at the point stem:[(a_1, a_2)].
10
+
The *x* coordinate value is stem:[a_1] and the *y* coordinate value is stem:[a_2]. The coordinates are also referred to as the components of the vector.
11
+
12
+
[[vectordbs-similarity]]
13
+
== Similarity
14
+
15
+
Several mathematical formulas can be used to determine if two vectors are similar.
16
+
One of the most intuitive to visualize and understand is cosine similarity.
17
+
Consider the following images that show three sets of graphs:
https://towardsdatascience.com/cosine-similarity-how-does-it-measure-the-similarity-maths-behind-and-usage-in-python-50ad30aad7db[Expanding this out] gives us the formula for https://en.wikipedia.org/wiki/Cosine_similarity[Cosine Similarity].
This formula works for dimensions higher than 2 or 3, though it is hard to visualize. However, https://projector.tensorflow.org/[it can be visualized to some extent].
84
+
It is common for vectors in AI/ML applications to have hundreds or even thousands of dimensions.
85
+
86
+
The similarity function in higher dimensions using the components of the vector is shown below.
87
+
It expands the two-dimensional definitions of Magnitude and Dot Product given previously to *N* dimensions by using https://en.wikipedia.org/wiki/Summation[Summation mathematical syntax].
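
The N-dimensional form can be sketched in plain Java as the dot product of the two vectors divided by the product of their magnitudes, with each quantity computed as a sum over the components. The class and method names below are hypothetical and chosen only for this example. The code is a minimal sketch, not the calculation used by any particular vector store.

```java
final class CosineSimilarityExample {

    // Cosine similarity = (A . B) / (|A| * |B|), where the dot product and the
    // squared magnitudes are each summed over all N components.
    static double cosineSimilarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same number of dimensions");
        }
        double dot = 0.0;
        double sumSquaresA = 0.0;
        double sumSquaresB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            sumSquaresA += a[i] * a[i];
            sumSquaresB += b[i] * b[i];
        }
        if (sumSquaresA == 0.0 || sumSquaresB == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return dot / (Math.sqrt(sumSquaresA) * Math.sqrt(sumSquaresB));
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0, 3.0};
        // Same direction: result is close to 1.
        System.out.println(cosineSimilarity(a, new double[] {2.0, 4.0, 6.0}));
        // Perpendicular: result is 0.
        System.out.println(cosineSimilarity(a, new double[] {-2.0, 1.0, 0.0}));
        // Opposite direction: result is close to -1.
        System.out.println(cosineSimilarity(a, new double[] {-1.0, -2.0, -3.0}));
    }
}
```

The three calls in `main` mirror the similar, unrelated, and opposite cases described for vectors stem:[\vec{A}] and stem:[\vec{B}], here with three components instead of two.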