- add concurrency to store.add(..) (bc embeddingClient is slow)
- CassandraVectorStoreAutoConfiguration uses CassandraAutoConfiguration
- driver profiles for production stability+performance,
- small cleanups and naming fixes,
- main doc tidy-up
- astradb compatibility (protocol V4)
- don't create embeddings again for documents that already have them
similar to #413
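
The first and last bullets above (concurrent `store.add(..)` and not re-creating existing embeddings) can be pictured with a small stdlib-only sketch. This is not the code from this PR: `Doc`, `embedMissing`, and the `Function`-based embedder are hypothetical stand-ins for the real document and `EmbeddingClient` types, and the thread count is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

class ConcurrentEmbeddingSketch {

    /** Hypothetical stand-in for a document; not the Spring AI Document class. */
    static final class Doc {
        final String content;
        volatile List<Double> embedding; // null until an embedding exists

        Doc(String content, List<Double> embedding) {
            this.content = content;
            this.embedding = embedding;
        }
    }

    /**
     * Computes embeddings in parallel, but only for documents that lack one.
     * The embedder function is an assumed placeholder for a slow remote call.
     */
    static void embedMissing(List<Doc> docs,
                             Function<String, List<Double>> embedder,
                             int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (Doc doc : docs) {
                if (doc.embedding != null) {
                    continue; // already embedded: skip the slow call
                }
                pending.add(CompletableFuture.runAsync(
                        () -> { doc.embedding = embedder.apply(doc.content); }, pool));
            }
            // Block until every submitted embedding call has finished.
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
        } finally {
            pool.shutdown();
        }
    }
}
```

A bounded pool is used here so that a large batch does not open an unbounded number of simultaneous calls to the embedding service; the actual change may size or structure this differently.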
spring-ai-docs/src/main/antora/modules/ROOT/pages/api/vectordbs/apache-cassandra.adoc
25 additions & 29 deletions
@@ -4,9 +4,9 @@ This section walks you through setting up `CassandraVectorStore` to store docume

 == What is Apache Cassandra ?

-link:https://cassandra.apache.org[Apache Cassandra] is a true open source distributed database reknown for scalabilityand high availability without compromising performance.
+link:https://cassandra.apache.org[Apache Cassandra®] is a true open source distributed database reknown for linear scalability, proven fault-tolerance and low latency, making it the perfect platform for mission-critical transactional data.

-Linear scalability, proven fault-tolerance and low latency on commodity hardware makes it the perfect platform for mission-critical data. Its Vector Similarity Search (VSS) is based on the JVector library that ensures best-in-class performance and relevancy.
+Its Vector Similarity Search (VSS) is based on the JVector library that ensures best-in-class performance and relevancy.

 A vector search in Apache Cassandra is done as simply as:
```
@@ -15,9 +15,13 @@ SELECT content FROM table ORDER BY content_vector ANN OF query_embedding ;

 More docs on this can be read https://cassandra.apache.org/doc/latest/cassandra/getting-started/vector-search-quickstart.html[here].

-The Spring AI Cassandra Vector Store is designed to work for both brand new RAG applications as well as being able to be retrofitted on top of existing data and tables. This vector store may also equally be used for non-RAG non_AI use-cases, e.g. semantic searcing in an existing database. The Vector Store will automatically create, or enhance, the schema as needed according to its configuration. If you don't want the schema modifications, configure the store with `disallowSchemaChanges`.
+This Spring AI Vector Store is designed to work for both brand new RAG applications as well as being able to be retrofitted on top of existing data and tables.

-== What is JVector Vector Search ?
+The store can also be used for non-RAG use-cases in an existing database, e.g. semantic searches, geo-proximity searches, etc.
+
+The store will automatically create, or enhance, the schema as needed according to its configuration. If you don't want the schema modifications, configure the store with `disallowSchemaChanges`.
+
+== What is JVector ?

 link:https://github.com/jbellis/jvector[JVector] is a pure Java embedded vector search engine.

@@ -70,13 +74,6 @@ Add these dependencies to your project:

 TIP: Refer to the xref:getting-started.adoc#dependency-management[Dependency Management] section to add the Spring AI BOM to your build file.

-* If for example you want to use the OpenAI modules, remember to provide your OpenAI API Key. Set it as an environment variable like so:
@@ -93,21 +90,14 @@ public VectorStore vectorStore(EmbeddingClient embeddingClient) {
 }
 ----

-NOTE: It is more convenient and preferred to create the `CassandraVectorStore` as a Bean.
-But if you decide you can create it manually.
-
 [NOTE]
 ====
-The default configuration connects to Cassandra at localhost:9042 and will automatically create the default schema at `springframework_ai_vector.springframework_ai_vector_store`.
-
-Please see `CassandraVectorStoreConfig.Builder` for all the configuration options.
+The default configuration connects to Cassandra at `localhost:9042` and will automatically create a default schema in keyspace `springframework`, table `ai_vector_store`.
 ====

 [NOTE]
 ====
-The Cassandra Java Driver is easiest configured via the `application.conf` file on the classpath.
-
-More info can be found link: https://github.com/apache/cassandra-java-driver/tree/4.x/manual/core/configuration[here].
+The Cassandra Java Driver is easiest configured via an `application.conf` file on the classpath. More info https://github.com/apache/cassandra-java-driver/tree/4.x/manual/core/configuration[here].
-You can leverage the generic, portable link:https://docs.spring.io/spring-ai/reference/api/vectordbs.html#_metadata_filters[metadata filters] with the CassandraVectorStore as well. Metadata fields must be configured in `CassandraVectorStoreConfig`.
+You can leverage the generic, portable link:https://docs.spring.io/spring-ai/reference/api/vectordbs.html#_metadata_filters[metadata filters] with the CassandraVectorStore as well. Metadata columns must be configured in `CassandraVectorStoreConfig`.

 For example, you can use either the text expression language:

@@ -173,7 +163,9 @@ vectorStore.similaritySearch(

 The portable filter expressions get automatically converted into link:https://cassandra.apache.org/doc/latest/cassandra/developing/cql/index.html[CQL queries].

-Metadata fields to be searchable need to be either primary key columns or SAI indexed. To do this configure the metadata field with the `SchemaColumnTags.INDEXED`.
+For metadata columns to be searchable they must be either primary keys or SAI indexed. To make non-primary-key columns indexed configure the metadata column with the `SchemaColumnTags.INDEXED`.
+
+


 == Advanced Example: Vector Store ontop full Wikipedia dataset
@@ -187,7 +179,8 @@ Create the schema in the Cassandra database first:
-// the deliminator used to join fields together into the document's id
-// is arbitary, here "§¶" is used
+// the deliminator used to join fields together into the document's id is arbitary
+// here "§¶" is used
 if (primaryKeys.isEmpty()) {
     return "test§¶0";
 }
@@ -243,8 +236,11 @@ public EmbeddingClient embeddingClient() {
 }
 ----

+
+== Complete wikipedia dataset
+
 And, if you would like to load the full wikipedia dataset.
-First download the `simplewiki-sstable.tar` from this link https://drive.google.com/file/d/1CcMMsj8jTKRVGep4A7hmOSvaPepsaKYP/view?usp=share_link . This will take a while, the file is tens of GBs.
+First download the `simplewiki-sstable.tar` from this link https://s.apache.org/simplewiki-sstable-tar . This will take a while, the file is tens of GBs.
spring-ai-spring-boot-autoconfigure/src/main/java/org/springframework/ai/autoconfigure/vectorstore/cassandra/CassandraConnectionDetails.java
spring-ai-spring-boot-autoconfigure/src/main/java/org/springframework/ai/autoconfigure/vectorstore/cassandra/CassandraVectorStoreAutoConfiguration.java
spring-ai-spring-boot-autoconfigure/src/main/java/org/springframework/ai/autoconfigure/vectorstore/cassandra/CassandraVectorStoreProperties.java