add support for OpenSearch vector store #508

Closed
wants to merge 50 commits into from

JM-Lab
Contributor

@JM-Lab JM-Lab commented Mar 25, 2024

Add OpenSearchAiSearchFilterExpressionConverter
Add support for FilterExpression
Add support for OpenSearch vector store

@eddumelendez
Contributor

eddumelendez commented Mar 25, 2024

Hi @JM-Lab, AWS also provides OpenSearch, and the Java SDK provides AwsSdk2Transport. I wonder whether that should be part of this PR or a separate one.

@JM-Lab
Contributor Author

JM-Lab commented Mar 26, 2024

Hi @eddumelendez,

Sure, use AwsSdk2Transport for OpenSearchClient. Check this link for details: https://opensearch.org/docs/latest/clients/java/#connecting-to-amazon-opensearch-service

Contributor

@tzolov tzolov left a comment

Thank you @JM-Lab,
You get extra points for implementing the filter expressions as well ;)
I made a few comments inline. In addition, would you consider:

  • Adding documentation under vectordbs and adding the references to the catalog nav.adoc.
  • Adding OpenSearch auto-configuration and a Boot starter.

<dependency>
<groupId>org.opensearch.client</groupId>
<artifactId>opensearch-java</artifactId>
<version>2.9.1</version>
Contributor

Consider moving the version to a property in the parent POM.

Contributor Author

Added a commit to move the version to a property in the parent POM.

<version>1.0.0-SNAPSHOT</version>
<relativePath>../../pom.xml</relativePath>
</parent>
<artifactId>spring-ai-opensearch-store</artifactId>
Contributor

And add the spring-ai-opensearch-store dependency to the spring-bom.

</dependency>

<dependency>
<groupId>org.apache.httpcomponents.client5</groupId>
Contributor

Is this a test dependency only?

Contributor Author

}
}

public CreateIndexResponse createIndexMapping(String index, Map<String, Property> properties) {
Contributor

Is this supposed to be public?

Contributor Author

Yes, it's public so that it can handle the different embedding sizes of each embedding model and allow custom metadata mappings.

Contributor Author

7b77b19

Refer to the commit where I added the mappingJson parameter to OpenSearchVectorStore to handle different embedding sizes and allow custom metadata mappings.
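
For illustration, the idea of a per-model mapping can be sketched as below. This is a minimal, hypothetical helper and not the code from the commit: the class name `KnnMappingSketch`, the method `mappingJson`, and the field name `embedding` are assumptions made for the example. Only the `knn_vector` field type and its `dimension` parameter come from the OpenSearch k-NN mapping format.

```java
import java.util.Locale;

// Hypothetical helper (not part of the PR): builds an index mapping whose
// vector field matches the output size of a given embedding model.
final class KnnMappingSketch {

    // vectorField is assumed to be a plain identifier that needs no JSON escaping.
    static String mappingJson(String vectorField, int dimension) {
        if (dimension <= 0) {
            throw new IllegalArgumentException("dimension must be positive: " + dimension);
        }
        // Locale.ROOT keeps the digits ASCII regardless of the default locale.
        return String.format(Locale.ROOT,
                "{\"properties\":{\"%s\":{\"type\":\"knn_vector\",\"dimension\":%d}}}",
                vectorField, dimension);
    }

    public static void main(String[] args) {
        // Two models with different embedding sizes get two different mappings.
        System.out.println(mappingJson("embedding", 1536));
        System.out.println(mappingJson("embedding", 768));
    }
}
```

A string produced this way could be supplied as the mappingJson value; custom metadata mappings would be extra entries under `properties`, which this sketch leaves out.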

@JM-Lab
Contributor Author

JM-Lab commented Apr 4, 2024

Hi @tzolov,

I plan to address your requests within the next 1-2 weeks. Would you prefer me to add commits to the current pull request or handle them in a separate pull request? Let me know what you think.

@JM-Lab JM-Lab requested a review from tzolov April 4, 2024 15:14
@JM-Lab
Contributor Author

JM-Lab commented Apr 14, 2024

  • Adding documentation under the vectordbs and add the references to the catalog nav.adoc.
  • Add opensearch auto-configuration and boot starter

markpollack and others added 16 commits April 17, 2024 10:54
- add concurrency to store.add(..) (bc embeddingClient is slow)
- CassandraVectorStoreAutoConfiguration uses CassandraAutoConfiguration
- driver profiles for production stability+performance,
- small cleanups and naming fixes,
- main doc tidy-up
- astradb compatibility (protocol V4)
– don't create embeddings again for documents that already have them
  similar to spring-projects#413
  Facilitates the creation of multimodal message queries.
  align logger name with the className
Fixed java code.
@tzolov tzolov added this to the 1.0.0-M2 milestone Apr 26, 2024
iAMSagar44 and others added 22 commits April 26, 2024 19:03
 - Resolve an issue where a new index keeps getting created during application start up.
 - Solution is to create an index only if an index on the embedding column does not exist.
 - Add missing index name.
…ms), and make index name unique (for when there are multiple vector indexes in the same keyspace)

And change stream to for-loop when converting List<Double> to Float[] for performance
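
As an illustration of the commit message above, a List<Double> can be converted to a Float[] with a plain loop rather than a stream pipeline. This is a minimal sketch and not the code from that commit: the class name `EmbeddingConversionSketch` and the method `toFloatArray` are hypothetical, and any performance gain is the commit author's claim, not something measured here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not the code from the commit): converts an embedding
// held as List<Double> into a Float[] using a loop instead of a stream.
final class EmbeddingConversionSketch {

    static Float[] toFloatArray(List<Double> embedding) {
        Float[] result = new Float[embedding.size()];
        int i = 0;
        // Iterating with for-each avoids index lookups on non-random-access lists.
        for (Double value : embedding) {
            result[i++] = value.floatValue(); // narrows double precision to float
        }
        return result;
    }

    public static void main(String[] args) {
        Float[] floats = toFloatArray(List.of(0.5, -1.25, 2.0));
        System.out.println(Arrays.toString(floats));
    }
}
```

Note that the narrowing from double to float loses precision for values that are not exactly representable as a float, which is the same trade-off the stream version would make.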
@JM-Lab
Contributor Author

JM-Lab commented Apr 28, 2024

Hi @tzolov,

I will reset the tangled commits to get a clean pull request, and will soon add the documentation under vectordbs along with the updates to the catalog nav.adoc.

@JM-Lab
Contributor Author

JM-Lab commented May 1, 2024

@tzolov

I've just created a new PR, based on the main branch, that focuses solely on the OpenSearch files. You can find it here: #663.

I'm going to close this one. Thanks!

@JM-Lab JM-Lab closed this May 1, 2024