Skip to content

Make the embedding field name configurable for the ElasticSearchVectorStore #2175

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

dev-jonghoonpark
Copy link
Contributor

Related issue: #2082

Motivation:

On #2082, We found embedding field name is not configurable.

Modification:

Added a configuration option for the embedding field name in the ElasticSearchVectorStore

Result:

Users can now specify a custom embedding field name through configuration.
Close #2082 issue.

Further discussion:

Is an ElasticSearchDocument record really necessary? If not, what about just using the Map class?

@dev-jonghoonpark
Copy link
Contributor Author

Fixed conflict with #2176

@ilayaperumalg
Copy link
Member

Is an ElasticSearchDocument record really necessary? If not, what about just using the Map class?

@dev-jonghoonpark We wanted to go with a specific type instead of Map. Do you see any downside of using the Record type for this?

@dev-jonghoonpark
Copy link
Contributor Author

@ilayaperumalg
No, I was just using the Map to make the field names configurable, with no other intentions.
In the code I wrote, I had to use conditional branching to configure the field names. This approach feels a bit unsatisfactory, so I wanted to share my thoughts with this comment.

private Object getDocument(Document document, float[] embedding, String fieldName) {
    Assert.notNull(document.getText(), "document's text must not be null");
    if (fieldName.equals("embedding")) {
	    return new ElasticSearchDocument(document.getId(), document.getText(), document.getMetadata(), embedding);
    }
    
    return Map.of("id", document.getId(), "content", document.getText(), "metadata", document.getMetadata(),
		    fieldName, embedding);
}

Could there have been a better way?

@dev-jonghoonpark dev-jonghoonpark force-pushed the GH-2082 branch 2 times, most recently from a567120 to 729ab32 Compare February 12, 2025 15:24
@dev-jonghoonpark
Copy link
Contributor Author

@ilayaperumalg
I've updated it based on your feedback.

@dev-jonghoonpark
Copy link
Contributor Author

Fixed conflict with #2209

@dev-jonghoonpark
Copy link
Contributor Author

Hi @ilayaperumalg, I wanted to give a quick reminder about this PR in case it was forgotten.
I've resolved the conflicts.
Please take a look when you have time.
Thanks!

document.getMetadata(), embeddings.get(documents.indexOf(document)));
bulkRequestBuilder.operations(
op -> op.index(idx -> idx.index(this.options.getIndexName()).id(document.getId()).document(doc)));
for (int i = 0; i < embeddings.size(); i++) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change, you can delete ElasticSearchDocument now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I deleted the ElasticSearchDocument record. Thanks for your feedback!

…rStore

Signed-off-by: jonghoon park <dev@jonghoonpark.com>
@ilayaperumalg
Copy link
Member

@dev-jonghoonpark Thank you for the PR making the embedding field name configurable in ElasticSearch vector store. Rebased and merged as 4f4da30

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ElasticsearchVectorStore
2 participants