Skip to content

Refactor PgVercorStore filter template to use JSONB field access #589

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

dperezcabrera
Copy link
Contributor

@dperezcabrera dperezcabrera commented Apr 15, 2024

Changed the filter template in PgVercorStore to replace JSONPath expressions with JSONB field access, using standard SQL operators such as '=' instead of '==', 'AND' instead of '&&', and 'OR' instead of '||'. This adjustment addresses compatibility issues with the 'IN' operator, which previously returned parse errors. Also updated PgVectorFilterExpressionConverter and corresponding tests to align with these changes.

  • Replaced metadata::jsonb @@ '$.key == "value"'::jsonpath with metadata::jsonb->>'key' = 'value'
  • Fixed parsing issues with metadata::jsonb @@ '$.key in ["value"]'::jsonpath by using JSONB field access.
  • Updated logical operators IN filter expressions to standard SQL syntax.

These changes improve the clarity, execution reliability, and compatibility of filter expressions in the database.

@tzolov tzolov added enhancement New feature or request vector store labels Apr 20, 2024
@tzolov tzolov added this to the 1.0.0-M1 milestone Apr 20, 2024
@tzolov
Copy link
Contributor

tzolov commented Apr 20, 2024

Thank you @dperezcabrera, looks like a nice improvement!

Could you please pull rebase to the upstream main (e.g. git pull -r upstream main), resolve any eventual conflicts and (force) merge to you PR?

@tzolov tzolov self-assigned this Apr 20, 2024
Changed the filter template in PgVercorStore to replace JSONPath expressions with JSONB field access, using standard SQL operators such as '=' instead of '==', 'AND' instead of '&&', and 'OR' instead of '||'. This adjustment addresses compatibility issues with the 'IN' operator, which previously returned parse errors. Also updated PgVectorFilterExpressionConverter and corresponding tests to align with these changes.

- Replaced `metadata::jsonb @@ '$.key == "value"'::jsonpath` with `metadata::jsonb->>'$.key' = 'value'`
- Fixed parsing issues with `metadata::jsonb @@ '$.key in ["value"]'::jsonpath` by using JSONB field access.
- Updated logical operators IN filter expressions to standard SQL syntax.

These changes improve the clarity, execution reliability, and compatibility of filter expressions in the database.
@dperezcabrera
Copy link
Contributor Author

dperezcabrera commented Apr 21, 2024

Thank you. I've rebased and merged the changes as requested!

@tzolov
Copy link
Contributor

tzolov commented Apr 25, 2024

@dperezcabrera have you run the integration tests for the pgvector and for the autoconfiguraiton?
the ./mvnw clean install -pl vector-stores/spring-ai-pgvector-store -Pintegration-tests is failing form me

@dperezcabrera
Copy link
Contributor Author

dperezcabrera commented Apr 25, 2024

I have executed them; however, since I don't have an API key for OpenAI, this has been the result:
run: 4, Failures: 0, Errors: 0, Skipped: 4.

Can you please send me the error logs by email?

@tzolov
Copy link
Contributor

tzolov commented Apr 28, 2024

@dperezcabrera for example

  • searchWithFilters: COSINE_DISTANCE
org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [SELECT *, embedding [=] ? AS distance FROM vector_store WHERE embedding [=] ? [ ?  AND (metadata::jsonb-]]'country' = 'BG' AND metadata::jsonb-]]'year' = 2020)  ORDER BY distance LIMIT ? ]
  • searchWithFilters: EUCLIDEAN_DISTANCE
org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [SELECT *, embedding [-] ? AS distance FROM vector_store WHERE embedding [-] ? [ ?  AND (metadata::jsonb-]]'country' = 'BG' AND metadata::jsonb-]]'year' = 2020)  ORDER BY distance LIMIT ? ]
  • searchWithFilters: NEGATIVE_INNER_PRODUCT
org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [SELECT *, (1 + (embedding [#] ?)) AS distance FROM vector_store WHERE (1 + (embedding [#] ?)) [ ?  AND (metadata::jsonb-]]'country' = 'BG' AND metadata::jsonb-]]'year' = 2020)  ORDER BY distance LIMIT ? ]

@dperezcabrera
Copy link
Contributor Author

I see another problem, metadata::jsonb->>'year' must be (metadata->>'year')::INTEGER
In protected void doKey(Key key, StringBuilder context) { I can't do this change.

@tzolov tzolov modified the milestones: 1.0.0-M1, 1.0.0-M2 May 28, 2024
@markpollack
Copy link
Member

Hi, this is a great contribution to improve the speed of pgvector, you mention In protected void doKey(Key key, StringBuilder context) { I can't do this change. what do you mean?

I can't easily share access keys, I would recommend getting an openai subscription, the cost is minimal.

@dperezcabrera
Copy link
Contributor Author

dperezcabrera commented Jun 20, 2024

Here: (https://github.com/dperezcabrera/spring-ai/blob/bebbfceab6e22a6bdeda3c3e168aceaf52994abd/spring-ai-core/src/main/java/org/springframework/ai/vectorstore/filter/converter/PgVectorFilterExpressionConverter.java#L64C1-L76C1)

I need to know the type of the key. If fieldname is an integer, the method must use this template (metadata->>'fieldname')::INTEGER. If fieldname is a string, then: metadata::jsonb->>'fieldname'. If fieldname is a date, (metadata->>'fieldname')::DATE. This ensures the key is correctly interpreted in the database query.
This will require a significant change in the common classes and may have side effects on other implementations, which need to be carefully considered before proposing it.

Now I have and access key, before I only had API keys for Microsoft's Azure, and they didn't work with the integrated tests.

@markpollack markpollack modified the milestones: 1.0.0-M2, 1.0.0-RC1 Aug 20, 2024
@markpollack
Copy link
Member

bumping to RC1, apologies

@dperezcabrera
Copy link
Contributor Author

dperezcabrera commented Aug 21, 2024

I think it is best to close this pull request, and address this functionality in the future.
If you agree, close it.

@tzolov
Copy link
Contributor

tzolov commented Sep 24, 2024

Thank you for the effort @dperezcabrera. We can re-open it in the future if you have time to work on it.

Causing due to some unsolved issues (check the log above).

@tzolov tzolov closed this Sep 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request vector store
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants