Skip to content

langchain[minor]: Add pgvector to list of supported vectorstores in self query retriever #22678

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

pprados
Copy link
Contributor

@pprados pprados commented Jun 7, 2024

Thank you for contributing to LangChain!

  • langchain: "package: description"

  • PR message:
    Description:
    The fact that we outsourced pgvector to another project has an unintended effect. The mapping dictionary found by _get_builtin_translator() cannot recognize the new version of pgvector because it comes from another package.
    SelfQueryRetriever no longer knows PGVector.

I propose to fix this by creating a global dictionary that can be populated by various database implementations. Thus, importing langchain_postgres will allow the registration of the PGvector mapping.

But for the moment I'm just adding a lazy import

Furthermore, the implementation of _get_builtin_translator() reconstructs the BUILTIN_TRANSLATORS variable with each invocation, which is not very efficient. A global map would be an optimization.

  • Twitter handle: pprados

@eyurtsev, can you review this PR? And unlock the PR Add async mode for pgvector and PR community[minor]: Add SQL storage implementation?

Are you in favour of a global dictionary-based implementation of Translator?

Copy link

vercel bot commented Jun 7, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Ignored Deployment
Name Status Preview Comments Updated (UTC)
langchain ⬜️ Ignored (Inspect) Visit Preview Jun 7, 2024 2:28pm

@pprados pprados marked this pull request as ready for review June 7, 2024 14:40
@dosubot dosubot bot added size:XS This PR changes 0-9 lines, ignoring generated files. Ɑ: retriever Related to retriever module 🔌: postgres Related to postgres integrations 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature labels Jun 7, 2024
@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Jun 10, 2024
@@ -168,6 +168,14 @@ def _get_builtin_translator(vectorstore: VectorStore) -> Visitor:
if isinstance(vectorstore, Chroma):
return ChromaTranslator()

try:
from langchain_postgres import PGVector
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not related to your PR) The try and except here is very odd -- we can just pull the package name from the vectorstore instance itself (via .class)

@eyurtsev eyurtsev changed the title Fix SelfQuery with PGVector langchain[minor]: Add pgvector to list of supported vectorstores in self query retriever Jun 10, 2024
@eyurtsev eyurtsev merged commit 2d4689d into langchain-ai:master Jun 10, 2024
60 checks passed
pprados added a commit to pprados/langchain that referenced this pull request Jun 11, 2024
…elf query retriever (langchain-ai#22678)

The fact that we outsourced pgvector to another project has an
unintended effect. The mapping dictionary found by
`_get_builtin_translator()` cannot recognize the new version of pgvector
because it comes from another package.
`SelfQueryRetriever` no longer knows `PGVector`.

I propose to fix this by creating a global dictionary that can be
populated by various database implementations. Thus, importing
`langchain_postgres` will allow the registration of the `PGvector`
mapping.

But for the moment I'm just adding a lazy import

Furthermore, the implementation of _get_builtin_translator()
reconstructs the BUILTIN_TRANSLATORS variable with each invocation,
which is not very efficient. A global map would be an optimization.

- **Twitter handle:** pprados

@eyurtsev, can you review this PR? And unlock the PR [Add async mode for
pgvector](langchain-ai/langchain-postgres#32)
and PR [community[minor]: Add SQL storage
implementation](langchain-ai#22207)?

Are you in favour of a global dictionary-based implementation of
Translator?
@pprados pprados deleted the pprados/fix_self_query_with_pgvector branch June 18, 2024 06:34
hinthornw pushed a commit that referenced this pull request Jun 20, 2024
…elf query retriever (#22678)

The fact that we outsourced pgvector to another project has an
unintended effect. The mapping dictionary found by
`_get_builtin_translator()` cannot recognize the new version of pgvector
because it comes from another package.
`SelfQueryRetriever` no longer knows `PGVector`.

I propose to fix this by creating a global dictionary that can be
populated by various database implementations. Thus, importing
`langchain_postgres` will allow the registration of the `PGvector`
mapping.

But for the moment I'm just adding a lazy import

Furthermore, the implementation of _get_builtin_translator()
reconstructs the BUILTIN_TRANSLATORS variable with each invocation,
which is not very efficient. A global map would be an optimization.

- **Twitter handle:** pprados

@eyurtsev, can you review this PR? And unlock the PR [Add async mode for
pgvector](langchain-ai/langchain-postgres#32)
and PR [community[minor]: Add SQL storage
implementation](#22207)?

Are you in favour of a global dictionary-based implementation of
Translator?
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature lgtm PR looks good. Use to confirm that a PR is ready for merging. 🔌: postgres Related to postgres integrations Ɑ: retriever Related to retriever module size:XS This PR changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants