Replies: 1 comment
Hey @xfen-g! I'm here to help you with any bugs, questions, or contributions. Let's work together to solve the issue you're facing.

The LangChain code deletes documents from the vector store before deleting their keys from the record manager. This ordering avoids the case where documents are still present in the vector store after their records have been removed from the record manager, leaving nothing to track them.

Here are the relevant code snippets:

    if cleanup == "incremental":
        # Get the uids of the documents that were not returned by the loader.
        uids_to_delete = record_manager.list_keys(
            group_ids=_source_ids, before=index_start_dt
        )
        if uids_to_delete:
            # Then delete from vector store.
            destination.delete(uids_to_delete)
            # First delete from record store.
            record_manager.delete_keys(uids_to_delete)
            num_deleted += len(uids_to_delete)

    if cleanup == "full":
        while uids_to_delete := record_manager.list_keys(
            before=index_start_dt, limit=cleanup_batch_size
        ):
            # First delete from record store.
            destination.delete(uids_to_delete)
            # Then delete from record manager.
            record_manager.delete_keys(uids_to_delete)
            num_deleted += len(uids_to_delete)

Despite the comments suggesting the opposite order, the actual implementation deletes the documents from the vector store first to avoid inconsistencies during subsequent operations [1].
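To make the ordering trade-off concrete, here is a minimal sketch using in-memory stand-ins. `ToyVectorStore`, `ToyRecordManager`, `cleanup`, and `leftover_after_second_step_fails` are hypothetical names for illustration, not LangChain classes or functions. The only assumption carried over from the snippets above is that each store exposes its own delete call and that either call can fail independently.

```python
class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector store."""

    def __init__(self, ids):
        self.ids = set(ids)

    def delete(self, ids):
        self.ids -= set(ids)


class ToyRecordManager:
    """Hypothetical in-memory stand-in for a record manager."""

    def __init__(self, keys):
        self.keys = set(keys)

    def delete_keys(self, keys):
        self.keys -= set(keys)


def failing(*_args, **_kwargs):
    """Simulate an outage in whichever backend this replaces."""
    raise ConnectionError("simulated outage")


def cleanup(store, records, uids, vector_store_first):
    """Run both deletes in the chosen order; either step may raise."""
    steps = [lambda: store.delete(uids), lambda: records.delete_keys(uids)]
    if not vector_store_first:
        steps.reverse()
    for step in steps:
        step()


def leftover_after_second_step_fails(vector_store_first):
    """Return (vector ids, record keys) left after the second delete fails."""
    store = ToyVectorStore(["a", "b"])
    records = ToyRecordManager(["a", "b"])
    # Break whichever delete runs second.
    if vector_store_first:
        records.delete_keys = failing
    else:
        store.delete = failing
    try:
        cleanup(store, records, ["a"], vector_store_first)
    except ConnectionError:
        pass
    return sorted(store.ids), sorted(records.keys)


# Vector store first: key "a" is still tracked, so a later cleanup can find it.
vs_first = leftover_after_second_step_fails(vector_store_first=True)
# Record manager first: vector "a" remains, but nothing tracks it anymore.
rm_first = leftover_after_second_step_fails(vector_store_first=False)
print(vs_first)
print(rm_first)
```

In this toy model, neither order is failure-proof. Deleting from the vector store first leaves a stale key that can still be listed later, while deleting from the record manager first leaves an untracked vector.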
Description
I'm trying to understand the deletion order in the LangChain record manager. See the code below:
https://github.com/langchain-ai/langchain/blob/master/libs/core/langchain_core/indexing/api.py#L404
Why does deletion of docs from the vector DB come first, instead of deletion of docs from the Record Manager? (Even the comments say to delete from the Record Manager first.)
This might cause a problem when the system succeeds in deleting some docs from the VDB but fails to delete them from the record manager (e.g. due to a SQL connection error). The next time ingestion happens, the system sees that the hashed docs are already in the record manager and skips them, even though they are no longer in the VDB.
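The scenario above can be sketched with a simplified model. `toy_index` and `doc_key` below are hypothetical stand-ins, not LangChain's `index` function or its hashing scheme. They assume only what the description states: ingestion skips any document whose hash is already present in the record manager.

```python
import hashlib


def doc_key(text):
    """Hash the document text into a stable key (illustrative choice of hash)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def toy_index(docs, vector_store, record_keys):
    """Add documents, skipping any whose key the record manager already holds."""
    added, skipped = 0, 0
    for text in docs:
        key = doc_key(text)
        if key in record_keys:
            skipped += 1  # assumed to be in the vector store already
            continue
        vector_store[key] = text
        record_keys.add(key)
        added += 1
    return added, skipped


vector_store = {}    # key -> text, stands in for the vector DB
record_keys = set()  # stands in for the record manager

toy_index(["doc one"], vector_store, record_keys)

# Partial cleanup failure: the vector DB delete succeeds, but the
# record manager delete never runs (for example, a dropped SQL connection).
del vector_store[doc_key("doc one")]

# Re-ingesting the same document is skipped because its key is still recorded.
result = toy_index(["doc one"], vector_store, record_keys)
missing = doc_key("doc one") not in vector_store
print(result, missing)
```

Under these assumptions the second run adds nothing and skips one document, so the document stays missing from the vector DB until the stale key is removed.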
System Info
N/A