-
Notifications
You must be signed in to change notification settings - Fork 4.6k
[fix] Implement weaviate knowledge upsert. Improve AgentKnowledge class. #3784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fix] Implement weaviate knowledge upsert. Improve AgentKnowledge class. #3784
Conversation
… hash, this becomes difficult.
…ing if the name field is filled.
…mitations that are build in, namely using the content as UUID hash for storing the document and not having a document wide indicator (full document hash) that indicates if a document changed somewhere.
|
Hey @Siete-F Thanks for your contributions! I think the changes you did are all quite valid. Except for this This restricts users to having very specific rulesets when they setup their knowledge. It could also be that more than one vectorDB is used for a knowledgebase. One of which might support upsert and the other not. I think instead of raising the error, we should keep the default behaviour where it falls back to Would you please mind adding a cookbook implementation for weaviate that uses |
|
Hi @willemcdejongh, thanks for your feedback! Thanks again! will remove the part you discussed. |
Hey @Siete-F i think a logger info makes more sense with a message like- “upsert functionality not available, falling back to insert.” Something like that. |
…com/Siete-F/agno into implement-weaviate-knowledge-upsert
|
@willemcdejongh, I changed it to an info log. Regarding the cookbook example: Added it. For upsert only the logs and not having an error make sense, so enabled debug logging and that shows the following. |
|
When running (agno) C:\Users\scfro\Documents\code\agno\libs\agno\scripts>validate.bat
##################################################
# Validating agno
##################################################
##################################################
# Running: ruff check C:\Users\scfro\Documents\code\agno\libs\agno\scripts\\..
##################################################
All checks passed!
##################################################
# Running: mypy C:\Users\scfro\Documents\code\agno\libs\agno\scripts\\.. --config-file C:\Users\scfro\Documents\code\agno\libs\agno\scripts\\..\pyproject.toml
##################################################
C:\Users\scfro\Documents\code\agno\libs\agno\build\lib\agno\__init__.py: error: Duplicate module named "agno" (also at "C:\Users\scfro\Documents\code\agno\libs\agno\agno\__init__.py")
C:\Users\scfro\Documents\code\agno\libs\agno\build\lib\agno\__init__.py: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#mapping-file-paths-to-modules for more info
C:\Users\scfro\Documents\code\agno\libs\agno\build\lib\agno\__init__.py: note: Common resolutions include: a) using `--exclude` to avoid checking one of them, b) adding `__init__.py` somewhere, c) using `--explicit-package-bases` or adjusting MYPYPATH
Found 1 error in 1 file (errors prevented further checking)
[ERROR] mypy validation failed with exit code 2Is that an error in setup on my side? Or shall I create a ticket for the bug? |
…unction. Rewrote it so that mypy check succeeds.
kausmeows
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! 🚀
## Summary Fixes #4258 This is a regression from #3784 - the for loop was removed but the `filters` argument has to be specified per doc, not globally. ## Type of change - [x] Bug fix - [ ] New feature - [ ] Breaking change - [ ] Improvement - [ ] Model update - [ ] Other: --- ## Checklist - [x] Code complies with style guidelines - [x] Ran format/validation scripts (`./scripts/format.sh` and `./scripts/validate.sh`) - [x] Self-review completed - [ ] Documentation updated (comments, docstrings) - [ ] Examples and guides: Relevant cookbook examples have been included or updated (if applicable) - [x] Tested in clean environment - [ ] Tests added/updated (if applicable) --- ## Additional Notes I guess a more efficient and thorough solution would be to change the `insert()` and `upsert()` methods to receive a list of filters, but it would also be a breaking change. Co-authored-by: Kaustubh <shuklakaustubh84@gmail.com>
Sidenotes first:
add_document_to_knowledge_baseis only used once and named inconsistently, therefore I renamed it toload_document(there is also anasync_load_document, callingasync_load_documentswith a list with one document).upsertshould receive a list of Documents (2 out of all the calls to upsert are in a loop, passing one doc at a time, and the other calls are with the list of documents... There is no reason for that distinction). Leaving the looping to the upsert function allows for some additional flexibility inside the upsert, like obtaining the complete document content for example or checking the total number of chunks.num_documentswas not used, the final logging should have likely been unindented, printing it's value.Summary
Upsert was practically not implemented, so kind of a feature.
As I tried to state in the issue #3700, it is technically NOT possible to implement a proper "per-chunk" upsert. with the following assumptions:
updateoperations cannot be used since the document ID is actually the content hash, so processing a "change" can only be done by a delete and insert.Because of this I resorted to the following approach:
I am open to feedback and am willing to have a discussion on a better implementation.
As stated in the issue, for a better implementation I would need:
The benefit is that on the first chunk you can relyably say with the full-document hash (any of all the chunks) changed. Then delete all 'documents' from the vector db with that name and insert.
Beside that, I would recomment just using a random UUID4 as document id.
Type of change
Checklist
./scripts/format.shand./scripts/validate.sh)