-
Notifications
You must be signed in to change notification settings - Fork 6
a few tweaks, fix test & a couple bugs #29
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -60,7 +60,8 @@ def __init__( | |||
api_base=api_base, | |||
messages_to_prompt=messages_to_prompt, | |||
completion_to_prompt=completion_to_prompt, | |||
default_headers=default_headers) | |||
default_headers=default_headers, | |||
context=context) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this fix is actually on main
, I think
@@ -101,6 +95,7 @@ def index_file(self, file_path: str, file_id: str): | |||
|
|||
for chunk, embedding in zip(chunks, embeddings): | |||
chunk.embedding = embedding | |||
chunk.metadata["file_name"] = os.path.basename(file_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is true unless we save with the original file name to s3. I thought we saved with the UUID
@@ -113,6 +108,7 @@ def _documents_in_file(self, reader: BaseReader, file_path: str, file_id: str) - | |||
|
|||
for i, document in enumerate(documents): | |||
# Update the document metadata | |||
document.id_ = file_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to couple them, right? Is there a reason why we need them to match?
@@ -124,6 +120,7 @@ def _chunks_in_document(self, document: Document) -> List[BaseNode]: | |||
|
|||
for j, chunk in enumerate(chunks): | |||
chunk.metadata["file_id"] = document.metadata["file_id"] | |||
chunk.metadata["document_id"] = document.metadata["file_id"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why? Maybe document_id
on both sides?
* Provide Indexer to index files * fix imports for local dev * skip tests that require same qdrant client * pass document id within the test * fix the other test with the race condition * fix monkey patch * a few tweaks, fix test & a couple bugs (#29) * resolve mypy * docx support * make ruff happy --------- Co-authored-by: John Watson <jkwatson@gmail.com>
No description provided.