a few tweaks, fix test & a couple bugs #29

jkwatson · 2024-11-21T17:34:52Z

No description provided.

jkwatson · 2024-11-21T17:35:26Z

llm-service/app/services/CaiiModel.py

@@ -60,7 +60,8 @@ def __init__(
            api_base=api_base,
            messages_to_prompt=messages_to_prompt,
            completion_to_prompt=completion_to_prompt,
-            default_headers=default_headers)
+            default_headers=default_headers,
+            context=context)


I think this fix is already on main.

conradocloudera · 2024-11-21T17:44:26Z

llm-service/app/ai/indexing/index.py

@@ -101,6 +95,7 @@ def index_file(self, file_path: str, file_id: str):

        for chunk, embedding in zip(chunks, embeddings):
            chunk.embedding = embedding
+            chunk.metadata["file_name"] = os.path.basename(file_path)


I'm not sure this holds unless we save the file to S3 under its original name. I thought we saved it under the UUID.
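To make the concern concrete, here is a minimal sketch of what `os.path.basename` returns when the file is stored under a UUID rather than its original name. The `stored_path` layout and the `file_name_from_path` helper are hypothetical and for illustration only; the actual storage layout is not shown in this PR.

```python
import os
import uuid

# Hypothetical: assume the downloaded object is stored under its UUID,
# not under the name the user uploaded it with.
file_id = str(uuid.uuid4())
stored_path = os.path.join("/tmp", "downloads", file_id)


def file_name_from_path(file_path: str) -> str:
    """Mirror the diff: take the last path component as the file name."""
    return os.path.basename(file_path)


# If the path ends in the UUID, the "file name" is just the UUID again.
uuid_based_name = file_name_from_path(stored_path)

# Only a path that keeps the original name yields a readable file name.
original_name = file_name_from_path("/data/uploads/report.pdf")
```

If the stored path really is UUID-based, `file_name` would duplicate `file_id` rather than add new information.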

conradocloudera · 2024-11-21T17:44:36Z

llm-service/app/ai/indexing/index.py

@@ -113,6 +108,7 @@ def _documents_in_file(self, reader: BaseReader, file_path: str, file_id: str) -

        for i, document in enumerate(documents):
            # Update the document metadata
+            document.id_ = file_id


No need to couple them, right? Is there a reason why we need them to match?

conradocloudera · 2024-11-21T17:45:21Z

llm-service/app/ai/indexing/index.py

@@ -124,6 +120,7 @@ def _chunks_in_document(self, document: Document) -> List[BaseNode]:

        for j, chunk in enumerate(chunks):
            chunk.metadata["file_id"] = document.metadata["file_id"]
+            chunk.metadata["document_id"] = document.metadata["file_id"]


Not sure why this is needed. Maybe use document_id on both sides?
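As a rough sketch of what the diff does, the snippet below uses plain dicts in place of the real chunk and document objects. `tag_chunks` and `chunks_for` are hypothetical helpers, not functions from this codebase. It shows that both metadata keys end up holding the same value, so a lookup by either key returns the same chunks.

```python
from typing import Any, Dict, List

Metadata = Dict[str, Any]


def tag_chunks(document_metadata: Metadata, chunks: List[Metadata]) -> List[Metadata]:
    """Copy the document's file_id onto each chunk under both keys, as in the diff."""
    for chunk in chunks:
        chunk["file_id"] = document_metadata["file_id"]
        chunk["document_id"] = document_metadata["file_id"]
    return chunks


def chunks_for(chunks: List[Metadata], key: str, value: str) -> List[Metadata]:
    """Hypothetical lookup: select chunks whose metadata[key] equals value."""
    return [chunk for chunk in chunks if chunk.get(key) == value]


tagged = tag_chunks({"file_id": "abc"}, [{"text": "one"}, {"text": "two"}])
by_file_id = chunks_for(tagged, "file_id", "abc")
by_document_id = chunks_for(tagged, "document_id", "abc")
```

Since the two keys are always equal here, the duplicate key only matters if some consumer reads `document_id` specifically.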

* Provide Indexer to index files
* fix imports for local dev
* skip tests that require same qdrant client
* pass document id within the test
* fix the other test with the race condition
* fix monkey patch
* a few tweaks, fix test & a couple bugs (#29)
* resolve mypy
* docx support
* make ruff happy

---------

Co-authored-by: John Watson <jkwatson@gmail.com>

a few tweaks, fix test & a couple bugs

9917e12

jkwatson merged commit 743d5f4 into cm/indexer Nov 21, 2024
1 check passed

jkwatson deleted the jw/cm/indexer_patch branch November 21, 2024 17:56
