Allow filters to be defined when calling VS tool #104
Merged

Commits (13, all by annzhang-db):
- dbdd053 draft
- b20afae update
- 99c5cd7 update
- 62bc934 update
- 194d963 update description
- e079e76 unit tests
- c7295e8 openai + llamaindex
- 7e24711 fix
- e45ab3f llamaindex tests
- 1f20b2e update
- 02671d2 update
- 896e03c update
- 12f5754 update

Changes to the `dev` dependency list:

    @@ -37,6 +37,7 @@ dev = [
         "hatch",
         "pytest",
         "ruff==0.6.4",
    +    "databricks-vectorsearch>=0.50",
     ]

     [tool.ruff]

Changes to the vector search retriever tool:

    @@ -41,6 +41,18 @@ class VectorSearchRetrieverToolInput(BaseModel):
             description="The string used to query the index with and identify the most similar "
             "vectors and return the associated documents."
         )
    +    filters: Dict[str, Any] = Field(
    +        default=None,
    +        description=(
    +            "Optional filters to refine vector search results. Supports the following operators:\n\n"
    +            '- Inclusion: {"column": value} or {"column": [value1, value2]} (matches if the column equals any of the provided values)\n'
    +            '- Exclusion: {"column NOT": value}\n'
    +            '- Comparisons: {"column <": value}, {"column >=": value}, etc.\n'
    +            '- Pattern match: {"column LIKE": "word"} (matches full tokens separated by whitespace)\n'
    +            '- OR logic: {"column1 OR column2": [value1, value2]} '
    +            "(matches if column1 equals value1 or column2 equals value2; matches are position-specific)"
    +        ),
    +    )


     class VectorSearchRetrieverToolMixin(BaseModel):

    @@ -87,14 +99,47 @@ def validate_tool_name(cls, tool_name):
                 raise ValueError("tool_name must match the pattern '^[a-zA-Z0-9_-]{1,64}$'")
             return tool_name

    +    def _describe_columns(self) -> str:
    +        try:
    +            from databricks.sdk import WorkspaceClient
    +
    +            if self.workspace_client:
    +                table_info = self.workspace_client.tables.get(full_name=self.index_name)
    +            else:
    +                table_info = WorkspaceClient().tables.get(full_name=self.index_name)
    +
    +            columns = []
    +
    +            for column_info in table_info.columns:
    +                name = column_info.name
    +                comment = column_info.comment or "No description provided"
    +                col_type = column_info.type_name.name
    +                if not name.startswith("__"):
    +                    columns.append((name, col_type, comment))
    +
    +            return "The vector search index includes the following columns:\n" + "\n".join(
    +                f"{name} ({col_type}): {comment}" for name, col_type, comment in columns
    +            )
    +        except Exception:
    +            _logger.warning(
    +                "Unable to retrieve column information automatically. Please manually specify column names, types, and descriptions in the tool description to help LLMs apply filters correctly."
    +            )
    +
         def _get_default_tool_description(self, index_details: IndexDetails) -> str:
             if index_details.is_delta_sync_index():
                 source_table = index_details.index_spec.get("source_table", "")
    -            return (
    +            description = (
                     DEFAULT_TOOL_DESCRIPTION
    -                + f" The queried index uses the source table {source_table}"
    +                + f" The queried index uses the source table {source_table}."
                 )
    -        return DEFAULT_TOOL_DESCRIPTION
    +        else:
    +            description = DEFAULT_TOOL_DESCRIPTION
    +
    +        column_description = self._describe_columns()
    +        if column_description:
    +            return f"{description}\n\n{column_description}"
    +        else:
    +            return description

         def _get_resources(
             self, index_name: str, embedding_endpoint: str, index_details: IndexDetails

Review comment on the `return` in `_describe_columns` that builds the column listing: This is pretty cool
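
Because this tool description is what the LLM reads when deciding which filters to apply, it can help to picture the final text. The following is a minimal sketch that mirrors the string formatting in `_describe_columns` and `_get_default_tool_description` without calling the Databricks SDK. `Column`, `describe_columns`, and `tool_description` are hypothetical names introduced for illustration; `Column` stands in for the Unity Catalog column metadata, and the base description, table name, and column names in the usage lines are made up (the base is not the real `DEFAULT_TOOL_DESCRIPTION`).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Column:
    """Hypothetical stand-in for the column metadata `_describe_columns` reads."""

    name: str
    type_name: str
    comment: Optional[str] = None


def describe_columns(columns: List[Column]) -> str:
    # Same formatting as `_describe_columns`: skip internal columns whose
    # names start with "__" and fall back to a placeholder comment.
    lines = [
        f"{c.name} ({c.type_name}): {c.comment or 'No description provided'}"
        for c in columns
        if not c.name.startswith("__")
    ]
    return "The vector search index includes the following columns:\n" + "\n".join(lines)


def tool_description(
    base: str, source_table: Optional[str], column_description: Optional[str]
) -> str:
    # Same composition as `_get_default_tool_description`; here a non-None
    # `source_table` plays the role of `is_delta_sync_index()` (an assumption).
    description = base
    if source_table is not None:
        description += f" The queried index uses the source table {source_table}."
    # A missing or empty column listing leaves the description unchanged.
    if column_description:
        return f"{description}\n\n{column_description}"
    return description


cols = [
    Column("id", "LONG", "Primary key"),
    Column("text", "STRING"),
    Column("__internal_vector", "ARRAY"),  # hypothetical internal column, skipped
]
print(tool_description("Searches a document index.", "main.docs.articles", describe_columns(cols)))
```

With these inputs the `__internal_vector` column is omitted and `text` receives the placeholder comment. Passing `None` for the column listing reproduces the PR's fallback path, where `_describe_columns` logs a warning and implicitly returns `None`, so the base description is used as-is.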

Review comment: nit, can also include this link if you think it will be useful: https://docs.databricks.com/aws/en/generative-ai/create-query-vector-search?language=Python%C2%A0SDK#use-filters-on-queries

Follow-up: nvm, I didn't realize the LLM is getting passed this, so a link won't be useful.
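
Since the LLM relies on the `filters` field description to build filter dicts, the operator syntax is easiest to read against concrete rows. Below is a minimal in-memory sketch of those semantics; `matches`, `ids`, and the sample rows are hypothetical, and this is not how Databricks Vector Search evaluates filters. It assumes that separate filter keys are combined with AND, that `LIKE` is case-sensitive, and that `NOT` with a list excludes every listed value; the field description states none of these.

```python
import operator
from typing import Any, Callable, Dict, List

# Hypothetical illustration of the operators listed in the `filters` field
# description; NOT the Databricks Vector Search implementation.
_COMPARISONS: Dict[str, Callable[[Any, Any], bool]] = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def _as_list(value: Any) -> List[Any]:
    return value if isinstance(value, list) else [value]


def matches(row: Dict[str, Any], filters: Dict[str, Any]) -> bool:
    """True if `row` satisfies every key in `filters` (keys ANDed, by assumption)."""
    for key, value in filters.items():
        if " OR " in key:
            # OR is position-specific: the i-th column is compared with the i-th value.
            columns = key.split(" OR ")
            if not any(row.get(col) == val for col, val in zip(columns, value)):
                return False
            continue

        column, _, op = key.partition(" ")
        cell = row.get(column)
        if op == "":
            # Inclusion: equal to the value, or to any value in a list.
            ok = cell in _as_list(value)
        elif op == "NOT":
            # Exclusion (assumed to exclude every value when given a list).
            ok = cell not in _as_list(value)
        elif op == "LIKE":
            # Pattern match on whole whitespace-separated tokens (case-sensitive here).
            ok = cell is not None and value in str(cell).split()
        elif op in _COMPARISONS:
            ok = cell is not None and _COMPARISONS[op](cell, value)
        else:
            raise ValueError(f"Unsupported filter key: {key!r}")
        if not ok:
            return False
    return True


docs = [
    {"id": 1, "title": "vector search intro", "year": 2023, "lang": "en"},
    {"id": 2, "title": "filtering guide", "year": 2024, "lang": "fr"},
    {"id": 3, "title": "search filters deep dive", "year": 2025, "lang": "en"},
]


def ids(filters: Dict[str, Any]) -> List[int]:
    return [d["id"] for d in docs if matches(d, filters)]


print(ids({"lang": "en"}))                  # [1, 3]
print(ids({"year": [2023, 2025]}))          # [1, 3]
print(ids({"lang NOT": "en"}))              # [2]
print(ids({"year >=": 2024}))               # [2, 3]
print(ids({"title LIKE": "search"}))        # [1, 3]
print(ids({"title LIKE": "filter"}))        # [] ("filters" and "filtering" are different tokens)
print(ids({"lang OR year": ["fr", 2023]}))  # [1, 2]
print(ids({"lang": "en", "year >": 2023}))  # [3]
```

Putting `OR` in the key rather than the value is what makes the matching position-specific: `["fr", 2023]` pairs `"fr"` with `lang` and `2023` with `year`, as the field description says.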