Remove file filtering #2050
It seems this change affects the structure of the file array being returned, which causes the integration tests to fail.
Pull Request Overview
This PR removes the unused file filtering functionality from document loaders by eliminating the file_filter
parameter and associated logic from storage classes.
- Removes `file_filter` parameter from all storage classes' `find` methods
- Updates method signatures to return `Iterator[str]` instead of `Iterator[tuple[str, dict[str, Any]]]`
- Removes associated filtering logic and configuration options
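To make the signature change concrete, here is a minimal sketch of the before and after shapes of `find`. The `InMemoryStorage` class, the `find_old` name, and the way the filter is matched against named regex groups are assumptions for illustration only; they are not taken from the graphrag storage classes.

```python
import re
from collections.abc import Iterator
from typing import Any, Optional


class InMemoryStorage:
    """Hypothetical stand-in for a storage class (not graphrag's code)."""

    def __init__(self, keys: list[str]) -> None:
        self._keys = keys

    def find_old(
        self,
        file_pattern: re.Pattern,
        file_filter: Optional[dict[str, Any]] = None,
    ) -> Iterator[tuple[str, dict[str, Any]]]:
        """Old shape: yield (path, named-group dict), optionally filtered."""
        for key in self._keys:
            match = file_pattern.search(key)
            if match is None:
                continue
            group = match.groupdict()
            # Assumed filter semantics: every filter value must match
            # the corresponding named group of the path.
            if file_filter and not all(
                re.search(str(value), group.get(name) or "")
                for name, value in file_filter.items()
            ):
                continue
            yield key, group

    def find(self, file_pattern: re.Pattern) -> Iterator[str]:
        """New shape: yield only the matching paths."""
        for key in self._keys:
            if file_pattern.search(key):
                yield key


storage = InMemoryStorage(["a/2024/x.txt", "a/2023/y.txt", "b/readme.md"])
pattern = re.compile(r"(?P<year>\d{4})/.*\.txt$")
new_result = list(storage.find(pattern))
old_result = list(storage.find_old(pattern, {"year": "2024"}))
```

With the simplified shape, callers receive plain path strings and no longer need to carry the group dictionary around.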
Reviewed Changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
graphrag/storage/pipeline_storage.py | Updates abstract base class interface to remove file_filter parameter |
graphrag/storage/file_pipeline_storage.py | Removes file filtering logic and simplifies return type |
graphrag/storage/blob_pipeline_storage.py | Removes file filtering functionality from blob storage |
graphrag/storage/cosmosdb_pipeline_storage.py | Removes file filtering from CosmosDB storage implementation |
graphrag/config/models/input_config.py | Removes file_filter field from configuration model |
graphrag/config/defaults.py | Removes file_filter default value |
graphrag/index/input/util.py | Updates to handle simplified return type from storage.find() |
graphrag/index/input/text.py | Simplifies load_file function signature |
graphrag/index/input/json.py | Removes group parameter handling |
graphrag/index/input/csv.py | Removes group parameter handling |
tests/ | Updates test files to handle new simplified interface |
docs/config/yaml.md | Removes file_filter documentation |
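Callers that iterate over `storage.find()` have to change along with the return type. The sketch below shows one way a stale call site could break; the helper names `unpack_old_style` and `load_paths` are hypothetical and do not correspond to functions in `graphrag/index/input/`.

```python
from collections.abc import Callable, Iterable


def unpack_old_style(items: Iterable) -> list[str]:
    """Old-style consumer: expects (path, group) tuples."""
    return [path for path, _group in items]


def load_paths(paths: Iterable[str], loader: Callable[[str], str]) -> list[str]:
    """New-style consumer: each item is already the path string."""
    return [loader(path) for path in paths]


new_style_items = ["docs/a.txt", "docs/b.txt"]

# Unpacking a plain string into two names fails unless the string
# happens to be exactly two characters long.
try:
    unpack_old_style(new_style_items)
    old_consumer_failed = False
except ValueError:
    old_consumer_failed = True

loaded = load_paths(new_style_items, str.upper)
```

A two-character path would unpack without raising and silently produce wrong values, so call sites are better updated explicitly than left to fail at runtime.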
Thanks, I thought I'd caught all of those.
Removes the unused file filtering functionality from the document loaders