Conversation

natoverse
Collaborator

Removes the unused file filtering functionality from the document loaders

@natoverse requested a review from a team as a code owner on September 9, 2025, 01:03
@AlonsoGuevara
Collaborator

Seems like this change affects how the file array structure is being returned, causing the integ tests to fail.
["dulce.txt"] != [("dulce.txt", {})]
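
To make the mismatch concrete, here is a minimal sketch of the comparison the failing test performs. The two literal values come from the message above; the `names_only` helper is hypothetical and only illustrates one way a test could compare names regardless of shape. It is not part of the graphrag test suite.

```python
from typing import Any

# Shape the integration test still expected: (name, match groups) tuples.
expected_old_shape: list[tuple[str, dict[str, Any]]] = [("dulce.txt", {})]
# Shape produced after this change: bare file names.
actual_new_shape: list[str] = ["dulce.txt"]


def names_only(found: list[Any]) -> list[str]:
    """Hypothetical helper: reduce either shape to a plain list of file names."""
    return [item[0] if isinstance(item, tuple) else item for item in found]


# A list of strings never equals a list of tuples, so the direct comparison fails.
shapes_equal = actual_new_shape == expected_old_shape
# Comparing only the names succeeds for both shapes.
names_equal = names_only(actual_new_shape) == names_only(expected_old_shape)
```

In practice the simpler fix is probably to update the expected values in the tests to plain strings, since the tuple shape no longer exists after this change.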

Contributor

@Copilot left a comment

Pull Request Overview

This PR removes the unused file filtering functionality from document loaders by eliminating the file_filter parameter and associated logic from storage classes.

  • Removes file_filter parameter from all storage classes' find methods
  • Updates method signatures to return Iterator[str] instead of Iterator[tuple[str, dict[str, Any]]]
  • Removes associated filtering logic and configuration options

Reviewed Changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| graphrag/storage/pipeline_storage.py | Updates abstract base class interface to remove file_filter parameter |
| graphrag/storage/file_pipeline_storage.py | Removes file filtering logic and simplifies return type |
| graphrag/storage/blob_pipeline_storage.py | Removes file filtering functionality from blob storage |
| graphrag/storage/cosmosdb_pipeline_storage.py | Removes file filtering from CosmosDB storage implementation |
| graphrag/config/models/input_config.py | Removes file_filter field from configuration model |
| graphrag/config/defaults.py | Removes file_filter default value |
| graphrag/index/input/util.py | Updates to handle simplified return type from storage.find() |
| graphrag/index/input/text.py | Simplifies load_file function signature |
| graphrag/index/input/json.py | Removes group parameter handling |
| graphrag/index/input/csv.py | Removes group parameter handling |
| tests/ | Updates test files to handle new simplified interface |
| docs/config/yaml.md | Removes file_filter documentation |


@natoverse
Collaborator Author

> Seems like this change affects how the file array structure is being returned, causing the integ tests to fail. ["dulce.txt"] != [("dulce.txt", {})]

Thanks, I thought I'd caught all of those.

@natoverse merged commit 978e798 into v3/main on Sep 9, 2025
12 checks passed
@natoverse deleted the remove-file-filtering branch on September 9, 2025, 22:36

3 participants