natoverse (Collaborator)

Removes the group_by_columns config, which grouped documents before chunking. In practice it is never used, yet it adds a lot of complexity to maintain.

@natoverse natoverse requested a review from a team as a code owner September 9, 2025 20:53
@Copilot Copilot AI (Contributor) left a comment

Pull Request Overview

This PR removes the group_by_columns configuration option from text chunking, simplifying the system by eliminating the ability to group documents before chunking. Previously, documents and text units could be related one-to-many or, when grouping was enabled, many-to-many; after this change the relationship is strictly one-to-many, and each text unit belongs to exactly one document.

  • Removes group_by_columns parameter from chunking configuration and related workflows
  • Changes text unit data model from document_ids (list) to document_id (single string)
  • Updates test files to reflect new chunking behavior and token counts
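The data-model change in the second bullet can be illustrated with a minimal sketch. `TextUnitSketch` and `migrate_record` are hypothetical names invented for this illustration, not part of graphrag; the sketch assumes that records produced without grouping carry exactly one entry in `document_ids`.

```python
from dataclasses import dataclass


@dataclass
class TextUnitSketch:
    """Hypothetical stand-in for a text unit after this change (not the graphrag class)."""

    id: str
    text: str
    document_id: str  # previously document_ids: list[str]


def migrate_record(old: dict) -> dict:
    """Convert an old-style record (document_ids list) to the new single-id shape.

    Assumption: a record written without grouping has exactly one document id.
    Anything else is rejected rather than silently truncated.
    """
    ids = old["document_ids"]
    if len(ids) != 1:
        raise ValueError(f"expected exactly one document id, got {len(ids)}")
    new = {key: value for key, value in old.items() if key != "document_ids"}
    new["document_id"] = ids[0]
    return new
```

A consumer that previously iterated over `document_ids` would read the single `document_id` string directly. Records with several ids have no lossless mapping to the new shape, which is why the sketch raises instead of picking one.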

Reviewed Changes

Copilot reviewed 23 out of 32 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| graphrag/config/models/chunking_config.py | Removes group_by_columns field from ChunkingConfig |
| graphrag/data_model/text_unit.py | Changes document_ids to document_id in TextUnit model |
| graphrag/index/workflows/create_base_text_units.py | Simplifies chunking logic by removing document grouping |
| graphrag/index/workflows/create_final_*.py | Updates workflows to use document_id instead of document_ids |
| tests/verbs/test_*.py | Updates test assertions for new chunking behavior |
| docs/*.md | Updates documentation to reflect removal of grouping feature |
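To make the simplified chunking concrete, here is a rough sketch of per-document chunking with no grouping step. It is not the code in create_base_text_units.py: `chunk_documents`, its parameters, and the whitespace split used in place of a real tokenizer are all assumptions made for illustration.

```python
def chunk_documents(documents, chunk_size=4, overlap=1):
    """Chunk each document independently so every chunk has one owning document.

    Assumptions: documents are dicts with "id" and "text" keys, and whitespace
    splitting stands in for a real tokenizer.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    units = []
    for doc in documents:
        tokens = doc["text"].split()
        for start in range(0, len(tokens), step):
            window = tokens[start:start + chunk_size]
            units.append(
                {
                    "document_id": doc["id"],  # single id, never a list
                    "text": " ".join(window),
                    "n_tokens": len(window),
                }
            )
            # Stop once this window has reached the end of the document.
            if start + chunk_size >= len(tokens):
                break
    return units
```

Because no chunk ever spans two documents, chunk boundaries and token counts depend only on each document's own text, which is consistent with the updated token counts in the tests.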

@natoverse natoverse merged commit 97704ab into v3/main Sep 9, 2025
12 checks passed
@natoverse natoverse deleted the remove-text-unit-grouping branch September 9, 2025 23:04