Remove text unit grouping #2052

natoverse · 2025-09-09T20:53:44Z

Removes the `group_by_columns` config option, which grouped documents before chunking. In practice it is never used, but it adds a lot of complexity to maintain.

Copilot

Pull Request Overview

This PR removes the group_by_columns configuration option from text chunking, simplifying the system by eliminating the ability to group documents before chunking. The change moves from a one-to-many or many-to-many relationship between documents and text units to a strict one-to-many relationship where each text unit belongs to exactly one document.

- Removes the `group_by_columns` parameter from the chunking configuration and related workflows
- Changes the text unit data model from `document_ids` (list) to `document_id` (single string)
- Updates test files to reflect the new chunking behavior and token counts
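As a rough illustration of the data model change listed above, the following sketch contrasts the two shapes. It is not the actual graphrag `TextUnit` class: the dataclass names, the reduced field set, and the `migrate` helper are all hypothetical and exist only to show what moving from a list of document ids to a single id implies.

```python
from dataclasses import dataclass


@dataclass
class OldTextUnit:
    """Hypothetical pre-change shape: a text unit may reference several documents."""
    id: str
    text: str
    document_ids: list[str]


@dataclass
class NewTextUnit:
    """Hypothetical post-change shape: a text unit belongs to exactly one document."""
    id: str
    text: str
    document_id: str


def migrate(unit: OldTextUnit) -> NewTextUnit:
    """Convert an old-style unit, refusing units that do not map to one document."""
    if len(unit.document_ids) != 1:
        raise ValueError(
            f"text unit {unit.id!r} references {len(unit.document_ids)} documents; "
            "expected exactly one"
        )
    return NewTextUnit(id=unit.id, text=unit.text, document_id=unit.document_ids[0])
```

Units produced without grouping would already carry a single id, so a conversion like this would only fail for data that had been indexed with grouping enabled.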

Reviewed Changes

Copilot reviewed 23 out of 32 changed files in this pull request and generated 3 comments.

Show a summary per file

| File | Description |
| --- | --- |
| graphrag/config/models/chunking_config.py | Removes `group_by_columns` field from ChunkingConfig |
| graphrag/data_model/text_unit.py | Changes `document_ids` to `document_id` in TextUnit model |
| graphrag/index/workflows/create_base_text_units.py | Simplifies chunking logic by removing document grouping |
| graphrag/index/workflows/create_final_*.py | Updates workflows to use `document_id` instead of `document_ids` |
| tests/verbs/test_*.py | Updates test assertions for new chunking behavior |
| docs/*.md | Updates documentation to reflect removal of grouping feature |
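The simplified chunking described for `create_base_text_units.py` can be pictured with a small sketch. This is not the workflow's actual code: `chunk_documents` is a hypothetical helper, whitespace splitting stands in for token-based chunking, and chunk overlap is ignored. The only point it makes is that each document is split on its own, so every resulting text unit carries a single `document_id`.

```python
def chunk_documents(documents, chunk_size=3):
    """Split each document independently; every chunk keeps one document_id.

    `documents` is a list of dicts with "id" and "text" keys (an assumed shape).
    Words stand in for tokens here to keep the sketch dependency-free.
    """
    text_units = []
    for doc in documents:
        words = doc["text"].split()
        # Step through the document in fixed-size windows, never crossing
        # into another document's text.
        for start in range(0, len(words), chunk_size):
            text_units.append(
                {
                    "document_id": doc["id"],
                    "text": " ".join(words[start:start + chunk_size]),
                }
            )
    return text_units


docs = [
    {"id": "d1", "text": "a b c d"},
    {"id": "d2", "text": "e f"},
]
units = chunk_documents(docs, chunk_size=3)
```

With grouping removed, a short trailing chunk from one document is never merged with the start of the next, which is presumably why the token counts in the tests changed.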

natoverse added 2 commits September 9, 2025 13:51

Remove text unit group_by_columns

d417909

Semver

82ed674

natoverse requested a review from a team as a code owner September 9, 2025 20:53

natoverse added 3 commits September 9, 2025 14:03

Fix default token split test

f8ea091

Fix models in config test samples

35c92d3

Fix token length in context sort test

7b002e9

AlonsoGuevara requested a review from Copilot September 9, 2025 22:26

Copilot AI reviewed Sep 9, 2025

graphrag/index/workflows/create_base_text_units.py (outdated)

graphrag/index/workflows/create_base_text_units.py

graphrag/query/input/loaders/dfs.py

AlonsoGuevara reviewed Sep 9, 2025

graphrag/index/workflows/create_base_text_units.py (outdated)

natoverse added 2 commits September 9, 2025 15:42

Merge branch 'v3/main' into remove-text-unit-grouping

42ce7ed

Fix document sort

e67f526

natoverse merged commit 97704ab into v3/main Sep 9, 2025
12 checks passed

natoverse deleted the remove-text-unit-grouping branch September 9, 2025 23:04
