Dev #121

dcloud347 · 2025-09-04T11:10:15Z

This pull request changes how text nodes are handled in the knowledge graph: context is now tracked and deduplicated by line numbers rather than through generic metadata. The changes add explicit start and end line numbers to text nodes, update all related code and database interactions, and introduce deduplication logic for extracted contexts. Together, these updates make context extraction, storage, and retrieval more accurate throughout the system.

Knowledge Graph Enhancements:

Replaced the generic metadata field in TextNode and Neo4jTextNode with explicit start_line and end_line fields, updating all related code, database queries, and serialization/deserialization logic to use these new fields. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12]
During text file graph construction, line positions are now calculated and stored as metadata in each document chunk, allowing for precise mapping of text to original file lines. [1] [2]
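The two changes above can be sketched together as follows. This is a minimal illustration, not the project's code: it assumes a simplified `TextNode` with only `text`, `start_line`, and `end_line`, and assumes the splitter returns chunks in file order. `to_neo4j_properties`, `from_neo4j_properties`, and `locate_chunks` are hypothetical names. Each chunk's start line is derived by counting the newlines that precede its character offset in the original file.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TextNode:
    """Simplified, hypothetical TextNode: chunk text plus its line span in the source file."""
    text: str
    start_line: int  # first line of the chunk (1-indexed)
    end_line: int    # last line of the chunk (1-indexed, inclusive)

    def to_neo4j_properties(self) -> dict:
        # Hypothetical serializer: explicit scalar properties instead of a generic metadata blob.
        return {"text": self.text, "start_line": self.start_line, "end_line": self.end_line}

    @classmethod
    def from_neo4j_properties(cls, props: dict) -> "TextNode":
        # Hypothetical deserializer mirroring to_neo4j_properties.
        return cls(text=props["text"], start_line=props["start_line"], end_line=props["end_line"])


def locate_chunks(content: str, chunks: List[str]) -> List[TextNode]:
    """Hypothetical helper: map chunks (given in file order) back to their original lines."""
    nodes = []
    search_from = 0
    for chunk in chunks:
        offset = content.find(chunk, search_from)
        if offset == -1:
            raise ValueError(f"chunk not found in content: {chunk[:30]!r}")
        # The chunk starts on line (number of newlines before its first character) + 1.
        start_line = content.count("\n", 0, offset) + 1
        # A trailing newline ends the chunk's last line; it does not start a new one.
        end_line = start_line + chunk.rstrip("\n").count("\n")
        nodes.append(TextNode(chunk, start_line, end_line))
        # Advance by one character only, so overlapping chunks can still be found.
        search_from = offset + 1
    return nodes
```

For example, `locate_chunks("line1\nline2\nline3\nline4\n", ["line1\nline2\n", "line2\nline3", "line4\n"])` yields spans (1, 2), (2, 3), and (4, 4); the overlapping second chunk is still located correctly.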

Context Extraction and Deduplication:

Added a new deduplicate_contexts utility that removes duplicate contexts, as well as contexts contained within another, based on file, content, and line numbers, and applied it to all context extraction flows. This ensures that only unique, non-redundant contexts are returned. [1] [2] [3] [4]
Updated context extraction logic to skip empty content and deduplicate before returning results, improving both efficiency and relevance of context data. [1] [2]
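A deduplication pass of this kind might look roughly like the sketch below. The `Context` fields and the containment rule (same file, line span lying strictly inside another context's span) are assumptions for illustration; the PR's actual `deduplicate_contexts` may weigh content and line numbers differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Context:
    """Hypothetical extracted context: a span of a file and its content."""
    relative_path: str
    content: str
    start_line: int  # 1-indexed, inclusive
    end_line: int    # 1-indexed, inclusive


def _strictly_contains(outer: Context, inner: Context) -> bool:
    """True if `inner` lies within `outer` in the same file and the two spans differ."""
    return (
        outer.relative_path == inner.relative_path
        and outer.start_line <= inner.start_line
        and inner.end_line <= outer.end_line
        and (outer.start_line, outer.end_line) != (inner.start_line, inner.end_line)
    )


def deduplicate_contexts(contexts: List[Context]) -> List[Context]:
    # 1. Skip contexts whose content is empty or whitespace only.
    non_empty = [c for c in contexts if c.content.strip()]
    # 2. Drop exact duplicates (same file, content, and lines), keeping first-seen order.
    unique = list(dict.fromkeys(non_empty))
    # 3. Drop any context whose line span is covered by a larger span from the same file.
    return [c for c in unique if not any(_strictly_contains(o, c) for o in unique)]
```

The containment check compares every pair, so it is quadratic in the number of contexts. That is acceptable for small result sets; sorting by file and start line would allow a single linear sweep instead.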

Other Improvements:

Improved artifact aggregation in transform_tool_messages_to_str so that the artifacts of all tool messages are handled together, ensuring that every message's context is included in the string output.
Fixed an off-by-one error in line selection when reading code, so that the requested lines are now returned correctly.
Minor: Updated tool initialization to specify a new response format for file reading tools.
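The PR does not say which bound of the line selection was wrong, so the following only illustrates the usual pitfall, assuming 1-indexed, inclusive line ranges. `select_lines` is a hypothetical helper, not the project's file-reading tool.

```python
def select_lines(text: str, start_line: int, end_line: int) -> str:
    """Hypothetical helper: return lines start_line..end_line (1-indexed, both inclusive)."""
    if start_line < 1 or end_line < start_line:
        raise ValueError("expected 1 <= start_line <= end_line")
    lines = text.splitlines()
    # The inclusive 1-indexed range [start_line, end_line] is the 0-indexed
    # half-open slice [start_line - 1, end_line). Writing lines[start_line:end_line]
    # instead would silently drop the first requested line.
    return "\n".join(lines[start_line - 1:end_line])
```

An `end_line` past the end of the file is clamped by Python's slicing rather than raising an error.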

These changes collectively make context extraction, storage, and retrieval more accurate, deduplicated, and line-aware, which is crucial for downstream tasks such as code analysis and bug localization.

dcloud347 added 7 commits September 4, 2025 02:05

Fix issue #119: Add Start Line and End Line for TextNode

00d2e3c

Merge remote-tracking branch 'origin/main' into fix-issue-119

e70d23d

feat: Remove metadata from TextNode and add start and end line attributes

01c3322

feat: Update TextNode to include start and end line attributes in graph builder and traversal

3773148

feat: Add response format to read file tool in context provider node

12e593b

feat: Enhance context extraction by skipping empty and duplicate content

ad11f8d

feat: Implement context deduplication and skip empty content in context extraction

9747b01

dcloud347 linked an issue Sep 4, 2025 that may be closed by this pull request

Add Start Line and End Line for TextNode #119

Closed

fix: Remove unnecessary blank line in transform_tool_messages_to_str function

a6957c0

dcloud347 merged commit f917433 into main Sep 4, 2025
2 checks passed