dcloud347
Contributor

This pull request changes how text nodes are handled in the knowledge graph: context is now tracked and deduplicated by line number rather than through a generic metadata field. It adds explicit start and end line numbers to text nodes, updates the related code and database interactions, and adds deduplication logic for extracted contexts.

Knowledge Graph Enhancements:

  • Replaced the generic metadata field in TextNode and Neo4jTextNode with explicit start_line and end_line fields, updating all related code, database queries, and serialization/deserialization logic to use these new fields.

  • During text file graph construction, line positions are now calculated and stored in each document chunk's metadata, so that every chunk of text can be mapped back to the lines of the original file.
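
As an illustration of the line-position bookkeeping described above, here is a minimal sketch. Only the start_line and end_line field names come from this PR. The text field, the chunk_with_lines helper, and the fixed-size line chunking are assumptions standing in for whatever splitter the project actually uses, as is the choice of 1-based, inclusive line numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextNode:
    # Hypothetical shape: only start_line and end_line are named in this PR.
    text: str
    start_line: int  # assumed 1-based, inclusive
    end_line: int    # assumed 1-based, inclusive


def chunk_with_lines(source: str, lines_per_chunk: int) -> list[TextNode]:
    """Split text into fixed-size line chunks and record each chunk's line span.

    Hypothetical helper: fixed-size chunking is a stand-in for the real splitter.
    """
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    lines = source.splitlines()
    nodes = []
    for offset in range(0, len(lines), lines_per_chunk):
        block = lines[offset:offset + lines_per_chunk]
        nodes.append(
            TextNode(
                text="\n".join(block),
                start_line=offset + 1,          # first line of the chunk
                end_line=offset + len(block),   # last line of the chunk
            )
        )
    return nodes
```

With the span stored on the node itself, a consumer can point at exact file lines without parsing a free-form metadata value.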

Context Extraction and Deduplication:

  • Added a new deduplicate_contexts utility that removes duplicate or contained contexts based on file, content, and line numbers, and applied it to all context extraction flows. This ensures only unique and most relevant contexts are returned.

  • Updated context extraction logic to skip empty content and deduplicate before returning results, improving both efficiency and relevance of context data.
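
The deduplication idea can be sketched roughly as follows. This is not the project's implementation: the Context record and its field names are hypothetical, and the rule used here (a context is dropped when its line range lies inside a range already kept for the same file) is one plausible reading of "duplicate or contained". Content is used only to skip empty entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    # Hypothetical record; the field names are assumptions for illustration.
    relative_path: str
    content: str
    start_line: int
    end_line: int


def deduplicate_contexts(contexts: list[Context]) -> list[Context]:
    """Drop empty contexts, exact duplicates, and contexts whose line range
    lies inside a range already kept for the same file (illustrative sketch)."""
    # Sort by file and start line, widest range first on ties, so that a
    # containing context always precedes anything it contains.
    ordered = sorted(
        (c for c in contexts if c.content.strip()),
        key=lambda c: (c.relative_path, c.start_line, -c.end_line),
    )
    kept: list[Context] = []
    for ctx in ordered:
        contained = any(
            k.relative_path == ctx.relative_path
            and k.start_line <= ctx.start_line
            and ctx.end_line <= k.end_line
            for k in kept
        )
        if not contained:
            kept.append(ctx)
    return kept
```

Note that this sketch returns contexts sorted by file and line rather than in their original order, and that partially overlapping ranges are both kept rather than merged.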

Other Improvements:

  • Improved artifact aggregation in transform_tool_messages_to_str to handle all tool message artifacts collectively, ensuring comprehensive context stringification.

  • Fixed an off-by-one error in line selection for code reading, ensuring correct lines are included.

  • Minor: Updated tool initialization to specify a new response format for file reading tools.
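
The PR does not say in which direction the off-by-one error went, so the following is only a sketch of the general pitfall, assuming the reader takes 1-based, inclusive line numbers. The read_lines helper is hypothetical.

```python
def read_lines(source: str, start_line: int, end_line: int) -> str:
    """Return lines start_line..end_line, assumed 1-based and inclusive."""
    if start_line < 1 or end_line < start_line:
        raise ValueError("expected 1 <= start_line <= end_line")
    lines = source.splitlines()
    # Python slices are 0-based and end-exclusive: shift the start down by one
    # and use end_line unchanged as the stop index to include the last line.
    return "\n".join(lines[start_line - 1:end_line])
```

Slicing with lines[start_line:end_line] would skip the first requested line, and lines[start_line - 1:end_line - 1] would drop the last one. Both are easy mistakes when mixing 1-based line numbers with 0-based slices.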

Together, these changes make context extraction, storage, and retrieval line-aware and free of duplicates, which matters for downstream tasks such as code analysis and bug localization.

@dcloud347 linked an issue on Sep 4, 2025 that may be closed by this pull request
@dcloud347 merged commit f917433 into main on Sep 4, 2025
2 checks passed

Development

Successfully merging this pull request may close these issues.

Add Start Line and End Line for TextNode