natoverse
Collaborator

This adds an optional input_documents parameter to the indexing API. Several users have asked to skip input discovery and parsing and supply the DataFrame directly, so this provides a mechanism to do so.

@natoverse natoverse requested a review from a team as a code owner August 26, 2025 23:52
@natoverse natoverse mentioned this pull request Aug 27, 2025
4 tasks
Contributor

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds an optional input_documents parameter to the GraphRAG indexing API that allows users to bypass document loading and parsing by supplying a pre-processed DataFrame directly. This addresses user requests to skip the input finding/parsing step when documents are already available in DataFrame format.

  • Adds input_documents parameter to indexing API functions
  • Implements logic to write supplied DataFrames directly to storage and skip document loading workflows
  • Adds a demonstration notebook showing how to use the new feature
  • Removes unused workflow registration function
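The behavior summarized in the bullets above can be sketched roughly as follows. This is a minimal illustration of the pattern, not the code from this PR: the Pipeline class, the run_pipeline function, the workflow names, and the dict-based storage are all hypothetical stand-ins, and a plain list of records is used in place of a pandas DataFrame so the example runs with the standard library only.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# Hypothetical stand-ins: none of these names are taken from the GraphRAG source.
Storage = dict[str, Any]
Workflow = tuple[str, Callable[[Storage], None]]


@dataclass
class Pipeline:
    """An ordered list of named workflow steps."""

    workflows: list[Workflow] = field(default_factory=list)

    def names(self) -> list[str]:
        return [name for name, _ in self.workflows]

    def remove(self, name: str) -> None:
        """Drop the workflow with the given name, if present."""
        self.workflows = [w for w in self.workflows if w[0] != name]


def load_input_documents(storage: Storage) -> None:
    # Placeholder for the normal "find and parse input files" step.
    storage["documents"] = [{"id": "from-disk", "text": "parsed from input files"}]


def extract_graph(storage: Storage) -> None:
    # Placeholder downstream step that only needs the documents table.
    storage["entities"] = [doc["id"] for doc in storage["documents"]]


def run_pipeline(
    pipeline: Pipeline,
    storage: Storage,
    input_documents: Optional[list[dict[str, Any]]] = None,
) -> list[str]:
    """Run every workflow; skip document loading when documents are supplied."""
    if input_documents is not None:
        # Write the caller's documents straight to storage...
        storage["documents"] = input_documents
        # ...and remove the loading workflow so it cannot overwrite them.
        pipeline.remove("load_input_documents")
    for _, workflow in pipeline.workflows:
        workflow(storage)
    return pipeline.names()
```

With documents supplied, only the downstream step runs and it reads the caller's records; without them, the loading step runs first as usual. Note that this sketch mutates the pipeline it is given, which may or may not match what the real implementation does.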

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| graphrag/index/typing/pipeline.py | Adds remove method to Pipeline class for workflow removal |
| graphrag/index/run/run_pipeline.py | Implements input_documents parameter handling and workflow skipping logic |
| graphrag/api/index.py | Exposes input_documents parameter in public API and removes unused function |
| docs/examples_notebooks/input_documents.ipynb | Provides example notebook demonstrating the new functionality |
| .semversioner/next-release/minor-20250826235020448734.json | Documents the change as a minor version update |

@natoverse natoverse merged commit 1cb20b6 into main Sep 2, 2025
16 checks passed
@natoverse natoverse deleted the input-docs-param branch September 2, 2025 23:15
opensourcemukul pushed a commit to opensourcemukul/graphrag that referenced this pull request Sep 13, 2025
* Add optional input_documents to index API

* Semver

* Add input dataframe example notebook

* Format

* Fix docs and notebook