Input docs API parameter #2034
Pull Request Overview
This PR adds an optional `input_documents` parameter to the GraphRAG indexing API that allows users to bypass document loading and parsing by supplying a pre-processed DataFrame directly. This addresses user requests to skip the input finding/parsing step when documents are already available in DataFrame format.
- Adds `input_documents` parameter to indexing API functions
- Implements logic to write supplied DataFrames directly to storage and skip document loading workflows
- Adds a demonstration notebook showing how to use the new feature
- Removes unused workflow registration function
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
graphrag/index/typing/pipeline.py | Adds remove method to Pipeline class for workflow removal |
graphrag/index/run/run_pipeline.py | Implements input_documents parameter handling and workflow skipping logic |
graphrag/api/index.py | Exposes input_documents parameter in public API and removes unused function |
docs/examples_notebooks/input_documents.ipynb | Provides example notebook demonstrating the new functionality |
.semversioner/next-release/minor-20250826235020448734.json | Documents the change as a minor version update |
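
The `remove` method listed for `graphrag/index/typing/pipeline.py` can be pictured with a minimal sketch. The class below is an illustrative stand-in, not the project's actual code: the `(name, function)` workflow shape, the `names` helper, and the workflow names are all assumptions made for the example.

```python
from typing import Callable

# A workflow is modeled as a (name, function) pair; this shape is an assumption.
Workflow = tuple[str, Callable[[], None]]


class Pipeline:
    """Illustrative stand-in for a pipeline type, not the project's real class."""

    def __init__(self, workflows: list[Workflow]) -> None:
        self.workflows = workflows

    def names(self) -> list[str]:
        """Return workflow names in execution order."""
        return [name for name, _ in self.workflows]

    def remove(self, name: str) -> None:
        """Drop every workflow registered under `name`; unknown names are ignored."""
        self.workflows = [w for w in self.workflows if w[0] != name]


def _noop() -> None:
    pass


pipeline = Pipeline([
    ("load_input_documents", _noop),   # hypothetical workflow name
    ("create_base_text_units", _noop), # hypothetical workflow name
])
pipeline.remove("load_input_documents")
print(pipeline.names())
```

Filtering into a new list, rather than deleting in place, keeps removal safe when the name is absent, so a caller can strip loader steps without first checking which ones a given pipeline contains.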
* Add optional input_documents to index API
* Semver
* Add input dataframe example notebook
* Format
* Fix docs and notebook
This adds an optional `input_documents` parameter to the indexing API. Several users have requested the ability to skip input finding/parsing and supply the DataFrame directly, so this provides a mechanism to do so.
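
The behavior described above, writing the supplied documents to storage and skipping the loading step, might look roughly like the sketch below. It is a simplified model under stated assumptions, not the code in `graphrag/index/run/run_pipeline.py`: `run_pipeline` here is a hypothetical function, storage is a plain dict, a list of dicts stands in for the DataFrame, and the workflow names and the `"documents"` key are invented for illustration.

```python
from typing import Any, Callable, Optional

Storage = dict[str, Any]
Workflow = tuple[str, Callable[[Storage], None]]

# Hypothetical names of the workflows that find and parse input files.
LOADER_WORKFLOWS = {"load_input_documents"}


def run_pipeline(
    workflows: list[Workflow],
    storage: Storage,
    input_documents: Optional[Any] = None,
) -> list[str]:
    """Run workflows in order; skip the loaders when documents are supplied."""
    if input_documents is not None:
        # Write the caller's documents where the loader would have put them.
        storage["documents"] = input_documents
        workflows = [w for w in workflows if w[0] not in LOADER_WORKFLOWS]
    executed = []
    for name, fn in workflows:
        fn(storage)
        executed.append(name)
    return executed


def load_input_documents(storage: Storage) -> None:
    # Stand-in for file discovery and parsing.
    storage["documents"] = [{"id": "from-disk", "text": "parsed from a file"}]


def create_text_units(storage: Storage) -> None:
    storage["text_units"] = [doc["text"] for doc in storage["documents"]]


workflows = [
    ("load_input_documents", load_input_documents),
    ("create_text_units", create_text_units),
]
# A list of dicts stands in for the pre-processed DataFrame.
supplied = [{"id": "1", "title": "doc-1", "text": "already parsed text"}]
storage: Storage = {}
executed = run_pipeline(workflows, storage, input_documents=supplied)
print(executed, storage["text_units"])
```

Because downstream steps read only from storage, they behave the same whether the documents came from the loader or from the caller, which is what makes the loader safe to skip.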