Input docs API parameter #2034

natoverse · 2025-08-26T23:52:46Z

This adds an optional input docs param to the indexing API. Several users have requested the ability to skip input finding/parsing and just supply the dataframe directly, so this provides a mechanism to do so.
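A minimal sketch of what a caller might prepare before using the new parameter. The column names (`id`, `title`, `text`) and the helper `make_document_rows` are assumptions made for illustration and are not taken from this PR. The commented-out call at the end shows roughly how the rows could be handed to the indexing API; its exact signature is not verified here.

```python
import hashlib


def make_document_rows(texts):
    """Turn raw strings into document-like rows.

    The column names used here are assumed for illustration only.
    """
    rows = []
    for i, text in enumerate(texts):
        rows.append(
            {
                # Stable id derived from the content (an assumption, not the PR's scheme)
                "id": hashlib.sha256(text.encode("utf-8")).hexdigest(),
                "title": f"doc-{i}",
                "text": text,
            }
        )
    return rows


rows = make_document_rows(["First document body.", "Second document body."])

# With pandas and graphrag installed, the rows could then be wrapped in a
# DataFrame and passed along, roughly like this (signature assumed):
#
#   import pandas as pd
#   import graphrag.api as api
#   await api.build_index(config=config, input_documents=pd.DataFrame(rows))
```

The example notebook added in this PR (docs/examples_notebooks/input_documents.ipynb) is the authoritative reference for the expected DataFrame shape.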

Copilot

Pull Request Overview

This PR adds an optional input_documents parameter to the GraphRAG indexing API that allows users to bypass document loading and parsing by supplying a pre-processed DataFrame directly. This addresses user requests to skip the input finding/parsing step when documents are already available in DataFrame format.

* Adds input_documents parameter to indexing API functions
* Implements logic to write supplied DataFrames directly to storage and skip document loading workflows
* Adds a demonstration notebook showing how to use the new feature
* Removes unused workflow registration function

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| graphrag/index/typing/pipeline.py | Adds `remove` method to Pipeline class for workflow removal |
| graphrag/index/run/run_pipeline.py | Implements input_documents parameter handling and workflow skipping logic |
| graphrag/api/index.py | Exposes input_documents parameter in public API and removes unused function |
| docs/examples_notebooks/input_documents.ipynb | Provides example notebook demonstrating the new functionality |
| .semversioner/next-release/minor-20250826235020448734.json | Documents the change as a minor version update |
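The summary above describes a `remove` method on the pipeline plus logic that writes supplied documents to storage and skips loading. The following is a simplified sketch of that idea, not the code from this PR: the `Pipeline` class, the `run_pipeline` function, the dict-based storage, and the workflow names `load_input_documents` and `create_base_text_units` are all stand-ins assumed for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """Ordered list of (name, function) workflow pairs (illustrative stand-in)."""

    workflows: list = field(default_factory=list)

    def names(self):
        return [name for name, _ in self.workflows]

    def remove(self, name):
        """Drop every workflow registered under `name`."""
        self.workflows = [w for w in self.workflows if w[0] != name]


def run_pipeline(pipeline, storage, input_documents=None):
    """Run each workflow in order; skip document loading if documents are supplied."""
    if input_documents is not None:
        # Write the supplied documents straight to storage...
        storage["documents"] = input_documents
        # ...and remove the loading step (workflow name is an assumption).
        pipeline.remove("load_input_documents")
    ran = []
    for name, fn in pipeline.workflows:
        fn(storage)
        ran.append(name)
    return ran


def load_input_documents(storage):
    storage["documents"] = [{"text": "loaded from disk"}]


def create_base_text_units(storage):
    storage["text_units"] = [doc["text"] for doc in storage["documents"]]


storage = {}
pipeline = Pipeline(
    [
        ("load_input_documents", load_input_documents),
        ("create_base_text_units", create_base_text_units),
    ]
)
ran = run_pipeline(pipeline, storage, input_documents=[{"text": "supplied directly"}])
```

In this sketch the downstream workflow reads from storage either way, so it does not need to know whether the documents were loaded or supplied by the caller.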

* Add optional input_documents to index API
* Semver
* Add input dataframe example notebook
* Format
* Fix docs and notebook

natoverse added 2 commits August 26, 2025 16:49

Add optional input_documents to index API

a8f6793

Semver

db1bc71

natoverse requested a review from a team as a code owner August 26, 2025 23:52

natoverse added 3 commits August 26, 2025 17:25

Add input dataframe example notebook

3b8869a

Format

622e52e

Merge branch 'main' into input-docs-param

575ac5f

natoverse mentioned this pull request Aug 27, 2025

Support dataframe as input #2000

Closed

4 tasks

Merge branch 'main' into input-docs-param

67e37a3

AlonsoGuevara requested a review from Copilot September 2, 2025 21:44

Copilot AI reviewed Sep 2, 2025

graphrag/api/index.py (outdated, resolved)

docs/examples_notebooks/input_documents.ipynb (outdated, resolved)

AlonsoGuevara reviewed Sep 2, 2025

graphrag/api/index.py (resolved)

AlonsoGuevara approved these changes Sep 2, 2025

Fix docs and notebook

3d0e70f

natoverse merged commit 1cb20b6 into main Sep 2, 2025
16 checks passed

natoverse deleted the input-docs-param branch September 2, 2025 23:15

opensourcemukul pushed a commit to opensourcemukul/graphrag that referenced this pull request Sep 13, 2025

Input docs API parameter (microsoft#2034)

b087be9

* Add optional input_documents to index API
* Semver
* Add input dataframe example notebook
* Format
* Fix docs and notebook
