User Guide: KG Builder - custom Data Loader inaccurate #431

@martinohanlon

Description

The documentation for implementing a custom data loader is out of date:

https://neo4j.com/docs/neo4j-graphrag-python/current/user_guide_kg_builder.html#data-loader

The documentation specifies the following:

from pathlib import Path
from neo4j_graphrag.experimental.components.pdf_loader import DataLoader, PdfDocument

class MyDataLoader(DataLoader):
    async def run(self, path: Path) -> PdfDocument:
        # process file in `path`
        return PdfDocument(text="text")

When the loader is used as part of the SimpleKGPipeline, the interface expects a filepath argument and a DocumentInfo on the returned PdfDocument, e.g.:

from pathlib import Path
from neo4j_graphrag.experimental.components.pdf_loader import DataLoader, PdfDocument, DocumentInfo

class MyDataLoader(DataLoader):
    async def run(self, filepath: Path) -> PdfDocument:
        # process file in `filepath`
        return PdfDocument(
            text="text",
            document_info=DocumentInfo(
                path=str(filepath),
                metadata={}
            )
        )
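
To illustrate why the parameter name matters, here is a minimal, self-contained sketch. It assumes the pipeline passes the path to the loader by keyword as filepath, which is consistent with the behaviour described above but is not confirmed here. The DocumentInfo and PdfDocument classes below are simplified stand-ins rather than the library's own classes, and call_like_pipeline and try_loader are hypothetical helpers.

```python
import asyncio
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Optional


# Simplified stand-ins for illustration only; NOT the neo4j_graphrag classes.
@dataclass
class DocumentInfo:
    path: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class PdfDocument:
    text: str
    document_info: Optional[DocumentInfo] = None


class DocsStyleLoader:
    """Follows the signature shown in the current documentation."""

    async def run(self, path: Path) -> PdfDocument:
        return PdfDocument(text="text")


class FilepathStyleLoader:
    """Follows the signature proposed in this issue."""

    async def run(self, filepath: Path) -> PdfDocument:
        return PdfDocument(
            text="text",
            document_info=DocumentInfo(path=str(filepath), metadata={}),
        )


async def call_like_pipeline(loader, file_path: str) -> PdfDocument:
    # Assumption: the caller passes the path by keyword, named `filepath`.
    return await loader.run(filepath=Path(file_path))


def try_loader(loader) -> str:
    """Return 'ok' if the loader accepts the keyword call, else the error name."""
    try:
        asyncio.run(call_like_pipeline(loader, "example.pdf"))
        return "ok"
    except TypeError as exc:
        return type(exc).__name__


if __name__ == "__main__":
    print("docs-style loader:", try_loader(DocsStyleLoader()))
    print("filepath-style loader:", try_loader(FilepathStyleLoader()))
```

Under that assumption, the docs-style loader is rejected with a TypeError because it has no filepath parameter, while the second loader returns a document with document_info populated.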

Labels: documentation (Improvements or additions to documentation)