GraphRAG Package 1.8 - New Schema Options using a Medical Report Dataset

This project creates knowledge graphs from medical PDF reports using the new schema options available in Neo4j GraphRAG package 1.8.

It provides multiple examples showing different levels of schema control.

For a fuller introduction to the new schema options, see "Unleashing The Power Of Schema: What's New in neo4j-graphrag python" by Nathalie Charbel @ Neo4j.

About the Data

All medical reports were synthetically generated.

Prerequisites

Before getting started, ensure you have the following installed and configured:

1. Install uv

Install uv, a Python package manager, by following the official uv installation instructions.

2. Clone the Repository

git clone https://github.com/neo4j-product-examples/graphrag-package-1.8-examples.git
cd graphrag-package-1.8-examples

3. Create Virtual Environment with uv

uv venv
source .venv/bin/activate

4. Install Dependencies with uv sync

uv sync

5. Set Up Environment Variables

Create a .env file in the root directory with the following variables:

# OpenAI Configuration (Required)
OPENAI_API_KEY=your_openai_api_key_here

# Neo4j Database Configuration
NEO4J_URI=neo4j+s://<your_aura_instance_id>.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_neo4j_password_here
NEO4J_DATABASE=neo4j

Important Notes:

  • OPENAI_API_KEY is required for all operations
  • Neo4j variables have defaults but should be configured for your specific setup
  • Make sure your Neo4j database is running and accessible
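The notes above can be sketched as a small settings loader. This is a minimal sketch, not the repository's code: the function name `load_settings` and the fallback values for the Neo4j variables are assumptions for illustration, since the defaults the scripts actually use are not stated here. It takes a plain mapping, so it works with `os.environ` once the .env file has been loaded (for example with python-dotenv).

```python
from typing import Mapping

# Hypothetical fallbacks for illustration only; the scripts' real defaults may differ.
ASSUMED_DEFAULTS = {
    "NEO4J_URI": "neo4j://localhost:7687",
    "NEO4J_USERNAME": "neo4j",
    "NEO4J_PASSWORD": "password",
    "NEO4J_DATABASE": "neo4j",
}


def load_settings(env: Mapping[str, str]) -> dict[str, str]:
    """Collect the variables listed above, failing fast if the API key is missing."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")

    settings = {"OPENAI_API_KEY": api_key}
    for name, fallback in ASSUMED_DEFAULTS.items():
        # An empty value is treated the same as a missing one.
        settings[name] = env.get(name) or fallback
    return settings
```

Calling `load_settings(os.environ)` at the top of a script surfaces a missing key before any PDF is processed, rather than partway through a run.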

Usage

The project provides four different approaches to create knowledge graphs from medical reports:

1. Create KG with No Schema Provided

This approach uses automatic schema extraction without providing a predefined schema.

python create-kg-no-schema.py

What it does:

  • Automatically extracts a schema from the medical report content
  • Creates a knowledge graph guided by that extracted schema
  • Uses Report1.pdf as the input document

2. Create KG with Schema

This approach uses a predefined schema to structure the knowledge graph.

python create-kg-with-schema.py

What it does:

  • Uses a predefined schema with specific node types and relationships
  • Includes entities like Patient, Physician, Medical Institution, Symptoms, etc.
  • Provides more structured and consistent knowledge graph output
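To make the idea of a predefined schema concrete, here is a minimal plain-Python sketch. The node labels echo the entities listed above, but the relationship names (`TREATED_BY`, `WORKS_AT`, `HAS_SYMPTOM`), the dictionary layout, and the helper `check_patterns` are assumptions for illustration; create-kg-with-schema.py holds the schema the project actually uses.

```python
# Illustrative schema: labels follow the list above, everything else is assumed.
schema = {
    "node_types": ["Patient", "Physician", "MedicalInstitution", "Symptom"],
    "relationship_types": ["TREATED_BY", "WORKS_AT", "HAS_SYMPTOM"],
    # Each pattern is [source label, relationship type, target label].
    "patterns": [
        ["Patient", "TREATED_BY", "Physician"],
        ["Physician", "WORKS_AT", "MedicalInstitution"],
        ["Patient", "HAS_SYMPTOM", "Symptom"],
    ],
}


def check_patterns(schema: dict) -> list[str]:
    """Return one message per pattern element that is not declared in the schema."""
    nodes = set(schema["node_types"])
    rels = set(schema["relationship_types"])
    problems = []
    for source, rel, target in schema["patterns"]:
        if source not in nodes:
            problems.append(f"unknown source node type: {source}")
        if rel not in rels:
            problems.append(f"unknown relationship type: {rel}")
        if target not in nodes:
            problems.append(f"unknown target node type: {target}")
    return problems
```

A check like this catches typos in a hand-written schema before it is used to guide extraction.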

3. Create KG with Improved Schema

This approach uses an enhanced version of the schema for better knowledge graph quality.

python create-kg-with-improved-schema.py

What it does:

  • Uses an improved schema with better-defined relationships and properties
  • In particular, part of the information previously stored on Symptom is moved to the relationship type
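One plausible reading of this change can be illustrated with plain data. In this sketch the property name `severity`, the relationship name `HAS_SYMPTOM`, and the helper `move_property` are all hypothetical. The point is only that a detail describing how one patient experiences a symptom can fit better on the relationship than on a Symptom node shared by many patients.

```python
import copy

# Hypothetical "before" schema: severity is stored on the Symptom node.
before = {
    "node_types": {"Patient": ["name"], "Symptom": ["name", "severity"]},
    "relationship_types": {"HAS_SYMPTOM": []},
}


def move_property(schema: dict, node_label: str, rel_type: str, prop: str) -> dict:
    """Return a copy of the schema with `prop` moved from a node type to a relationship type."""
    updated = copy.deepcopy(schema)  # leave the input schema untouched
    updated["node_types"][node_label].remove(prop)
    updated["relationship_types"][rel_type].append(prop)
    return updated


after = move_property(before, "Symptom", "HAS_SYMPTOM", "severity")
```

With this layout, two patients can link to the same Symptom node while each relationship records its own severity.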

4. Use LLM to Generate & Save Schema

This approach generates a schema using LLM analysis and saves it for future use.

python extract-save-schema.py

What it does:

  • Analyzes medical reports to automatically generate an appropriate schema
  • Saves the generated schema to medical-report-schema.json
  • Loads the saved schema from the JSON file and uses it to create a knowledge graph
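The save-then-reload step can be sketched with the standard library alone. This is not the code in extract-save-schema.py: the helpers `save_schema` and `load_schema` and the tiny example schema are assumptions for illustration, and the sketch writes to a temporary directory so it does not overwrite the real medical-report-schema.json.

```python
import json
import tempfile
from pathlib import Path


def save_schema(schema: dict, path: Path) -> None:
    """Write the schema as indented JSON so it is easy to review and edit by hand."""
    path.write_text(json.dumps(schema, indent=2), encoding="utf-8")


def load_schema(path: Path) -> dict:
    """Read a schema previously written by save_schema."""
    return json.loads(path.read_text(encoding="utf-8"))


# Hypothetical schema standing in for the LLM-generated one.
example_schema = {
    "node_types": ["Patient", "Symptom"],
    "relationship_types": ["HAS_SYMPTOM"],
    "patterns": [["Patient", "HAS_SYMPTOM", "Symptom"]],
}

with tempfile.TemporaryDirectory() as tmp:
    schema_path = Path(tmp) / "medical-report-schema.json"
    save_schema(example_schema, schema_path)
    reloaded = load_schema(schema_path)
```

Persisting the schema this way lets you review or hand-edit it once and reuse it across runs, instead of paying for a fresh LLM extraction each time.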

Project Structure

graphrag-package-1.8-examples/
├── create-kg-no-schema.py          # No schema approach
├── create-kg-with-schema.py        # Predefined schema approach
├── create-kg-with-improved-schema.py  # Enhanced schema approach
├── extract-save-schema.py          # Schema generation and saving
├── medical-report-schema.json      # Generated/saved schema file
├── reports/                        # Medical PDF reports directory
│   ├── Report1.pdf
│   ├── report2.pdf
│   └── ...
├── pyproject.toml                  # Project configuration
├── .env                           # Environment variables (create this)
└── README.md                      # This file

Dependencies

The project uses the following main dependencies:

  • neo4j-graphrag>=1.8.0 - Neo4j GraphRAG library
  • neo4j-rust-ext>=5.28.1.0 - Rust extensions that speed up the Neo4j Python driver
  • openai>=1.97.0 - OpenAI API client
  • python-dotenv>=1.1.1 - Environment variable management

Requirements

  • Python >= 3.13
  • OpenAI API access (API key required)
  • Neo4j database (local or remote) - Make sure to populate your .env file as above
  • PDF medical reports in the reports/ directory
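Two of these requirements are easy to verify before running a script. The sketch below is a hypothetical preflight check, not part of the repository: `python_is_supported` and `find_reports` are illustrative names, and the file extension is matched case-insensitively as a precaution.

```python
import sys
from pathlib import Path


def python_is_supported(version_info=sys.version_info, minimum=(3, 13)) -> bool:
    """Compare the (major, minor) part of a version against the required minimum."""
    return tuple(version_info[:2]) >= minimum


def find_reports(reports_dir: Path) -> list[Path]:
    """Return the PDF files in reports_dir, or an empty list if the directory is missing."""
    if not reports_dir.is_dir():
        return []
    return sorted(
        p for p in reports_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
```

For example, `find_reports(Path("reports"))` returning an empty list is a quick sign that the input PDFs are not where the scripts expect them.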

Troubleshooting

Common Issues

  1. Missing OpenAI API Key: Ensure OPENAI_API_KEY is set in your .env file
  2. Neo4j Connection Error: Verify your Neo4j database is running and connection details are correct

Support

For issues with:
