This project creates knowledge graphs from medical PDF reports using the new schema
options available in Neo4j GraphRAG package 1.8.
It provides multiple examples showing different levels of schema control.
For a fuller introduction to the new schema options, check out "Unleashing The Power Of Schema: What's New in neo4j-graphrag python" by Nathalie Charbel @ Neo4j.
All medical reports were synthetically generated.
Before getting started, ensure you have the following installed and configured:
Install uv (Python package manager), then clone the repository and set up the environment:
git clone https://github.com/neo4j-product-examples/graphrag-package-1.8-examples.git
cd graphrag-package-1.8-examples
uv venv
source .venv/bin/activate
uv sync
Create a .env file in the root directory with the following variables:
# OpenAI Configuration (Required)
OPENAI_API_KEY=your_openai_api_key_here
# Neo4j Database Configuration
NEO4J_URI=neo4j+s://<your_aura_instance_id>.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your_neo4j_password_here
NEO4J_DATABASE=neo4j
Important Notes:
- OPENAI_API_KEY is required for all operations
- Neo4j variables have defaults but should be configured for your specific setup
- Make sure your Neo4j database is running and accessible
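The notes above can be turned into a small startup check. The following is a minimal sketch, not code from this repository: `load_settings` and `ASSUMED_NEO4J_DEFAULTS` are hypothetical names, and the fallback values are placeholders, since the actual defaults used by the scripts are not listed here.

```python
from typing import Mapping

# Hypothetical fallback values: the notes say defaults exist but do not
# list them, so these are placeholders for illustration only.
ASSUMED_NEO4J_DEFAULTS = {
    "NEO4J_URI": "neo4j://localhost:7687",
    "NEO4J_USERNAME": "neo4j",
    "NEO4J_PASSWORD": "password",
    "NEO4J_DATABASE": "neo4j",
}


def load_settings(env: Mapping[str, str]) -> dict[str, str]:
    """Return the five settings, failing fast if the OpenAI key is missing."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to your .env file")
    settings = {"OPENAI_API_KEY": api_key}
    for name, fallback in ASSUMED_NEO4J_DEFAULTS.items():
        # Treat an empty value the same as a missing one.
        settings[name] = env.get(name) or fallback
    return settings
```

In a script, this would typically be called as `load_settings(os.environ)` after python-dotenv's `load_dotenv()` has read the .env file, so a missing key is reported before any PDF is processed.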
The project provides four different approaches to create knowledge graphs from medical reports:
The first approach relies on automatic schema extraction; no predefined schema is provided.
python create-kg-no-schema.py
What it does:
- Automatically extracts a schema from the medical report content
- Creates a knowledge graph guided by the extracted schema
- Uses Report1.pdf as the input document
The second approach uses a predefined schema to structure the knowledge graph.
python create-kg-with-schema.py
What it does:
- Uses a predefined schema with specific node types and relationships
- Includes entities like Patient, Physician, Medical Institution, Symptoms, etc.
- Provides more structured and consistent knowledge graph output
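To make "predefined schema" concrete, here is a rough sketch of what such a schema could look like as plain Python data, plus a consistency check. It is not the schema from create-kg-with-schema.py: the node labels echo the entities listed above, while the relationship names, the `node_types` / `relationship_types` / `patterns` layout, and the helper `undeclared_names` are assumptions made for illustration. Check the package documentation for the exact format it accepts.

```python
# Illustrative schema only: the labels echo the entities named above, while
# the relationship names and the dict layout are assumptions.
schema = {
    "node_types": ["Patient", "Physician", "MedicalInstitution", "Symptom"],
    "relationship_types": ["TREATED_BY", "WORKS_AT", "HAS_SYMPTOM"],
    "patterns": [
        ("Patient", "TREATED_BY", "Physician"),
        ("Physician", "WORKS_AT", "MedicalInstitution"),
        ("Patient", "HAS_SYMPTOM", "Symptom"),
    ],
}


def undeclared_names(schema: dict) -> list[str]:
    """List names used in patterns but missing from the declared types."""
    nodes = set(schema["node_types"])
    rels = set(schema["relationship_types"])
    missing: list[str] = []
    for source, rel, target in schema["patterns"]:
        for label in (source, target):
            if label not in nodes and label not in missing:
                missing.append(label)
        if rel not in rels and rel not in missing:
            missing.append(rel)
    return missing
```

Running a check like this before building the graph catches typos in a hand-written schema, such as a pattern that points at a label that was never declared.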
The third approach uses an enhanced version of the schema for better knowledge graph quality.
python create-kg-with-improved-schema.py
What it does:
- Uses an improved schema with better-defined relationships and properties
- In particular, part of the information from the Symptom node type is moved to the relationship type
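The idea of moving information from a node onto a relationship can be sketched with plain dictionaries. This is an assumption-laden illustration, not the repository's schema: the property names (`severity`, `onset_date`), the relationship name `HAS_SYMPTOM`, and the helper `move_properties` are all hypothetical.

```python
import copy

# Hypothetical "before" shape: every detail lives on the Symptom node.
# The property names are assumptions, not taken from the repository.
basic = {
    "nodes": {"Symptom": ["name", "severity", "onset_date"]},
    "relationships": {"HAS_SYMPTOM": []},
}


def move_properties(schema: dict, node: str, rel: str, names: list[str]) -> dict:
    """Return a copy of schema with the named properties moved from node to rel."""
    improved = copy.deepcopy(schema)
    for name in names:
        improved["nodes"][node].remove(name)  # ValueError if not on the node
        improved["relationships"][rel].append(name)
    return improved


improved = move_properties(basic, "Symptom", "HAS_SYMPTOM", ["severity", "onset_date"])
```

One plausible motivation for this kind of change: a Symptom node that carries only its name can be shared by many patients, while details specific to one patient's case sit on that patient's relationship to it.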
The fourth approach generates a schema using LLM analysis and saves it for future use.
python extract-save-schema.py
What it does:
- Analyzes medical reports to automatically generate an appropriate schema
- Saves the generated schema to medical-report-schema.json
- Loads the schema back from JSON and uses it to create a knowledge graph
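The save-then-load step can be illustrated with the standard library alone. This sketch assumes the schema is a JSON-serializable dict. `save_schema` and `load_schema` are hypothetical helpers, and the actual script may rely on the package's own schema objects instead.

```python
import json
from pathlib import Path


def save_schema(schema: dict, path: Path) -> None:
    """Write the schema as indented JSON so it can be reviewed and edited."""
    path.write_text(json.dumps(schema, indent=2), encoding="utf-8")


def load_schema(path: Path) -> dict:
    """Read a schema previously written by save_schema."""
    return json.loads(path.read_text(encoding="utf-8"))
```

Keeping the schema in a file means the LLM-based extraction only needs to run once; later runs can load medical-report-schema.json, optionally after manual edits, and reuse it across reports.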
graphrag-package-1.8-examples/
├── create-kg-no-schema.py # No schema approach
├── create-kg-with-schema.py # Predefined schema approach
├── create-kg-with-improved-schema.py # Enhanced schema approach
├── extract-save-schema.py # Schema generation and saving
├── medical-report-schema.json # Generated/saved schema file
├── reports/ # Medical PDF reports directory
│ ├── Report1.pdf
│ ├── report2.pdf
│ └── ...
├── pyproject.toml # Project configuration
├── .env # Environment variables (create this)
└── README.md # This file
The project uses the following main dependencies:
- neo4j-graphrag>=1.8.0 - Neo4j GraphRAG library
- neo4j-rust-ext>=5.28.1.0 - Neo4j extensions
- openai>=1.97.0 - OpenAI API client
- python-dotenv>=1.1.1 - Environment variable management
- Python >= 3.13
- OpenAI API access (API key required)
- Neo4j database (local or remote); make sure to populate your .env file as above
- PDF medical reports in the reports/ directory
- Missing OpenAI API Key: Ensure OPENAI_API_KEY is set in your .env file
- Neo4j Connection Error: Verify your Neo4j database is running and connection details are correct
For issues with:
- Neo4j GraphRAG Package: Raise an issue on the official neo4j-graphrag package repo