This project showcases the development of a Graph RAG (Retrieval-Augmented Generation) application that combines Large Language Models (LLMs) with knowledge graphs to improve the accuracy and explainability of generated answers.
Hybrid Search Approach:
- Initial retrieval of articles using a vector database based on semantic similarity.
- Refinement of results via a knowledge graph and a controlled vocabulary (MeSH).
Context Poisoning Mitigation:
- Ensures that LLMs process only the most relevant and structured data, improving reliability.
Streamlined Workflow:
- Built using Streamlit for an intuitive user experience.
- Demonstrates a three-step pipeline:
  1. Search articles using vector similarity.
  2. Refine terms with the MeSH vocabulary and knowledge graph.
  3. Filter and summarize results with an LLM.
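The three-step pipeline above can be sketched in miniature as follows. Everything here is illustrative: the function names, the toy articles, and the keyword-overlap scoring are assumptions standing in for the app's actual Weaviate vector search, MeSH knowledge graph lookups, and LLM call.

```python
# Illustrative sketch of the three-step Graph RAG pipeline.
# All function names and data are assumptions for demonstration only;
# the real app uses Weaviate, a MeSH knowledge graph, and an LLM.

def vector_search(query, articles, top_k=3):
    """Step 1: naive keyword-overlap stand-in for vector similarity."""
    scored = [(len(set(query.lower().split()) & set(a["text"].lower().split())), a)
              for a in articles]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [a for score, a in scored[:top_k] if score > 0]

def refine_with_mesh(results, mesh_terms):
    """Step 2: keep only articles tagged with a relevant MeSH term."""
    return [a for a in results if set(a["mesh"]) & mesh_terms]

def summarize_with_llm(results):
    """Step 3: placeholder for the LLM filter/summarize call."""
    return " | ".join(a["text"] for a in results)

articles = [
    {"text": "aspirin reduces heart attack risk",
     "mesh": {"Aspirin", "Myocardial Infarction"}},
    {"text": "heart surgery recovery times",
     "mesh": {"Thoracic Surgery"}},
    {"text": "gardening tips for spring", "mesh": set()},
]
hits = vector_search("heart attack treatment", articles)
refined = refine_with_mesh(hits, {"Myocardial Infarction"})
print(summarize_with_llm(refined))  # prints "aspirin reduces heart attack risk"
```

The MeSH refinement step is what mitigates context poisoning: the vector search alone returns a loosely related surgery article, but the controlled-vocabulary filter drops it before anything reaches the LLM.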
Scalability and Governance:
- Highlights the importance of structured metadata for scalable and real-world deployments.
To run this project locally, follow the steps below:
```shell
git clone https://github.com/Sa1f27/GraphRAG.git
cd GraphRAG
```
Create a `.env` file in the root directory and include the following:

```
WCD_URL=<paste your Weaviate instance URL>
WCD_API_KEY=<paste your Weaviate API key>
OPENAI_API_KEY=<paste your OpenAI API key>
```
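A minimal sketch of how the app might validate these settings at startup, assuming they end up as environment variables (for example via `python-dotenv`). The `load_config` helper is an assumption for illustration, not the project's actual code:

```python
# Hypothetical startup check for the three required credentials.
import os

REQUIRED = ("WCD_URL", "WCD_API_KEY", "OPENAI_API_KEY")

def load_config():
    """Return the required settings, failing fast if any is missing."""
    missing = [k for k in REQUIRED if not os.getenv(k)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return {k: os.environ[k] for k in REQUIRED}
```

Failing fast here surfaces a clear configuration error instead of a cryptic authentication failure deep inside the Weaviate or OpenAI client.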
Get the PubMed MultiLabel Text Classification Dataset (MeSH) from Kaggle: Download Here.
Run the notebook VectorVsKG_updated.ipynb to process the data. It generates the file PubMedGraph.ttl; place this file in the same folder as the app.
Ensure all required Python dependencies are installed:

```shell
pip install -r requirements.txt
```
Start the application with the following command:

```shell
streamlit run app.py
```
- This application demonstrates the synergy of LLMs, knowledge graphs, and vector databases in solving real-world problems.
- Scalability and metadata governance ensure robust and reliable performance in production environments.