Graph RAG for Medicine: A Knowledge Graph Enhanced Retrieval-Augmented Generation Application

This project is a Graph RAG (Retrieval-Augmented Generation) application that combines Large Language Models (LLMs) with a knowledge graph to make retrieved results more accurate and easier to explain.

Key Features

  1. Hybrid Search Approach:

    • Initial retrieval of articles using a vector database based on semantic similarity.
    • Refinement of results via a knowledge graph and a controlled vocabulary (MeSH).
  2. Context Poisoning Mitigation:

    • Passes the LLM only the articles that survive the knowledge graph filter, so irrelevant context is less likely to distort its output.
  3. Streamlined Workflow:

    • Built using Streamlit for an intuitive user experience.
    • Demonstrates a three-step pipeline:
      • Search articles using vector similarity.
      • Refine terms with the MeSH vocabulary and knowledge graph.
      • Filter and summarize results with an LLM.
  4. Scalability and Governance:

    • Shows why structured metadata matters for scalable, real-world deployments.
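
The three-step pipeline described above can be sketched in plain Python. This is a toy illustration, not the app's code: the article vectors, the `MESH_NARROWER` mapping, and all function names are hypothetical. The real app uses Weaviate for the vector search and an LLM for the final summary, both of which are left out here.

```python
import math

# Toy data (hypothetical): each article has an embedding and a set of MeSH tags.
ARTICLES = {
    "a1": {"vec": [0.9, 0.1, 0.0], "mesh": {"Mouth Neoplasms"}},
    "a2": {"vec": [0.8, 0.2, 0.1], "mesh": {"Dental Caries"}},
    "a3": {"vec": [0.0, 0.1, 0.9], "mesh": {"Tongue Neoplasms"}},
}

# Toy slice of a MeSH-like hierarchy (hypothetical): term -> narrower terms.
MESH_NARROWER = {
    "Mouth Neoplasms": {"Tongue Neoplasms", "Lip Neoplasms"},
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec, k):
    """Step 1: rank articles by similarity to the query and keep the top k."""
    ranked = sorted(
        ARTICLES,
        key=lambda aid: cosine(query_vec, ARTICLES[aid]["vec"]),
        reverse=True,
    )
    return ranked[:k]


def expand_terms(term):
    """Step 2: the chosen term plus any narrower terms from the hierarchy."""
    return {term} | MESH_NARROWER.get(term, set())


def filter_by_terms(article_ids, terms):
    """Step 3 (filter only): keep articles tagged with at least one term."""
    return [aid for aid in article_ids if ARTICLES[aid]["mesh"] & terms]


candidates = vector_search([1.0, 0.0, 0.0], k=2)
terms = expand_terms("Mouth Neoplasms")
kept = filter_by_terms(candidates, terms)
print(candidates, kept)
```

In this toy data, article `a2` is close to the query in vector space but carries an unrelated MeSH tag, so the term filter drops it before anything would reach the LLM.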

Prerequisites

Setup

To run this project locally, follow the steps below:

Step 1: Clone the Repository

git clone https://github.com/Sa1f27/GraphRAG.git
cd GraphRAG

Step 2: Set Environment Variables

Create a .env file in the root directory and include the following:

WCD_URL=<paste your Weaviate instance URL>
WCD_API_KEY=<paste your Weaviate API key>
OPENAI_API_KEY=<paste your OpenAI API key>
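
Before starting the app, it can help to confirm that all three variables are set and no longer hold the placeholder text. The following is a minimal standard-library sketch. How the app itself reads `.env` is not stated here, so the helper names `parse_env_text`, `missing_keys`, and `load_env_file` are assumptions made for illustration.

```python
import os
from pathlib import Path

# Variable names taken from the .env template above.
REQUIRED_KEYS = ("WCD_URL", "WCD_API_KEY", "OPENAI_API_KEY")


def parse_env_text(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values):
    """Return required keys that are absent, empty, or still a <placeholder>."""
    return [
        key
        for key in REQUIRED_KEYS
        if not values.get(key) or values[key].startswith("<")
    ]


def load_env_file(path=".env"):
    """Read the file and export its values without overriding existing ones."""
    values = parse_env_text(Path(path).read_text(encoding="utf-8"))
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

Calling `missing_keys(load_env_file())` from the repository root returns an empty list when the file is complete.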

Step 3: Download the Dataset

Download the PubMed MultiLabel Text Classification Dataset MeSH from Kaggle.

Step 4: Generate Knowledge Graph Data

Run the notebook VectorVsKG_updated.ipynb to process the dataset. The notebook generates the file PubMedGraph.ttl. Place this file in the folder that contains the app.
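
A quick check that the generated file is where the app expects it can save a confusing startup error. This is a small sketch using only the standard library. The function name `check_graph_file` is hypothetical, and it only tests that the file exists and is non-empty, not that its Turtle content is valid.

```python
from pathlib import Path


def check_graph_file(app_dir, filename="PubMedGraph.ttl"):
    """Return the path to the Turtle file, or raise if it is missing or empty."""
    path = Path(app_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"{filename} not found in {Path(app_dir).resolve()}; run the notebook first"
        )
    if path.stat().st_size == 0:
        raise ValueError(f"{filename} is empty; re-run the notebook")
    return path
```

For example, `check_graph_file(".")` run from the app folder returns the file path once Step 4 has completed.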

Step 5: Install Dependencies

Install the required Python dependencies:

pip install -r requirements.txt

Step 6: Run the Streamlit App

Start the application with the following command:

streamlit run app.py

Screenshots

Application Workflow

[Screenshots: Steps 1 to 10 of the application workflow]

Notes

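  • This application shows how LLMs, knowledge graphs, and vector databases complement each other: the vector database finds candidate articles, the knowledge graph narrows them using MeSH terms, and the LLM filters and summarizes what remains.
  • Production deployments depend on well-governed, structured metadata to stay reliable as the data grows.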
