This project showcases the development of a Graph RAG (Retrieval-Augmented Generation) application that combines Large Language Models (LLMs) with knowledge graphs to improve the accuracy and explainability of generated answers.
Hybrid Search Approach:
- Initial retrieval of articles using a vector database based on semantic similarity.
- Refinement of results via a knowledge graph and a controlled vocabulary (MeSH).
Context Poisoning Mitigation:
- Ensures that LLMs process only the most relevant and structured data, improving reliability.
Streamlined Workflow:
- Built using Streamlit for an intuitive user experience.
- Demonstrates a three-step pipeline:
  1. Search articles using vector similarity.
  2. Refine terms with the MeSH vocabulary and knowledge graph.
  3. Filter and summarize results with an LLM.
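The three-step pipeline above can be sketched in miniature as follows. Everything here is illustrative: the function names, the toy articles, and the keyword-overlap scoring are assumptions standing in for the app's actual Weaviate vector search, MeSH knowledge graph lookups, and LLM call.

```python
# Illustrative sketch of the three-step Graph RAG pipeline.
# All function names and data are assumptions for demonstration only;
# the real app uses Weaviate, a MeSH knowledge graph, and an LLM.

def vector_search(query, articles, top_k=3):
    """Step 1: naive keyword-overlap stand-in for vector similarity."""
    scored = [(len(set(query.lower().split()) & set(a["text"].lower().split())), a)
              for a in articles]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [a for score, a in scored[:top_k] if score > 0]

def refine_with_mesh(results, mesh_terms):
    """Step 2: keep only articles tagged with a relevant MeSH term."""
    return [a for a in results if set(a["mesh"]) & mesh_terms]

def summarize_with_llm(results):
    """Step 3: placeholder for the LLM filter/summarize call."""
    return " | ".join(a["text"] for a in results)

articles = [
    {"text": "aspirin reduces heart attack risk",
     "mesh": {"Aspirin", "Myocardial Infarction"}},
    {"text": "heart surgery recovery times",
     "mesh": {"Thoracic Surgery"}},
    {"text": "gardening tips for spring", "mesh": set()},
]
hits = vector_search("heart attack treatment", articles)
refined = refine_with_mesh(hits, {"Myocardial Infarction"})
print(summarize_with_llm(refined))  # prints "aspirin reduces heart attack risk"
```

The MeSH refinement step is what mitigates context poisoning: the vector search alone returns a loosely related surgery article, but the controlled-vocabulary filter drops it before anything reaches the LLM.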
Scalability and Governance:
- Highlights the importance of structured metadata for scalable and real-world deployments.
To run this project locally, follow the steps below:
```shell
git clone https://github.com/Sa1f27/GraphRAG.git
cd GraphRAG
```
Create a `.env` file in the root directory and include the following:

```
WCD_URL=<paste your Weaviate instance URL>
WCD_API_KEY=<paste your Weaviate API key>
OPENAI_API_KEY=<paste your OpenAI API key>
```
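A minimal sketch of how the app might validate these settings at startup, assuming they end up as environment variables (for example via `python-dotenv`). The `load_config` helper is an assumption for illustration, not the project's actual code:

```python
# Hypothetical startup check for the three required credentials.
import os

REQUIRED = ("WCD_URL", "WCD_API_KEY", "OPENAI_API_KEY")

def load_config():
    """Return the required settings, failing fast if any is missing."""
    missing = [k for k in REQUIRED if not os.getenv(k)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return {k: os.environ[k] for k in REQUIRED}
```

Failing fast here surfaces a clear configuration error instead of a cryptic authentication failure deep inside the Weaviate or OpenAI client.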
Get the PubMed MultiLabel Text Classification Dataset (MeSH) from Kaggle: Download Here.
Run the notebook VectorVsKG_updated.ipynb to process the data. It generates the file PubMedGraph.ttl; place this file in the same folder as the app.
Ensure all required Python dependencies are installed:

```shell
pip install -r requirements.txt
```
Start the application with the following command:

```shell
streamlit run app.py
```
- This application demonstrates the synergy of LLMs, knowledge graphs, and vector databases in solving real-world problems.
- Scalability and metadata governance ensure robust and reliable performance in production environments.