A Retrieval-Augmented Generation (RAG) system providing guideline answers, powered by a local LLM via Ollama and exposed as a FastAPI web service.

Simple RAG System for USPSTF Guidelines

⚠️ Important Disclaimer ⚠️

This repository and the RAG system implemented herein were developed strictly for educational and portfolio purposes. The project is intended to demonstrate technical capabilities in Retrieval-Augmented Generation (RAG) and the integration of local LLMs with authoritative guidelines.

This system is NOT intended for, nor should it be used for, clinical decision-making, medical diagnosis, treatment, or any form of patient care. Clinical decisions must always be made by qualified healthcare professionals based on their expertise, patient-specific information, and current, officially published medical guidelines.

The information provided by this system is for informational and demonstrative purposes only and should not be considered medical advice.


This project implements a Retrieval-Augmented Generation (RAG) system designed to provide clinicians with advice on patient management based on the United States Preventive Services Task Force (USPSTF) guidelines. It leverages local Large Language Models (LLMs) and embedding models via Ollama, ensuring data privacy and control.

Features

  • Document Ingestion: Processes PDF documents (USPSTF guidelines) into a searchable format.
  • Local Embeddings: Uses all-minilm via Ollama for generating document embeddings.
  • Local LLM: Utilizes phi3.5:latest via Ollama for generating responses, keeping all processing local.
  • Vector Store: Employs ChromaDB for efficient storage and retrieval of document chunks and their embeddings.
  • FastAPI Interface: Exposes the RAG system as a web API for easy integration and interaction.
  • Logging & Performance Metrics: Integrates Python's logging module and basic timing measurements for better observability.
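
At query time, the pieces above combine in a retrieve-then-generate loop. The sketch below illustrates that flow with toy two-dimensional embeddings and a simple prompt builder; in the real system the embeddings come from all-minilm and the assembled prompt is sent to phi3.5 via Ollama, so the vectors, texts, and function names here are illustrative only:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, store, k=2):
    """Rank stored chunks by similarity to the query embedding, keep top k."""
    ranked = sorted(store, key=lambda c: cosine(query_vec, c["embedding"]), reverse=True)
    return [c["text"] for c in ranked[:k]]

def build_prompt(question, chunks):
    """Augment the question with the retrieved guideline context."""
    context = "\n\n".join(chunks)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

# Toy store; in the real system these embeddings come from all-minilm via Ollama.
store = [
    {"text": "Screen adults aged 45 to 75 for colorectal cancer.", "embedding": [1.0, 0.1]},
    {"text": "Counsel on tobacco cessation.", "embedding": [0.0, 1.0]},
]
chunks = retrieve([0.9, 0.2], store, k=1)
prompt = build_prompt("When should colorectal screening start?", chunks)
```

The prompt string would then be passed to the local LLM, which is what keeps every step of the pipeline on-machine.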

Technologies Used

  • Python 3.11+
  • FastAPI: For building the web API.
  • Uvicorn: ASGI server to run the FastAPI application.
  • LangChain: Framework for developing LLM applications.
  • Ollama: For running local LLMs and embedding models (all-minilm, phi3.5:latest).
  • ChromaDB: Lightweight vector database.
  • python-dotenv: For managing environment variables.
  • unstructured: For extracting text from various document formats (e.g., PDFs).
  • uv: For dependency management and virtual environments.

Development Process

This project was developed using an iterative, AI-assisted approach. Leveraging gemini-cli, I guided the development process step-by-step, making architectural decisions, defining requirements, and ensuring adherence to best practices. This methodology allowed for rapid prototyping, exploration of various technical solutions, and a deeper understanding of complex concepts by actively directing the AI assistant's code generation and refactoring efforts. This approach highlights the ability to effectively utilize advanced AI tools as a force multiplier in software development.

Setup Instructions

Follow these steps to set up and run the project locally.

1. Clone the Repository

git clone <repository_url>
cd simple-rag-system

2. Install uv (if not already installed)

uv is a fast Python package installer and resolver. If you don't have it, install it:

curl -LsSf https://astral.sh/uv/install.sh | sh

3. Create and Activate Virtual Environment

uv venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

4. Install Dependencies

Install all required Python packages using uv:

uv sync

5. Install and Run Ollama

Download and install Ollama from the official website:

https://ollama.com/download

Once installed, pull the necessary models:

ollama pull all-minilm
ollama pull phi3.5:latest

Ensure the Ollama server is running in the background.

6. Place USPSTF Guidelines

Place your USPSTF guideline PDF files into the data/raw/ directory (create the directory first if it does not exist).

Data Ingestion

Before querying, you need to ingest the documents into the vector database. This process extracts text, splits it into chunks, generates embeddings, and stores them in ChromaDB.

Run the ingestion script from the project root:

uv run python -m src.ingest

This will create a vectorstore/ directory containing your indexed data.
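
Under the hood, ingestion is extract → chunk → embed → store. The chunking step is the easiest to show standalone; the sketch below is a plain fixed-size character splitter with overlap. The actual src.ingest presumably uses LangChain's splitter and the unstructured loader, so treat this as an illustration of the idea rather than the project's code:

```python
def split_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks that overlap,
    so sentences straddling a boundary appear in both neighbours."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# Each resulting chunk would then be embedded (all-minilm) and written to ChromaDB.
chunks = split_text("A" * 1200, chunk_size=500, overlap=50)
```

The overlap ensures that a recommendation split across a chunk boundary is still retrievable from at least one chunk.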

Running the FastAPI Application

Start the FastAPI server from the project root:

uv run python -m uvicorn main:app --host 0.0.0.0 --port 8000 --reload

The --reload flag is useful for development as it restarts the server on code changes.

API Usage

Once the server is running, open your web browser and navigate to:

http://127.0.0.1:8000/docs

This will open the interactive Swagger UI documentation, where you can test the API.

Querying the RAG System

Use the /query endpoint to send questions to the RAG system.

  • Endpoint: /query

  • Method: POST

  • Request Body (JSON):

    {
      "question": "What is the recommendation for colorectal cancer screening?"
    }
  • Response Body (JSON):

    {
      "question": "What is the recommendation for colorectal cancer screening?",
      "answer": "The USPSTF recommends..."
    }
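
Outside the Swagger UI, any HTTP client works. A minimal Python client using only the standard library (the URL assumes the default host and port from the uvicorn command above):

```python
import json
from urllib import request

API_URL = "http://127.0.0.1:8000/query"  # default host/port from the uvicorn command

def build_query(question: str) -> request.Request:
    """Build the POST request the /query endpoint expects."""
    body = json.dumps({"question": question}).encode("utf-8")
    return request.Request(
        API_URL, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def ask(question: str) -> dict:
    """Send the question and return the parsed JSON response."""
    with request.urlopen(build_query(question)) as resp:
        return json.loads(resp.read())

# Example (requires the server to be running):
# print(ask("What is the recommendation for colorectal cancer screening?"))
```

The equivalent curl call is `curl -X POST http://127.0.0.1:8000/query -H "Content-Type: application/json" -d '{"question": "..."}'`.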

Distinguishing as a Clinical RAG System

To evolve this into a truly clinical RAG system, key considerations include:

  • Authoritative Data: Strict curation and versioning of highly authoritative, evidence-based sources.
  • Domain-Specific Models: Utilizing embedding and language models (LLMs) specifically trained or fine-tuned on clinical texts for enhanced understanding.
  • Enhanced Retrieval: Leveraging metadata, hybrid search, and re-ranking for precise, context-aware information retrieval.
  • Clinical Prompt Engineering: Crafting prompts with guardrails to ensure factual, safe, and actionable responses, avoiding direct medical advice, and citing sources.
  • Rigorous Validation: Comprehensive evaluation by clinical experts to ensure accuracy, safety, and clinical utility.

Limitations and Future Improvements

This project serves as a foundational RAG system. Current limitations and areas for future improvement include:

Current Limitations

  • Document Parsing: Relies on basic PDF text extraction. Does not robustly handle complex document structures like tables, figures, or scanned documents, which can lead to information loss.
  • Chunking Strategy: Uses a simple recursive character splitting method. This might not always preserve semantic coherence perfectly, especially for clinical guidelines with intricate structures.
  • Retrieval Sophistication: Employs basic similarity search. Lacks advanced retrieval techniques such as re-ranking retrieved documents, hybrid search (combining keyword and semantic search), or leveraging document metadata for more precise filtering.
  • LLM Hallucination/Generality: While RAG reduces hallucination, the LLM might still generate less precise or overly general answers if the retrieved context is insufficient or ambiguous.
  • User Interface: Currently, interaction is limited to the FastAPI Swagger UI. A dedicated, user-friendly web interface is absent.
  • Evaluation Framework: Lacks a robust, automated evaluation pipeline to measure the RAG system's performance (e.g., relevance, faithfulness, latency) against a defined dataset.
  • Error Handling: Basic error handling is in place, but more granular and user-friendly error messages could be implemented.

Future Improvements

  • Advanced Document Processing: Implement more sophisticated parsing techniques (e.g., using unstructured.io's advanced features, or dedicated table/image extraction) to better handle complex PDF layouts and extract structured information.
  • Smarter Chunking: Explore and implement advanced chunking strategies (e.g., semantic chunking, hierarchical chunking based on document structure) to create more meaningful context units.
  • Enhanced Retrieval Techniques: Integrate re-ranking models (e.g., Cohere Rerank), implement HyDE (Hypothetical Document Embeddings), or RAG-Fusion for improved retrieval accuracy. Leverage document metadata (e.g., guideline year, disease, population) for filtered retrieval.
  • Domain-Specific Model Adaptation: Investigate fine-tuning embedding models (e.g., on clinical notes, medical literature) and potentially LLMs (if resources permit) to enhance domain-specific understanding and generation quality.
  • Interactive User Interface: Develop a simple web-based front-end (e.g., using Streamlit, Gradio, or a React/Vue app) for a more intuitive user experience.
  • Comprehensive Evaluation: Build an automated evaluation pipeline to continuously monitor and improve the RAG system's performance, including metrics for retrieval quality, answer faithfulness, and latency.
  • Guideline Versioning: Implement a system to manage and query specific versions of guidelines, ensuring answers are based on the most current or relevant iteration.
  • Citation Generation: Enhance the system to provide direct citations (e.g., page numbers, section references) from the source documents for generated answers, increasing trustworthiness.
  • Streaming Responses: Implement streaming for LLM responses to provide a more responsive user experience.
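
As a concrete example, the hybrid search idea from the list above can be prototyped with very little code: blend the vector store's semantic similarity score with a keyword-overlap score (a crude stand-in for BM25). The function names, weighting, and example scores below are illustrative, not part of this project:

```python
def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the chunk (crude BM25 stand-in)."""
    terms = set(query.lower().split())
    hits = sum(1 for t in terms if t in text.lower())
    return hits / len(terms) if terms else 0.0

def hybrid_rank(query, chunks, semantic_scores, alpha=0.5):
    """Blend semantic similarity with keyword overlap; alpha weights semantic."""
    scored = [
        (alpha * s + (1 - alpha) * keyword_score(query, c), c)
        for c, s in zip(chunks, semantic_scores)
    ]
    return [c for _, c in sorted(scored, reverse=True)]

ranked = hybrid_rank(
    "colorectal screening",
    ["colorectal cancer screening for ages 45 to 75", "tobacco cessation counseling"],
    semantic_scores=[0.2, 0.9],
)
```

Here the keyword signal outweighs the (hypothetically noisy) semantic score and surfaces the exact-match chunk first, which is precisely the failure mode hybrid search is meant to cover.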
