
RAGBase4Code

An advanced Retrieval-Augmented Generation (RAG) framework designed to enhance your AI applications with multi-source knowledge retrieval. It is built specifically for offline LLM4CODE, providing secure codebase understanding without sending data off-machine.

Framework Drawings

Author

This codebase is designed and maintained by Json Zhou (zzhou292@wisc.edu).

Overview

RAGBase4Code provides a modular, extensible solution for building RAG systems with multiple retrieval agents targeting different knowledge sources. By leveraging vector databases and optimized embeddings, it delivers highly relevant context to your LLM, resulting in more accurate, knowledge-grounded responses.

You can think of it as the backbone of a secure, self-hosted counterpart to Cursor (https://www.cursor.com/) or Trae (https://www.trae.ai/).

System Phases

  • Data Preparation & Indexing
    Process various data sources into searchable vector embeddings.
  • Retrieval & Generation
    Dynamically fetch relevant information and generate responses when users ask questions.
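
The two phases can be sketched end to end in a few lines. The snippet below is a minimal illustration, not the project's actual code: it assumes the sentence-transformers package from the prerequisites and keeps the index in memory instead of a real vector store.

# Minimal two-phase sketch (illustrative only): index a few text chunks,
# then retrieve the best match for a question.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

# Phase 1: Data Preparation & Indexing
chunks = [
    "def add(a, b): return a + b  # utility in math_utils.py",
    "README: install dependencies with pip before running the pipeline",
]
index = model.encode(chunks, convert_to_tensor=True)

# Phase 2: Retrieval & Generation
question = "How do I install the dependencies?"
scores = util.cos_sim(model.encode(question, convert_to_tensor=True), index)[0]
best = chunks[int(scores.argmax())]
prompt = f"Context:\n{best}\n\nQuestion: {question}"  # handed to the LLM next
print(prompt)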

Key Features

  • Triple-Agent Retrieval System
    Specialized agents for source code, documentation, and conversation history.
  • Multiple LLM Interfaces
    Seamless integration with NVIDIA NIM services and Ollama for local model deployment.
  • Fully Customizable Pipelines
    Configure every aspect of the RAG workflow from chunking to prompt engineering.
  • Scalable Architecture
    Built to handle everything from personal projects to enterprise-scale applications.
  • Privacy-Focused
    Process sensitive data locally, with no external API calls, when using the Ollama integration (a minimal sketch follows this list).
  • Comprehensive Evaluation
    Built-in metrics for measuring retrieval quality and response accuracy.
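
For the privacy-focused, local path, generation can stay entirely on your machine by talking to Ollama's HTTP API on its default port 11434. This is a minimal sketch, not the project's own LLM interface; the model name is an assumption, so use any model you have pulled with `ollama pull`.

# Illustrative fully local generation call through Ollama's HTTP API.
import json
import urllib.request

def ollama_generate(prompt: str, model: str = "llama3") -> str:
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(ollama_generate("Summarize what a RAG pipeline does in one sentence."))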

Architecture

RAGBase4Code follows a modular architecture with separate components for:

  • Document Processing
    Extract, clean, and chunk text from various sources.
  • Vector Embedding
    Convert text into numerical representations using configurable embedding models.
  • Vector Storage
    Index and store embeddings for efficient similarity search.
  • Retrieval Agents
    Specialized components to extract relevant information based on query context.
  • LLM Interface
    Connect to either NVIDIA NIM services or local Ollama models.
  • Response Generator
    Combine retrieved information with the query to produce accurate answers.
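
The first three components (processing, embedding, storage) can be wired together against the Qdrant instance from the prerequisites. The sketch below is illustrative rather than the project's actual pipeline: the collection name, chunk size, and embedding model are placeholders, and it assumes the qdrant-client Python package is installed alongside sentence-transformers.

# Illustrative processing -> embedding -> storage sketch using Qdrant.
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

model = SentenceTransformer("all-MiniLM-L6-v2")   # 384-dim embeddings
client = QdrantClient(url="http://localhost:6333")

# Document Processing: naive fixed-size chunking as a stand-in
text = open("README.md", encoding="utf-8").read()
chunks = [text[i:i + 500] for i in range(0, len(text), 500)]

# Vector Embedding + Vector Storage
client.recreate_collection(
    collection_name="demo_docs",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)
vectors = model.encode(chunks)
client.upsert(
    collection_name="demo_docs",
    points=[PointStruct(id=i, vector=v.tolist(), payload={"text": c})
            for i, (v, c) in enumerate(zip(vectors, chunks))],
)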

Retrieval Agents

RAGBase4Code implements three specialized retrieval agents:

  • Source Code Retriever
    Intelligently navigates codebases to extract relevant functions, classes, and implementation details. Optimized for understanding code semantics beyond simple keyword matching.
  • README Doc Retriever
    Specializes in extracting high-level project information, setup instructions, and usage examples from documentation. Particularly valuable for understanding project architecture and intent.
  • Conversation Document Retriever
    Maintains and indexes past interactions, allowing the system to reference previous questions and answers for improved context awareness and continuity.

Each agent can operate independently or together, with the system dynamically determining which knowledge sources are most relevant for each query.
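
One simple way to picture this multi-agent retrieval is each agent backed by its own vector collection, with results merged by similarity score. The sketch below is an assumption-laden illustration, not the repository's implementation: the collection names are hypothetical and it again assumes qdrant-client and sentence-transformers.

# Illustrative sketch of combining several retrieval "agents".
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient

model = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(url="http://localhost:6333")

AGENT_COLLECTIONS = ["source_code", "readme_docs", "conversations"]  # assumed names

def retrieve(question: str, top_k: int = 3):
    query_vec = model.encode(question).tolist()
    hits = []
    for collection in AGENT_COLLECTIONS:
        for hit in client.search(collection_name=collection,
                                 query_vector=query_vec, limit=top_k):
            hits.append((hit.score, collection, (hit.payload or {}).get("text", "")))
    # Keep the globally best-scoring chunks across all agents
    return sorted(hits, reverse=True)[:top_k]

for score, source, text in retrieve("Where is the embedding model configured?"):
    print(f"[{source}] {score:.3f} {text[:80]}")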

Instructions

Prerequisites & Installations

  • Ollama (for local LLM deployment):
    • If you have not already installed Ollama, download the appropriate installer from the official Ollama website.
    • After installing, you can start the background service with:

    ollama serve
    

    (You may adjust or disable autostart settings based on your preferences.)

  • Sentence Transformers (for generating embeddings):
    Install the Sentence Transformers library using pip:

pip3 install -U sentence-transformers
  • Qdrant (via Docker, for vector storage and similarity search):
    Make sure Docker is installed and running on your system. Then launch Qdrant in the background with:
docker run -p 6333:6333 -p 6334:6334 -v "${PWD}/qdrant_storage:/qdrant/storage:z" qdrant/qdrant

This command ensures that Qdrant runs with persistent storage mapped to the qdrant_storage directory in your current path.
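
Before moving on, you can optionally confirm that both local services are reachable. This quick check assumes the default ports: 11434 for Ollama and 6333 for Qdrant (as in the Docker command above).

# Optional sanity check that Ollama and Qdrant are up.
import urllib.request

for name, url in [("Ollama", "http://localhost:11434"),
                  ("Qdrant", "http://localhost:6333")]:
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            print(f"{name}: reachable (HTTP {resp.status})")
    except OSError as exc:
        print(f"{name}: not reachable ({exc})")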

Running the Pipeline

  1. Generate Embeddings:
    Run the script to process your data and generate vector embeddings:
python generate_embeddings.py
  2. Query the System:
    After generating embeddings, choose the appropriate script based on your LLM interface:
  • For NVIDIA NIM services:
    python load_and_query_nim.py
    
  • For local Ollama deployment:
    python load_and_query_ollama.py
    

Follow these steps sequentially to set up your environment and execute the full RAG pipeline.
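
If you prefer a single entry point, the two stages can be chained with a small driver like the one below. It simply invokes the scripts named above in order; any command-line arguments or configuration the scripts accept are not shown here.

# Illustrative driver that runs indexing, then local querying via Ollama.
import subprocess
import sys

subprocess.run([sys.executable, "generate_embeddings.py"], check=True)    # indexing
subprocess.run([sys.executable, "load_and_query_ollama.py"], check=True)  # querying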
