🔍 Code Search and Analysis System using Multi-hop Retrieval Augmented Generation (RAG) for Strato Mercata

A semantic code search and analysis tool that indexes your codebase using embeddings and stores it in Weaviate, enabling LLM-powered queries over both code and documentation.

What is RAG?

Retrieval Augmented Generation (RAG) is a technique in which relevant material is first retrieved from an external source, such as a vector database, and then passed to a large language model (LLM) as context. The LLM generates its response from the prompt plus the retrieved context rather than from its training data alone.
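
As a minimal sketch of the idea, the snippet below retrieves the most relevant text for a question and builds the prompt that would be sent to an LLM. It uses a toy bag-of-words similarity in place of real embeddings; the function names, sample documents, and prompt wording are illustrative assumptions and are not taken from this project's code.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine retrieved context and the question into one LLM prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {query}"


# Hypothetical sample documents, for illustration only.
DOCS = [
    "transfer moves tokens between accounts",
    "the frontend renders search results",
    "mint creates new tokens",
]
```

In a real setup, `embed` would call an embedding model and `retrieve` would query the vector database.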

What is Multi-hop RAG?

Multi-hop RAG extends RAG with repeated retrieval. Instead of a single lookup, the LLM interacts with a vector DB over several rounds, asking for more context as needed before it produces the final answer.

How does it work?

The LLM is given a prompt and access to a vector DB. It uses the prompt to search the vector DB for relevant context, requests further context if the first results are not enough, and then uses the gathered context to generate a response.
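
The loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's implementation: the `search` and `llm` callables, the `{"answer": ...}` / `{"need": ...}` reply convention, and the toy data are all hypothetical stand-ins so the example runs without Weaviate or OpenAI.

```python
from typing import Callable

Search = Callable[[str], list[str]]
Llm = Callable[[str, list[str]], dict]


def multi_hop_answer(question: str, search: Search, llm: Llm,
                     max_hops: int = 3) -> tuple[str, int]:
    """Return (answer, hops_used).

    Each hop retrieves context for the current query, then lets the LLM
    either answer or ask for more context with a follow-up query.
    """
    context: list[str] = []
    query = question
    for hop in range(1, max_hops + 1):
        for snippet in search(query):
            if snippet not in context:   # do not feed duplicates back in
                context.append(snippet)
        decision = llm(question, context)
        if "answer" in decision:
            return decision["answer"], hop
        query = decision["need"]         # the LLM asked for more context
    return "No answer within the hop limit.", max_hops


# --- toy stand-ins (hypothetical data, for illustration only) ---
SNIPPETS = {
    "transfer": "transfer() calls _debit() and _credit()",
    "_debit": "_debit() subtracts the amount from the sender balance",
}


def toy_search(query: str) -> list[str]:
    """Keyword lookup standing in for a vector DB search."""
    return [text for key, text in SNIPPETS.items() if key in query]


def toy_llm(question: str, context: list[str]) -> dict:
    """Scripted 'LLM': answers once it has seen what _debit() does."""
    if any("subtracts" in c for c in context):
        return {"answer": "transfer() debits the sender via _debit()."}
    return {"need": "_debit"}
```

The `max_hops` limit is a design choice that keeps a confused model from searching forever; a real system would also want to cap the total context size.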

Note: Building graph edges from ASTs is not implemented yet. Adding it would likely make queries more accurate.

🧰 Tech Stack

Weaviate: Vector database for semantic search
OpenAI: Embedding generation & LLM reasoning
Python: Backend for indexing, querying, and serving APIs
Docker + Docker Compose: Containerized infrastructure
Flask: Lightweight API server
Node.js + NVM: Frontend interface for search and visualization

⚙️ Prerequisites

Python 3.10+
Docker + Docker Compose
venv (or any Python virtual environment tool)
Node.js 21 (via NVM) for the frontend

📦 Installation

🔁 Clone the repository (including its submodules)

git clone https://github.com/davidnallapu/mercata-multi-hop-rag-llm.git
cd mercata-multi-hop-rag-llm
git submodule init
git submodule update

🧠 Indexing the Codebase

1. Set up Python backend

python3 -m venv .venv
source .venv/bin/activate  # or `.venv\Scripts\activate` on Windows
pip install -r requirements.txt

2. Start Weaviate vector DB

docker-compose up -d

Weaviate will be started with the text2vec-transformers vectorizer plugin.
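
Before running the indexer, it can help to confirm that Weaviate is accepting requests. The sketch below polls Weaviate's readiness endpoint (`/v1/.well-known/ready`). The base URL `http://localhost:8080` is an assumption based on Weaviate's default port; check docker-compose.yml for the port actually mapped.

```python
import urllib.error
import urllib.request


def weaviate_ready(base_url: str = "http://localhost:8080",
                   timeout: float = 2.0) -> bool:
    """Return True if Weaviate's readiness endpoint answers with HTTP 200."""
    url = f"{base_url}/v1/.well-known/ready"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response: not ready.
        return False


if __name__ == "__main__":
    print("Weaviate ready:", weaviate_ready())
```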

3. Run the code indexer

This project uses the gpt-4-turbo OpenAI model, so please ensure that your API keys have access to that model.

Make sure you have the correct OpenAI API key in your .env file or global environment:

export OPENAI_API_KEY=your-api-key

python embed_codebase_weaviate.py

This will:

🧹 Clean existing data
🏗️ Initialize schema
🧾 Parse code using Tree-sitter
🔗 Extract call graphs and definitions
📚 Process documentation (e.g. Markdown, docx)
📡 Upload everything to Weaviate
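
To illustrate what extracting definitions and call relationships involves, here is a small sketch that uses Python's built-in `ast` module as a stand-in for Tree-sitter. It is not the project's indexer, which parses other languages with Tree-sitter grammars; the function name and sample source are assumptions for illustration.

```python
import ast


def extract_graph(source: str) -> dict[str, list[str]]:
    """Map each top-level function to the sorted names it calls directly."""
    tree = ast.parse(source)
    graph: dict[str, list[str]] = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            called = {
                child.func.id
                for child in ast.walk(node)
                # Only plain calls like f(x); method calls are ignored here.
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name)
            }
            graph[node.name] = sorted(called)
    return graph


# Hypothetical sample source, for illustration only.
SAMPLE = '''
def debit(balance, amount):
    return balance - amount

def transfer(sender, receiver, amount):
    sender = debit(sender, amount)
    receiver = credit(receiver, amount)
    return sender, receiver
'''
```

Each key is a definition and each listed name is a call edge; records like these are the kind of data that could be uploaded alongside the code text so that questions such as "What calls Y?" can be answered.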

🌐 Starting the Search Backend (API)

python query_codebase_weaviate2.py

Exposes an API at http://localhost:5000.
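
A client could call the API along these lines. Note that the route `/query` and the JSON field `question` are placeholders, not confirmed by this README; check the Flask routes in query_codebase_weaviate2.py for the real path and payload. The sketch only builds the request, so it runs without the server.

```python
import json
import urllib.request


def build_query_request(question: str,
                        base_url: str = "http://localhost:5000",
                        path: str = "/query") -> urllib.request.Request:
    """Build a JSON POST request; `path` and the payload shape are hypothetical."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}{path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send it once the backend is running:
#   with urllib.request.urlopen(build_query_request("Where is function X used?")) as resp:
#       print(resp.read().decode("utf-8"))
```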

💻 Frontend Setup

cd frontend
nvm use 21
npm install
npm run dev

This will launch the frontend at http://localhost:3000 (or the port you have configured).

📡 API Capabilities

Once the backend is running:

Search function and variable definitions semantically
Ask: “Where is function X used?”, “What calls Y?”, “What does Z mean?”
Retrieve and cross-link code, docs, and relationships

⚙️ Configuration

🔧 Weaviate Settings

Modify:

docker-compose.yml for container settings
Class schema, vectorizer, and module options

🧠 Model Configs

Adjust the model settings inside query_codebase_weaviate2.py or embed_codebase_weaviate.py.

🛠️ Troubleshooting

🔍 Weaviate Issues

docker ps                        # ensure containers are up
docker logs mercata-llm3_weaviate_1

Check .env for valid OpenAI key
Ensure vector modules are enabled in Docker Compose

🧩 Indexing Issues

Tree-sitter must be installed and working
Watch logs for parser or schema errors

📜 License

MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

👨‍💻 Maintainer

David Samuel Nallapu – LinkedIn
