A modular Python application that combines Retrieval-Augmented Generation (RAG), FAISS vector storage, and LangChain for intelligent query processing across multiple data sources.
- Multi-Modal Query Processing: Handles three types of queries:
- General Knowledge Queries
- Project-Specific Queries (using RAG)
- Employee Database Queries (using SQL)
- Intelligent Query Classification: Automatically categorizes queries using an LLM
- Vector-Based Knowledge Retrieval: Uses FAISS for efficient similarity search
- Modular Architecture: Clean separation of concerns with dedicated modules
The system is built with a modular architecture consisting of:
RAG-FAISS-LANGCHAIN/
├── agents/ # Query processing agents
│ ├── __init__.py
│ └── agent_manager.py # Contains QueryClassifier, SQLAgent, RAGAgent
├── config/ # Configuration management
│ ├── __init__.py
│ └── config.py # Central configuration settings
├── database/ # Database operations
│ ├── __init__.py
│ └── db_manager.py # SQLite database management
├── embeddings/ # Vector embeddings
│ ├── __init__.py
│ └── embeddings_manager.py # FAISS and embeddings handling
├── knowledge-base/ # Knowledge storage
│ └── Projects.txt # Project documentation
├── main.py # Application entry point
└── requirements.txt # Project dependencies
- Python 3.9+
- pip3
- Virtual environment (recommended)
- Clone the repository:
git clone [repository-url]
cd RAG-FAISS-LANGCHAIN
- Create and activate a virtual environment:
python3 -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
- Install dependencies:
pip3 install -r requirements.txt
Run the main application:
python3 main.py
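The default configuration targets the llama2 model served through Ollama (see config/config.py). Assuming Ollama is installed locally, the model can be pulled once before the first run:
ollama pull llama2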
Our system handles three types of queries, each processed by specialized agents:
- Example of a general knowledge query being processed by the system
- Example of a project-specific query using RAG for context-aware responses
- Example of an employee-related query using SQL conversion
Example queries:
# General Knowledge Query
"Why is global warming on the rise?"
# Project Query
"Have we undertaken any projects related to robotics?"
# Employee Query
"Who are the employees with more than three years of experience?"
Key configuration settings in config/config.py:
# LLM Settings
model_name: str = "llama2"
temperature: float = 0
# Embedding Settings (see Hugging Face Integration section for more options)
embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2"
device: str = "cuda" if torch.cuda.is_available() else "cpu" # Optional GPU support
# FAISS Settings
retriever_k: int = 5 # Number of documents to retrieve
retriever_fetch_k: int = 8 # Number of documents to fetch before filtering
# See the Hugging Face Integration section for advanced configuration options
For detailed model configurations and troubleshooting, refer to the Hugging Face Integration section below.
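These settings could live in a single dataclass; a minimal sketch of what config/config.py might look like, using the field names from the snippet above (everything else is an assumption):
import torch
from dataclasses import dataclass

@dataclass
class Config:
    # LLM settings
    model_name: str = "llama2"
    temperature: float = 0
    # Embedding settings
    embedding_model: str = "sentence-transformers/all-MiniLM-L6-v2"
    device: str = "cuda" if torch.cuda.is_available() else "cpu"  # optional GPU support
    # FAISS retriever settings
    retriever_k: int = 5        # documents returned to the LLM
    retriever_fetch_k: int = 8  # documents fetched before filtering

config = Config()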
This project leverages LangChain's powerful components and abstractions:
- LLM Integration
from langchain_ollama import OllamaLLM
llm = OllamaLLM(model="llama2", temperature=0)
- Retrieval Components
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import FAISS
- Prompt Management
from langchain.prompts import PromptTemplate
- RetrievalQA Chain
- Combines document retrieval with the LLM for question answering (see the sketch after this list)
- Uses FAISS for efficient similarity search
- Custom prompt templates for consistent responses
- Custom Chain Components
- Specialized agents for different query types
- Modular design following LangChain patterns
- Extensible architecture for new capabilities
- Query Classification using the LLM
- Routing to appropriate agent
- Document retrieval (if needed)
- Response generation and formatting
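A minimal sketch of how the RetrievalQA chain described above might be assembled, assuming the FAISS index has already been saved to disk (paths, prompt wording, and the load step are illustrative):
from langchain_ollama import OllamaLLM
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

llm = OllamaLLM(model="llama2", temperature=0)
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# Load a previously built FAISS index (illustrative path)
vector_store = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)
retriever = vector_store.as_retriever(search_kwargs={"k": 5, "fetch_k": 8})

# Custom prompt template for consistent, context-grounded answers
prompt = PromptTemplate(
    template=(
        "Use the following project context to answer the question.\n\n"
        "{context}\n\nQuestion: {question}\nAnswer:"
    ),
    input_variables=["context", "question"],
)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    chain_type="stuff",
    chain_type_kwargs={"prompt": prompt},
)
result = qa_chain.invoke({"query": "Have we undertaken any projects related to robotics?"})
print(result["result"])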
Our system employs multiple specialized agents, each handling specific types of queries:
- Purpose: The QueryClassifier routes each query to the appropriate processing agent
- Capabilities:
- Identifies query type (General, Project, Employee)
- Uses LLM for classification
- Handles edge cases and ambiguous queries
- Implementation:
class QueryClassifier:
    def classify_query(self, query):
        # Intelligent query classification
        ...
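A fuller sketch of how classify_query might prompt the LLM, assuming it maps each query to one of three labels (the prompt wording, labels, and fallback are assumptions):
from langchain_ollama import OllamaLLM

class QueryClassifier:
    def __init__(self, llm=None):
        self.llm = llm or OllamaLLM(model="llama2", temperature=0)

    def classify_query(self, query: str) -> str:
        # Ask the LLM to pick exactly one category label
        prompt = (
            "Classify the following query as exactly one of: GENERAL, PROJECT, EMPLOYEE.\n"
            f"Query: {query}\nCategory:"
        )
        label = self.llm.invoke(prompt).strip().upper()
        # Fall back to GENERAL for ambiguous or unexpected answers
        return label if label in {"GENERAL", "PROJECT", "EMPLOYEE"} else "GENERAL"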
- Purpose: The RAGAgent handles project-related queries
- Capabilities:
- Document retrieval using FAISS
- Context-aware response generation
- Structured response formatting
- Key Features:
- Uses custom prompt templates
- Handles multiple document contexts
- Performance metrics tracking
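A minimal sketch of how the RAGAgent might wrap the RetrievalQA chain shown earlier, adding the performance metric mentioned above (class layout and key names are assumptions):
import time

class RAGAgent:
    def __init__(self, qa_chain):
        self.qa_chain = qa_chain  # e.g. the RetrievalQA chain built above

    def answer(self, query: str) -> dict:
        start = time.perf_counter()
        result = self.qa_chain.invoke({"query": query})
        # Structured response with a simple latency metric
        return {
            "answer": result["result"],
            "latency_seconds": round(time.perf_counter() - start, 2),
        }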
- Purpose: The SQLAgent handles database query generation and execution
- Capabilities:
- Natural language to SQL conversion
- Query optimization
- Result formatting and presentation
- Features:
- Schema-aware query generation
- Error handling and validation
- Rich response formatting
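A minimal sketch of the natural-language-to-SQL flow, assuming a SQLite database with a single employees table (the schema, prompt, and file name are illustrative; validation and error handling are omitted):
import sqlite3
from langchain_ollama import OllamaLLM

class SQLAgent:
    def __init__(self, db_path: str = "employees.db"):
        self.db_path = db_path
        self.llm = OllamaLLM(model="llama2", temperature=0)
        # The schema is passed to the LLM so generated SQL matches the real tables
        self.schema = "employees(id INTEGER, name TEXT, role TEXT, years_experience REAL)"

    def run(self, question: str):
        prompt = (
            f"Given the SQLite schema {self.schema}, write a single SQL query that answers: "
            f"{question}\nReturn only the SQL."
        )
        sql = self.llm.invoke(prompt).strip()
        with sqlite3.connect(self.db_path) as conn:
            return conn.execute(sql).fetchall()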
- Purpose: Handles general knowledge queries
- Capabilities:
- Direct LLM interaction
- Fact-based response generation
- Features:
- No context dependency
- Broad knowledge base access
User Query → Query Classifier → Appropriate Agent → Response Generation → Formatted Output
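Put together, that flow might look like the following dispatch function (agent construction is omitted; the names follow the sketches above and are assumptions):
def handle_query(query: str, classifier, rag_agent, sql_agent, llm):
    # 1. classify, 2. route to the matching agent, 3. return the agent's response
    category = classifier.classify_query(query)
    if category == "PROJECT":
        return rag_agent.answer(query)
    if category == "EMPLOYEE":
        return sql_agent.run(query)
    return llm.invoke(query)  # general knowledge: direct LLM call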
This project integrates with Hugging Face's ecosystem in several ways:
We use Hugging Face's sentence-transformers for generating embeddings:
from langchain_community.embeddings import HuggingFaceEmbeddings
embeddings = HuggingFaceEmbeddings(
model_name="sentence-transformers/all-MiniLM-L6-v2"
)
To use a different Hugging Face model:
- Update the config.py file:
embedding_model: str = "your-preferred-model" # e.g., "bert-base-uncased"
- Install additional requirements if needed:
pip install transformers torch
- The embeddings manager will automatically use your specified model:
embeddings_manager = EmbeddingsManager(
model_name=config.embedding_model
)
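A minimal sketch of what the EmbeddingsManager might do internally: embed the knowledge base with the configured Hugging Face model and build a FAISS index (method names are assumptions, and the text splitter assumes the langchain-text-splitters package is installed):
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter

class EmbeddingsManager:
    def __init__(self, model_name: str):
        self.embeddings = HuggingFaceEmbeddings(model_name=model_name)

    def build_index(self, path: str = "knowledge-base/Projects.txt") -> FAISS:
        # Split the project documentation into chunks and index them
        text = open(path, encoding="utf-8").read()
        splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
        return FAISS.from_texts(splitter.split_text(text), self.embeddings)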
- Access to state-of-the-art embedding models
- Easy model switching and experimentation
- Consistent API across different models
- Offline capability with downloaded models
- First-time model loading may take longer
- Consider GPU acceleration for better performance
- Models can be cached for faster subsequent use
Common issues and solutions:
- Model Download Issues
# Set HF_HOME environment variable
export HF_HOME=/path/to/huggingface
# Force model download
python3 -c "from huggingface_hub import snapshot_download; snapshot_download('sentence-transformers/all-MiniLM-L6-v2')"
- Memory Issues
- Use smaller models (e.g., MiniLM instead of MPNet)
- Enable gradient checkpointing
- Consider CPU-only inference for large models
- Performance Optimization
# In config.py, add device configuration
device: str = "cuda" if torch.cuda.is_available() else "cpu"
# In embeddings_manager.py, use device setting
embeddings = HuggingFaceEmbeddings(
model_name=model_name,
model_kwargs={'device': config.device}
)
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- LangChain for the powerful LLM framework
- FAISS for efficient vector storage
- Sentence Transformers for embeddings
- Hugging Face for state-of-the-art models and transformers
- Ollama for local LLM support