This repository provides a hands-on implementation of Retrieval-Augmented Generation (RAG) using Hugging Face LLMs, LangChain, and a FAISS vector database. The project demonstrates how to build an AI-powered chatbot that retrieves relevant context from documents and generates responses using Mistral-7B.
- Document Processing: Load and chunk PDF documents for retrieval.
- Vector Database (FAISS): Store embeddings for efficient similarity search.
- Hugging Face LLM Integration: Use Mistral-7B for intelligent responses.
- Custom Prompt Engineering: Ensure accurate, context-aware answers.
- Interactive Chatbot UI: Built with Streamlit for easy interaction.
Ensure you have Python 3, pip, and a Hugging Face API token before proceeding.
Clone the repository and install dependencies:
```bash
git clone https://github.com/Data-Science-and-Analytics-Club/RAG-with-VectorDatabase-Event.git
cd RAG-with-VectorDatabase-Event
pip install -r requirements.txt
```
Set up your Hugging Face API token:
```bash
export HF_TOKEN='your_huggingface_api_token'
```
First, we need to import the required modules for document processing, text chunking, and embedding generation.
```python
from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
```
The `load_pdfs` function loads PDF files from the specified directory using LangChain's `DirectoryLoader` and `PyPDFLoader`.
```python
def load_pdfs(data):
    # Load every PDF in the given directory.
    loader = DirectoryLoader(data, glob="*.pdf", loader_cls=PyPDFLoader)
    return loader.load()
```
Because LLMs have limited context windows and retrieval works best over small, focused passages, we split the documents into manageable chunks using `RecursiveCharacterTextSplitter`.
```python
def create_chunks(data):
    # 500-character chunks with 50 characters of overlap to preserve context across boundaries.
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
    return text_splitter.split_documents(data)
```
Embeddings enable semantic search and retrieval. We use Hugging Face's MiniLM model to generate vector embeddings.
```python
def get_embedding_model():
    return HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
```
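As a quick sanity check (our illustrative addition, not part of the pipeline), you can embed a sample query; `all-MiniLM-L6-v2` produces 384-dimensional vectors:

```python
# Embed a sample query and confirm the vector dimensionality.
embedder = get_embedding_model()
vector = embedder.embed_query("What is retrieval-augmented generation?")
print(len(vector))  # 384 for all-MiniLM-L6-v2
```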
Finally, we load the documents, create text chunks, generate embeddings, and store them in a FAISS vector database.
```python
documents = load_pdfs("context/")
text_chunks = create_chunks(documents)
embedding_model = get_embedding_model()

# Build the FAISS index from the chunks and persist it to disk.
DB_PATH = "vectorstore/db_faiss"
db = FAISS.from_documents(text_chunks, embedding_model)
db.save_local(DB_PATH)
```
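Before moving on, you can confirm the store is queryable with a quick similarity search (an illustrative check; the sample query is hypothetical):

```python
# Retrieve the two chunks most similar to a sample query.
results = db.similarity_search("What topics do these documents cover?", k=2)
for doc in results:
    print(doc.page_content[:100])
```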
Next, we will cover how to load an LLM and retrieve relevant document chunks for answering user queries!
This project implements a retrieval-augmented generation (RAG) system using LangChain and Hugging Face. It leverages a FAISS vector store for efficient document retrieval and a Hugging Face-hosted model (`Mistral-7B-Instruct-v0.3`) for generating responses. The pipeline follows these key steps:
- Import necessary libraries
- Load environment variables
- Initialize the Hugging Face model endpoint
- Define a custom prompt template
- Load a FAISS vector store for document retrieval
- Create retrieval and document processing chains
- Build the complete retrieval-augmented pipeline
- Define a function to handle user queries
To set up our pipeline, we import the required libraries:
```python
import os

from dotenv import load_dotenv
from langchain_core.prompts import PromptTemplate
from langchain_huggingface import HuggingFaceEndpoint, HuggingFaceEmbeddings
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain
from langchain_community.vectorstores import FAISS
```
These libraries enable:

- LLM integration (`HuggingFaceEndpoint`)
- Prompt customization (`PromptTemplate`)
- Retrieval-augmented pipeline construction (`create_retrieval_chain`)
- Document combination strategies (`create_stuff_documents_chain`)
- Vector store retrieval (`FAISS`)
- Environment variable handling (`dotenv`)
We use `dotenv` to load sensitive credentials, such as the Hugging Face API token:
```python
load_dotenv()
HF_TOKEN = os.environ.get("HF_TOKEN")
HUGGINGFACE_REPO_ID = "mistralai/Mistral-7B-Instruct-v0.3"
```
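A small defensive check (our addition, not in the original code) fails fast when the token is missing, instead of surfacing an opaque authorization error later:

```python
# Stop early with a clear message if the token was not loaded.
if not HF_TOKEN:
    raise RuntimeError("HF_TOKEN is not set; add it to your .env file.")
```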
Ensure you have a `.env` file containing:

```
HF_TOKEN=your_huggingface_api_key
```
We define a function to initialize the `HuggingFaceEndpoint` with the required parameters:
```python
def load_llm(huggingface_repo_id):
    llm = HuggingFaceEndpoint(
        repo_id=huggingface_repo_id,
        temperature=0.5,
        # max_length should be an integer, not the string "512".
        model_kwargs={"token": HF_TOKEN, "max_length": 512},
    )
    return llm
```
- `temperature=0.5`: balances randomness and determinism in responses.
- `max_length=512`: limits the response length.
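With a valid token, you can smoke-test the endpoint directly before wiring it into a chain (an illustrative check that assumes network access to the Hugging Face Inference API):

```python
# Call the hosted model with a plain string prompt.
llm = load_llm(HUGGINGFACE_REPO_ID)
print(llm.invoke("Explain retrieval-augmented generation in one sentence."))
```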
A prompt template ensures that responses stay within the provided context:
```python
CUSTOM_PROMPT_TEMPLATE = """
Use the information in the context to answer the user's question.
Strictly stay within the context and do not provide answers to things you do not know.
Do not make up answers. If you do not know the answer, say "I do not know".
Do not provide anything outside the given context.

Context: {context}
Question: {input}

Be extensive and accurate.
"""

def set_custom_prompt(custom_prompt_template):
    return PromptTemplate(template=custom_prompt_template, input_variables=["context", "input"])
```
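To see how the template resolves, render it with sample values (the context and question below are hypothetical, purely for illustration):

```python
# Render the template to inspect the final prompt text sent to the model.
prompt = set_custom_prompt(CUSTOM_PROMPT_TEMPLATE)
print(prompt.format(
    context="LangChain is a framework for building LLM applications.",
    input="What is LangChain?",
))
```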
We load a pre-built FAISS vector store, which contains embeddings of documents for efficient retrieval:
```python
DB_PATH = "vectorstore/db_faiss"
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = FAISS.load_local(DB_PATH, embedding_model, allow_dangerous_deserialization=True)
```
- `sentence-transformers/all-MiniLM-L6-v2`: the embedding model used to encode text into vector space; it must match the model used when the index was built.
- `FAISS.load_local()`: loads the vector store from disk. `allow_dangerous_deserialization=True` permits unpickling the local index files, so only enable it for files you created yourself.
The retriever fetches relevant documents based on the query:
```python
retriever = db.as_retriever(search_kwargs={'k': 3})
```
- `k=3`: retrieves the top 3 most relevant chunks for each query.
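You can exercise the retriever on its own before assembling the full chain (`invoke` is the standard Runnable entry point for retrievers; the sample query is illustrative):

```python
# Fetch the top-3 chunks for a sample question and preview them.
docs = retriever.invoke("What is LangChain?")
for doc in docs:
    print(doc.metadata.get("source"), "->", doc.page_content[:80])
```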
We initialize the LLM and prompt, then create the document processing chain:
```python
llm = load_llm(HUGGINGFACE_REPO_ID)
prompt_template = set_custom_prompt(CUSTOM_PROMPT_TEMPLATE)
qa_document_chain = create_stuff_documents_chain(llm, prompt_template)
```
We combine the retriever and document processing chain into a complete retrieval-augmented generation (RAG) pipeline:
```python
qa_chain = create_retrieval_chain(retriever, qa_document_chain)
```
This function processes user queries and returns an answer:
```python
def get_response(user_query):
    response = qa_chain.invoke({"input": user_query})
    return response["answer"]

# Example usage
user_query = "What is LangChain?"
response = get_response(user_query)
print(response)
```
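Besides `answer`, the dictionary returned by the chain also carries the retrieved chunks under `context`, which is handy for showing sources (a small illustrative addition):

```python
# Inspect which chunks grounded the answer.
result = qa_chain.invoke({"input": user_query})
for doc in result["context"]:
    print(doc.metadata.get("source"), "->", doc.page_content[:80])
```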
This project implements a RAG-based chatbot using:
- FAISS for document retrieval
- Hugging Face models for text generation
- LangChain for pipeline management
- A structured prompt template for better responses
To run the project:
- Install dependencies (`pip install langchain langchain-community langchain-huggingface faiss-cpu transformers sentence-transformers python-dotenv`)
- Add your Hugging Face token to `.env`
- Load the FAISS vector store with relevant data
- Call `get_response(user_query)` to get an AI-generated response