This project implements a Retrieval-Augmented Generation (RAG) chatbot designed for analyzing Amazon's 10-K financial reports. The chatbot helps users efficiently obtain accurate, detailed insights from these documents.
Retrieval-Augmented Generation (RAG) is an approach that combines information retrieval with the generative capabilities of Large Language Models. It works in three simple steps:
- Retrieve: Fetch the most relevant documents from a knowledge base.
- Augment: Add the retrieved context to the user prompt.
- Generate: Produce an answer with the language model, conditioned on the augmented prompt.
This method is particularly effective for domain-specific queries, ensuring responses are grounded in up-to-date and relevant information.
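The three steps above can be sketched end-to-end with toy stand-ins (illustrative only; the retriever and "model" here are simple placeholders, not the project's QDrant + OpenAI stack):

```python
# Toy sketch of the retrieve -> augment -> generate pipeline.

def retrieve(query: str, knowledge_base: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    words = set(query.lower().split())
    return sorted(knowledge_base,
                  key=lambda doc: len(words & set(doc.lower().split())),
                  reverse=True)[:top_k]

def augment(query: str, context: list[str]) -> str:
    """Prepend the retrieved context to the user prompt."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    """Stand-in for the chat-model call; a real system invokes the LLM here."""
    return f"(answer grounded in a {len(prompt)}-character prompt)"

kb = [
    "Amazon reported strong net sales growth in its 10-K filing.",
    "A 10-K is an annual report required by the SEC.",
]
question = "What growth did Amazon report?"
answer = generate(augment(question, retrieve(question, kb, top_k=1)))
```

In the real application, `retrieve` is a semantic search over QDrant embeddings and `generate` is an OpenAI chat-completion call.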
The chatbot integrates the following key components to ensure reliable performance:
- QDrant: Serves as the vector database for document indexing, retrieval, and semantic search.
- LangChain-Community: Handles chunk splitting, embeddings, and chat model integration.
- OpenAI Embeddings and Chat model: Powers answer generation.
- Gradio: Provides an easy-to-use interface for document uploads and question-answering sessions.
- RAGAS (Retrieval Augmented Generation Assessment): Evaluates chatbot responses using detailed performance metrics.
The application is containerized with Docker, ensuring a consistent deployment environment.
- **Document Upload and Indexing**: Upload PDF files, which are then split into chunks and embedded using `OpenAIEmbeddings`.
- **Vector Store with QDrant**: Indexed documents are stored in QDrant collections, enabling fast semantic search.
- **Gradio UI**: A user-friendly chat interface that allows you to:
  - Select or create QDrant collections.
  - Ask questions with real-time references to the source text.
  - View retrieved text chunks, ensuring transparency on the source of each answer.
- **RAGAS Evaluation**: For details on the RAGAS evaluation methodology and results, see the README.
- **Sentence-Window Retrieval**: Splits documents into overlapping windows based on sentences rather than arbitrary chunks, enhancing retrieval accuracy by preserving document context at the sentence level.
- **Auto-merging Retrieval**: Dynamically merges relevant retrieved chunks, delivering coherent and context-rich answers.
- **Conversational Memory**: Retains the context from previous interactions to support natural, continuous conversational flow.
- **Unit Tests**: Ensure core functionalities (document indexing, retrieval, and answer generation) work as expected.
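The sentence-window splitting described above can be sketched in a few lines (a simplified illustration; the project's actual splitter, built on LangChain, may differ):

```python
import re

def sentence_windows(text: str, window: int = 1) -> list[dict]:
    """Split text into sentences; each record keeps the sentence itself
    (what gets embedded and matched) plus a window of neighbouring
    sentences (what gets returned as context at answer time)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    records = []
    for i, sent in enumerate(sentences):
        lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
        records.append({"sentence": sent, "window": " ".join(sentences[lo:hi])})
    return records
```

Matching on single sentences keeps retrieval precise, while returning the surrounding window keeps the generator's context coherent.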
Follow these steps to set up the project:
- Clone the repository:

  ```bash
  git clone https://github.com/rostyslavshovak/RAG-Retrieval-Augmented-Generation.git
  cd RAG-Retrieval-Augmented-Generation
  ```
- Install the required dependencies:

  ```bash
  pip install --no-cache-dir -r requirements.txt
  ```
- Create a `.env` file in the root directory (`touch .env`) and configure the environment variables (refer to `example.env`):

  ```env
  OPENAI_API_KEY=sk-xxxxxx
  RAGAS_APP_TOKEN=apt.xxxxxx
  MODEL_NAME=gpt-3.5-turbo
  TEMPERATURE=0.0
  MAX_TOKENS=500
  EMBEDDING_MODEL=text-embedding-ada-002
  HOST=localhost
  PORT=6333
  ```
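At startup the application reads these variables; a hedged sketch of how that might look (defaults mirror `example.env`; the project's actual loader may differ):

```python
import os

# Illustrative config loader; variable names follow example.env above.
config = {
    "model_name": os.getenv("MODEL_NAME", "gpt-3.5-turbo"),
    "temperature": float(os.getenv("TEMPERATURE", "0.0")),
    "max_tokens": int(os.getenv("MAX_TOKENS", "500")),
    "embedding_model": os.getenv("EMBEDDING_MODEL", "text-embedding-ada-002"),
    "qdrant_host": os.getenv("HOST", "localhost"),
    "qdrant_port": int(os.getenv("PORT", "6333")),
}
```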
- Launch the chatbot locally:

  ```bash
  python -m src.gradio_ui
  ```

  Access the application at http://localhost:7860/.
- Alternatively, run the application with Docker for a consistent environment:

  ```bash
  docker-compose up --build
  ```

  Access via http://localhost:7860/.
- Interact with the chatbot: use the interface to upload and index documents, then start asking specific questions.

  Note: Ensure you have created or selected a QDrant collection before querying.
- **Indexing a Document**: In the Gradio interface, click "Index a new PDF", upload your document, and specify the name of a new or existing QDrant collection.
- **Asking Questions**: Navigate to the "Chatbot" tab, select your desired collection, and enter your questions. Relevant document chunks will appear in the Retrieved Chunks tab.
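Follow-up questions in the Chatbot tab rely on the conversational-memory feature; a toy sketch of the idea (illustrative only, not the project's implementation):

```python
# Toy sketch of conversational memory: recent turns are replayed into each
# new prompt so follow-ups like "And for 2022?" keep their context.
class ConversationMemory:
    def __init__(self, max_turns: int = 5):
        self.max_turns = max_turns
        self.turns: list[tuple[str, str]] = []

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))
        # Drop the oldest turns once the window is full.
        self.turns = self.turns[-self.max_turns:]

    def prompt_prefix(self) -> str:
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)

memory = ConversationMemory(max_turns=2)
memory.add("What was Amazon's 2023 revenue?", "See the 10-K income statement.")
memory.add("And operating income?", "Also reported in the 10-K.")
```

Capping the window keeps the prompt within the model's token budget while preserving the most recent conversational context.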