This project implements a Retrieval-Augmented Generation (RAG) chatbot designed for analyzing Amazon's 10-K financial reports. The chatbot helps users efficiently obtain accurate, detailed insights from these documents.
Retrieval-Augmented Generation (RAG) is an approach that combines information retrieval with the generative capabilities of Large Language Models. It works in three simple steps:
- Retrieve: Fetch the most relevant documents from a knowledge base.
- Augment: Add the retrieved context to the user prompt.
- Generate: Produce an answer with the language model, conditioned on the augmented prompt.
This method is particularly effective for domain-specific queries, ensuring responses are grounded in up-to-date and relevant information.
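The three steps above can be sketched end-to-end with toy stand-ins (illustrative only; the retriever and "model" here are simple placeholders, not the project's QDrant + OpenAI stack):

```python
# Toy sketch of the retrieve -> augment -> generate pipeline.

def retrieve(query: str, knowledge_base: list[str], top_k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the query."""
    words = set(query.lower().split())
    return sorted(knowledge_base,
                  key=lambda doc: len(words & set(doc.lower().split())),
                  reverse=True)[:top_k]

def augment(query: str, context: list[str]) -> str:
    """Prepend the retrieved context to the user prompt."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    """Stand-in for the chat-model call; a real system invokes the LLM here."""
    return f"(answer grounded in a {len(prompt)}-character prompt)"

kb = [
    "Amazon reported strong net sales growth in its 10-K filing.",
    "A 10-K is an annual report required by the SEC.",
]
question = "What growth did Amazon report?"
answer = generate(augment(question, retrieve(question, kb, top_k=1)))
```

In the real application, `retrieve` is a semantic search over QDrant embeddings and `generate` is an OpenAI chat-completion call.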
The chatbot integrates the following key components to ensure reliable performance:
- QDrant: Serves as the vector database for document indexing, retrieval, and semantic search.
- LangChain-Community: Handles chunk splitting, embeddings, and chat model integration.
- OpenAI Embeddings and Chat model: Powers answer generation.
- Gradio: Provides an easy-to-use interface for document uploads and question-answering sessions.
- RAGAS (Retrieval Augmented Generation Assessment): Evaluates chatbot responses using detailed performance metrics.
The application is containerized with Docker, ensuring a consistent deployment environment.
- **Document Upload and Indexing**: Upload PDF files, which are then split into chunks and embedded using `OpenAIEmbeddings`.
- **Vector Store with QDrant**: Indexed documents are stored in QDrant collections, enabling fast semantic search.
- **Gradio UI**: A user-friendly chat interface that allows you to:
  - Select or create QDrant collections.
  - Ask questions with real-time references to the source text.
  - View retrieved text chunks, ensuring transparency on the source of each answer.
- **RAGAS Evaluation**: For details on the RAGAS evaluation methodology and results, see the README.
- **Sentence-Window Retrieval**: Splits documents into overlapping windows based on sentences rather than arbitrary chunks, enhancing retrieval accuracy by preserving document context at the sentence level.
- **Auto-merging Retrieval**: Dynamically merges relevant retrieved chunks, delivering coherent and context-rich answers.
- **Conversational Memory**: Retains the context from previous interactions to support natural, continuous conversational flow.
- **Unit Tests**: Ensure core functionalities (document indexing, retrieval, and answer generation) work as expected.
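The sentence-window splitting described above can be sketched in a few lines (a simplified illustration; the project's actual splitter, built on LangChain, may differ):

```python
import re

def sentence_windows(text: str, window: int = 1) -> list[dict]:
    """Split text into sentences; each record keeps the sentence itself
    (what gets embedded and matched) plus a window of neighbouring
    sentences (what gets returned as context at answer time)."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    records = []
    for i, sent in enumerate(sentences):
        lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
        records.append({"sentence": sent, "window": " ".join(sentences[lo:hi])})
    return records
```

Matching on single sentences keeps retrieval precise, while returning the surrounding window keeps the generator's context coherent.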
Follow these steps to set up the project:
- Clone the repository:

  ```bash
  git clone https://github.com/rostyslavshovak/RAG-Retrieval-Augmented-Generation.git
  cd RAG-Retrieval-Augmented-Generation
  ```
- Install the required dependencies:

  ```bash
  pip install --no-cache-dir -r requirements.txt
  ```
- Create a `.env` file in the root directory (`touch .env`) and configure the environment variables (refer to `example.env`):

  ```env
  OPENAI_API_KEY=sk-xxxxxx
  RAGAS_APP_TOKEN=apt.xxxxxx
  MODEL_NAME=gpt-3.5-turbo
  TEMPERATURE=0.0
  MAX_TOKENS=500
  EMBEDDING_MODEL=text-embedding-ada-002
  HOST=localhost
  PORT=6333
  ```
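At startup the application reads these variables; a hedged sketch of how that might look (defaults mirror `example.env`; the project's actual loader may differ):

```python
import os

# Illustrative config loader; variable names follow example.env above.
config = {
    "model_name": os.getenv("MODEL_NAME", "gpt-3.5-turbo"),
    "temperature": float(os.getenv("TEMPERATURE", "0.0")),
    "max_tokens": int(os.getenv("MAX_TOKENS", "500")),
    "embedding_model": os.getenv("EMBEDDING_MODEL", "text-embedding-ada-002"),
    "qdrant_host": os.getenv("HOST", "localhost"),
    "qdrant_port": int(os.getenv("PORT", "6333")),
}
```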
- Launch the chatbot locally:

  ```bash
  python -m src.gradio_ui
  ```

  Access the application at http://localhost:7860/.
- Alternatively, run the application with Docker for a consistent environment:

  ```bash
  docker-compose up --build
  ```

  Access via http://localhost:7860/.
- Interact with the chatbot: use the interface to upload and index documents, then start asking specific questions.

  Note: Ensure you have created or selected a QDrant collection before querying.
- **Indexing a Document**: In the Gradio interface, click "Index a new PDF", upload your document, and specify the name of a new or existing QDrant collection.
- **Asking Questions**: Navigate to the "Chatbot" tab, select your desired collection, and enter your questions. Relevant document chunks will appear in the Retrieved Chunks tab.
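Follow-up questions in the Chatbot tab rely on the conversational-memory feature; a toy sketch of the idea (illustrative only, not the project's implementation):

```python
# Toy sketch of conversational memory: recent turns are replayed into each
# new prompt so follow-ups like "And for 2022?" keep their context.
class ConversationMemory:
    def __init__(self, max_turns: int = 5):
        self.max_turns = max_turns
        self.turns: list[tuple[str, str]] = []

    def add(self, question: str, answer: str) -> None:
        self.turns.append((question, answer))
        # Drop the oldest turns once the window is full.
        self.turns = self.turns[-self.max_turns:]

    def prompt_prefix(self) -> str:
        return "\n".join(f"User: {q}\nAssistant: {a}" for q, a in self.turns)

memory = ConversationMemory(max_turns=2)
memory.add("What was Amazon's 2023 revenue?", "See the 10-K income statement.")
memory.add("And operating income?", "Also reported in the 10-K.")
```

Capping the window keeps the prompt within the model's token budget while preserving the most recent conversational context.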