RAG Book Question-Answering System

A Retrieval-Augmented Generation (RAG) system for book analysis and question answering, built with Flask and the OpenAI API. The system processes uploaded documents, creates embeddings for their text, and retrieves relevant passages to ground its answers to questions about those documents.

Features

- Document Processing: Support for multiple formats (PDF, DOCX, TXT, ODT)
- Real-time Progress: WebSocket-based progress updates during processing
- Advanced Text Analysis:
  - Named entity recognition
  - Date extraction
  - Key phrase identification
- Hybrid Search:
  - Semantic search using embeddings
  - Lexical search with BM25
  - Query expansion with synonyms
- Caching System: Stores embeddings and responses for reuse
- Web Interface: Clean UI with Markdown support for responses
- CLI: Command-line interface for direct interaction
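
The hybrid search feature combines a lexical score with a semantic one. The sketch below illustrates one common way to do that: Okapi BM25 for the lexical side, cosine similarity for the semantic side, and a weighted blend of the two after min-max normalization. It is not taken from this repository; the function names, the `alpha` weight, and the `k1`/`b` defaults are assumptions chosen for illustration, and the embeddings are plain lists of floats rather than real OpenAI vectors.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    if not docs_tokens:
        return []
    n_docs = len(docs_tokens)
    # Fall back to 1 so that a corpus of empty documents does not divide by zero.
    avg_len = sum(len(d) for d in docs_tokens) / n_docs or 1
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def min_max(values):
    """Rescale values to [0, 1] so lexical and semantic scores are comparable."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in values]


def hybrid_rank(query_tokens, query_vec, docs_tokens, doc_vecs, alpha=0.5):
    """Return document indices, best first; alpha weights the semantic score."""
    lexical = min_max(bm25_scores(query_tokens, docs_tokens))
    semantic = min_max([cosine(query_vec, v) for v in doc_vecs])
    blended = [alpha * s + (1 - alpha) * l for s, l in zip(semantic, lexical)]
    return sorted(range(len(blended)), key=lambda i: blended[i], reverse=True)
```

Setting `alpha=0.0` ranks by BM25 alone and `alpha=1.0` by embeddings alone, which makes it easy to compare the two signals on the same query.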

Technical Stack

Backend

- Flask + Flask-SocketIO for the web server
- OpenAI API for embeddings and text generation
- Pinecone for vector storage
- NLTK & spaCy for text processing
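
Because embeddings come from a paid API, the caching feature listed under Features matters most on this side of the stack. The following is a minimal sketch of an on-disk embedding cache keyed by a hash of the text, using only the standard library. The class name `EmbeddingCache`, its methods, and the `embed_fn` callback are hypothetical and are not this project's actual cache code; `embed_fn` stands in for whatever function calls the embeddings API.

```python
import hashlib
import pickle
from pathlib import Path


class EmbeddingCache:
    """Hypothetical cache mapping a text's SHA-256 digest to its embedding."""

    def __init__(self, path):
        self.path = Path(path)
        self._store = {}
        # Load previously saved entries, if the cache file already exists.
        if self.path.exists():
            with self.path.open("rb") as fh:
                self._store = pickle.load(fh)

    @staticmethod
    def key(text):
        """Derive a stable cache key from the text content."""
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get_or_compute(self, text, embed_fn):
        """Return the cached embedding, calling embed_fn only on a miss."""
        k = self.key(text)
        if k not in self._store:
            self._store[k] = embed_fn(text)
        return self._store[k]

    def save(self):
        """Persist the in-memory entries to disk."""
        with self.path.open("wb") as fh:
            pickle.dump(self._store, fh)
```

Keying on a content hash means an unchanged chunk of text is never embedded twice, even if the source file is renamed. Pickle files should only be loaded when they come from a trusted source, since unpickling can execute arbitrary code.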

Frontend

- jQuery for AJAX requests
- Marked.js for Markdown rendering
- Font Awesome for icons
- WebSocket for real-time updates

Installation

Clone the repository
Create and activate a virtual environment
Install the required packages:
```
pip install -r requirements.txt
```

Set up environment variables in a .env file:

OPENAI_API_KEY=your_openai_api_key
PINECONE_API_KEY=your_pinecone_api_key
FLASK_SECRET_KEY=your_flask_secret_key
ADMIN_PASSWORD=your_admin_password
TESTER_PASSWORD=your_tester_password
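
A missing key usually surfaces only when the first API call fails, so it can help to check the variables above at startup. This is a small sketch of such a check; the `REQUIRED_VARS` tuple and the `missing_env_vars` helper are hypothetical names, not part of this project. It also assumes the values from .env have already been loaded into the process environment, for example by a loader such as python-dotenv.

```python
import os

# Names taken from the .env listing above.
REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "FLASK_SECRET_KEY",
    "ADMIN_PASSWORD",
    "TESTER_PASSWORD",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` before starting the server and exiting with a clear message when the list is non-empty gives a faster failure than waiting for the first request to the OpenAI or Pinecone API.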

Run the application:
```
flask run
```
Access the web interface at http://localhost:5000

Usage

Upload documents through the web interface to start processing.
Use the CLI for direct interaction and testing of the system's capabilities.

Contributing

Contributions are welcome! Please submit a pull request or open an issue for any enhancements or bug fixes.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Repository: izgorodin/rag_book_assistant

Folders and files

Latest commit

History

Repository files navigation

RAG Book Question-Answering System

Features

Technical Stack

Backend

Frontend

Installation

Usage

Contributing

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages