A powerful document question-answering application featuring advanced Retrieval-Augmented Generation (RAG) capabilities with Google's Gemini 2.0 Flash model.
- Relevance Evaluation: Automatically evaluates how relevant each context chunk is to the query
- Context Filtering: Removes less relevant information to improve response quality
- Sufficiency Analysis: Determines if the retrieved context is sufficient to answer the query
- Adaptive Retrieval: Retrieves additional context when needed
- Query Reformulation: Transforms user queries for more effective retrieval
- Iterative Analysis: Multiple rounds of context analysis and improvement
- Follow-up Query Generation: Generates specific queries to fill information gaps
- Context Synthesis: Creates optimized context by combining and reorganizing information
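The relevance-evaluation and context-filtering steps above can be sketched in a few lines. This is a toy illustration only: the application scores chunks with Gemini rather than the word-overlap heuristic used here, and `relevance_score`, `filter_context`, and the `0.5` threshold are hypothetical names, not the project's actual code.

```python
def relevance_score(query: str, chunk: str) -> float:
    """Toy stand-in for LLM relevance evaluation: the fraction of
    query words that also appear in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    return len(query_words & chunk_words) / len(query_words) if query_words else 0.0

def filter_context(query: str, chunks: list[str], threshold: float = 0.5) -> list[str]:
    """Keep only the chunks whose relevance meets the threshold."""
    return [chunk for chunk in chunks if relevance_score(query, chunk) >= threshold]
```

In the real pipeline the scoring call would go to the LLM, but the filter-by-threshold shape stays the same.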
- Supports PDF, DOCX, and TXT documents
- Automatically chunks documents for improved retrieval
- Semantic search using Google's embedding model
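The chunking step might look like the fixed-size sliding window below. This is a sketch: the actual chunk size, overlap, and function name used in this project are not shown in this README and are assumptions here.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows; the overlap
    preserves context across chunk boundaries."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Overlapping windows are a common default; token-aware or sentence-aware splitting is an alternative when chunk boundaries matter.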
- Flask Web Application: Lightweight web interface with responsive design
- Modular Components: Separate modules for document processing, embedding, retrieval, and generation
- In-Memory Storage: Session-based storage for document embeddings
- Gemini 2.0 Flash: Leverages Google's latest LLM for intelligent RAG operations
- `app.py`: The main Flask application file. Defines routes for:
  - `/`: Homepage; renders the `index.html` template.
  - `/upload`: Handles document uploads, processes documents, creates embeddings, and saves them.
  - `/query`: Handles user queries, retrieves relevant context using either Self-RAG or Agentic RAG, generates responses using Gemini, and returns them along with source information and RAG metrics.
- `main.py`: Entry point to run the Flask application.
- `pyproject.toml`: Project configuration file, including dependencies.
- `/utils`: Core RAG functionality
  - `agentic_rag.py`: Autonomous RAG agent implementation
  - `document_processor.py`: Document parsing and chunking
  - `embedding.py`: Document and query embedding functions
  - `gemini_integration.py`: Integration with Gemini models
  - `retrieval.py`: Semantic search functionality
- `/static`: Frontend assets
  - `/css`: Stylesheets
  - `/js`: JavaScript files
- `/templates`: HTML templates
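The semantic search in `retrieval.py` presumably ranks chunk embeddings by similarity to the query embedding. A minimal pure-Python cosine-similarity sketch follows; the function names and implementation are illustrative, not the project's actual code.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int = 3) -> list[int]:
    """Return the indices of the k chunks most similar to the query."""
    scored = [(cosine_similarity(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    return [i for _, i in sorted(scored, reverse=True)[:k]]
```

With NumPy (which the project depends on), the same ranking would usually be a single vectorized matrix-vector product.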
- Python 3.11+
- A valid Google API key for Gemini API access
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/GeminiRagAssistant.git
  cd GeminiRagAssistant
  ```

- Set up a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set environment variables:

  ```bash
  export GOOGLE_API_KEY="your_google_api_key_here"
  export SESSION_SECRET="a_secure_random_string"
  ```

  On Windows:

  ```bash
  set GOOGLE_API_KEY=your_google_api_key_here
  set SESSION_SECRET=a_secure_random_string
  ```
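Inside the application these variables would typically be read via `os.environ`. A sketch of a fail-fast lookup follows; the `require_env` helper is hypothetical and not part of this project.

```python
import os

def require_env(name: str) -> str:
    """Read a required environment variable, failing fast with a clear
    message instead of erroring mid-request later."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; see the setup steps above")
    return value
```

Failing at startup makes a missing `GOOGLE_API_KEY` obvious immediately rather than surfacing as an opaque API error on the first query.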
The application requires the following Python libraries:
- Flask: Web framework
- Google Generative AI: Gemini API access
- PyPDF2: PDF processing
- docx2txt: DOCX processing
- NumPy: Numerical operations
- Werkzeug: Utility functions for web applications
You can install these dependencies using pip:
```bash
pip install Flask google-generativeai PyPDF2 docx2txt numpy werkzeug
```
Run the application with:
```bash
python main.py
```
The application will be available at http://localhost:5000
- Upload a Document:
  - Click on the "Upload Documents" section
  - Select a document (PDF, DOCX, or TXT format)
  - Wait for processing (the document will be chunked and embedded)
- Ask Questions:
  - Type your question in the query box
  - Select your preferred RAG mode:
    - Self-RAG: faster, with real-time relevance filtering
    - Agentic RAG: more thorough, with iterative improvements
  - Click "Ask" and wait for the response
- View the Response:
  - The answer is displayed in the response section
  - You can see which sources were used and their relevance
  - For Agentic RAG, you'll also see metrics such as context quality and follow-up queries
Example Workflows:

Self-RAG:
- User uploads a document and asks a question
- System retrieves initial context chunks based on semantic similarity
- Each chunk is evaluated for relevance to the query
- Low-relevance chunks are filtered out
- System analyzes if the filtered context is sufficient
- If needed, additional context is retrieved
- Final response is generated using the optimized context
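The Self-RAG steps above can be sketched as a small loop. This is illustrative only: `retrieve`, `evaluate`, and `generate` stand in for the real embedding search and Gemini calls, and the sufficiency check here is a toy.

```python
def self_rag(query, retrieve, evaluate, generate, threshold=0.5, max_rounds=2):
    """Sketch of the Self-RAG loop: retrieve chunks, keep only those
    scored relevant, and retrieve again if context looks insufficient."""
    context = []
    for round_no in range(max_rounds):
        chunks = retrieve(query, round_no)
        context += [c for c in chunks if evaluate(query, c) >= threshold]
        if len(context) >= 2:  # toy sufficiency check; the app asks the LLM
            break
    return generate(query, context)
```

The key shape is the filter-then-check loop: each round only adds chunks that pass relevance evaluation, and retrieval stops once the context is judged sufficient.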
Agentic RAG:
- User uploads a document and asks a question
- System reformulates the query to improve retrieval
- Initial context chunks are retrieved
- System analyzes context quality and identifies gaps
- Context chunks are prioritized by relevance
- System generates follow-up queries to fill gaps
- Additional context is retrieved using follow-up queries
- Context is synthesized into optimized form
- Final response is generated with detailed process metrics
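The agentic pipeline above, as a sketch: the callables stand in for the real Gemini-backed steps, and the names and control flow are assumptions, not the code in `agentic_rag.py`.

```python
def agentic_rag(query, reformulate, retrieve, find_gaps, synthesize, generate):
    """Sketch of the agentic pipeline: reformulate the query, retrieve,
    fill identified gaps with follow-up queries, synthesize, then answer."""
    working_query = reformulate(query)            # query reformulation
    context = retrieve(working_query)             # initial retrieval
    for follow_up in find_gaps(query, context):   # gap analysis
        context += retrieve(follow_up)            # adaptive retrieval
    return generate(query, synthesize(context))   # final response
```

Compared with the Self-RAG loop, the agent actively rewrites the query up front and issues targeted follow-up queries for each gap, rather than simply retrieving more of the same.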
- Built with Google's Gemini 2.0 Flash model
- Inspired by research on Self-RAG and Agentic RAG approaches