Skip to content

A powerful documentation QA system that crawls help websites, processes content, and provides accurate answers using RAG (Retrieval-Augmented Generation) with Google's Gemini AI.

License

Notifications You must be signed in to change notification settings

rajeshmore1/Pulse-WebQA_Agent

Repository files navigation

🔍 Pulse - Help Website Q&A Agent

Pulse-WebQA Agent Logo

A powerful documentation QA system that crawls help websites, processes content, and provides accurate answers using RAG (Retrieval-Augmented Generation) with Google's Gemini AI.

Python Packages For Building QA Agent

Static Badge Static Badge Static Badge Static Badge Static Badge Static Badge Static Badge

Video Tutorial

IMAGE ALT TEXT HERE

Table of Contents

  1. Flow Diagram / Architecture
  2. Storage Structure
  3. Features
  4. Installation
  5. Usage
  6. Contributing

Flow Diagram / Architecture

Storage Structure

Pulse-WebQA_Agent/
├──notebooks
├── text/                    # Raw crawled content
│   └── domain.com/
│       ├── page1.txt
│       └── page2.txt
├── processed/              # Processed content
│   └── scraped.csv
├── chroma_db/             # Vector database
│   ├── index/
│   └── embeddings/
└── src
    ├── app.py
    ├── crawler.py
    ├── processor.py
    └── qa_system.py
├── requirements.txt
├── run.py
├── docker-compose.yml
├── Dockerfile
└── setup.py

Features

🌐 Smart Web Crawling

  • Configurable depth and page limits
  • Intelligent URL filtering
  • Progress tracking

📑 Content Processing

  • Removes irrelevant elements (navigation, footers)
  • Preserves document hierarchy
  • Handles multiple content types

🧠 Advanced RAG System

  • Google's Gemini AI integration
  • Semantic search capabilities
  • Context-aware responses

💾 Extensible Storage

  • Chromadb vector database
  • Supports appending new content
  • Efficient retrieval

Installation

  1. Clone the repository:
git clone https://github.com/rajeshmore1/Pulse-WebQA_Agent.git
cd Pulse-WebQA_Agent
  1. Create a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  1. Install dependencies:
pip install -r requirements.txt

  1. Set up environment variables:
echo "GOOGLE_API_KEY=your_api_key_here" > .env

Usage

Starting the Application

python app.py

Running Unit Test

python -m unittest test_crawler.py -v

Web Interface

image

Containerisation

Build the Docker image:

docker build -t pulse-qa .

Run using Docker Compose:

# Create .env file with your API key
echo "GOOGLE_API_KEY=your_api_key_here" > .env

# Start the application
docker-compose up

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit changes
  4. Open a pull request

About

A powerful documentation QA system that crawls help websites, processes content, and provides accurate answers using RAG (Retrieval-Augmented Generation) with Google's Gemini AI.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published