
๐Ÿ” Multi-Agent Documentation Scraper with Semantic Search ๐Ÿš€ Automated documentation scraping using collaborative AI agents (web search + extraction) ๐Ÿ“ฆ Powered by ChromaDB, DuckDuckGo, and Gradio ๐Ÿค– Features vector embeddings, tool-calling agents, and semantic search


vaibhavgitt/Smolagents-systems

Demo

Screen.Recording.2025-01-26.180241.mp4

Smolagents-systems

๐Ÿ” Multi-Agent Documentation Scraper with Semantic Search

๐Ÿš€ Automated documentation scraping using collaborative AI agents (web search + extraction)

๐Ÿ“ฆ Powered by ChromaDB, DuckDuckGo, and Gradio ๐Ÿค– Features vector embeddings, tool-calling agents, and semantic search

Multi-Agent Documentation Scraper

Python 3.8+ License: MIT

An AI-powered system for automated documentation scraping and semantic search using collaborative agents.

Features

  • 🤖 Two specialized agents: Web Researcher + Documentation Extractor
  • 🔍 Semantic search using the ChromaDB vector database
  • 🌐 DuckDuckGo integration for web searches
  • 🎮 Gradio web interface for easy interaction
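Semantic search ranks stored text by how close its embedding vector is to the query's vector. The sketch below illustrates only that ranking step, using the standard library. It assumes a toy bag-of-words vector in place of a real embedding model. The names `embed`, `cosine`, and `search` are hypothetical and are not taken from this project. ChromaDB does the production version of this with learned embeddings and a nearest-neighbour index.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts (illustration only)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity: dot product over shared terms divided by both vector norms."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: str, docs: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Score every stored document against the query and return the top-k (score, doc) pairs."""
    q = embed(query)
    scored = [(cosine(q, embed(d)), d) for d in docs]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


if __name__ == "__main__":
    docs = [
        "torch.add adds two tensors element-wise",
        "DataLoader batches a dataset for training",
        "Tensor operations include reshape and matmul",
    ]
    for score, doc in search("tensor operations", docs):
        print(f"{score:.2f}  {doc}")
```

With these inputs, the third document ranks first (score about 0.58). The first document scores 0 even though it is about tensors, because the toy vector treats "tensors" and "tensor" as unrelated words. Learned embeddings are meant to close exactly that gap.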

Quick Start

Google Colab

  1. Open this Colab notebook
  2. Run these commands in a notebook cell:

```
!pip install chromadb gradio duckduckgo-search transformers torch
!git clone https://github.com/vaibhavgitt/Smolagents-systems
%cd Smolagents-systems
```

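Local Installation

From a clone of the repository, create and activate a virtual environment (run the activation line for your platform):

```bash
python -m venv venv
source venv/bin/activate  # Linux/macOS
.\venv\Scripts\activate   # Windows
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Run the application:

```bash
python main.py
```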
Requirements

Python 3.8+

requirements.txt:

```text
chromadb>=0.4.0
gradio>=3.50.0
duckduckgo-search>=3.8.6
transformers>=4.30.0
torch>=2.0.0
```
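If the "Missing dependencies" problem from the Troubleshooting section appears, it can help to see which minimum versions are actually unmet. This is a minimal sketch that assumes the simple `name>=version` lines shown above and compares only the leading numeric version components. `as_tuple`, `parse_requirements`, `at_least`, and `check_installed` are hypothetical helpers. For full PEP 440 version handling, a real tool would use the `packaging` library instead.

```python
from __future__ import annotations

from importlib import metadata

REQUIREMENTS = """\
chromadb>=0.4.0
gradio>=3.50.0
duckduckgo-search>=3.8.6
transformers>=4.30.0
torch>=2.0.0
"""


def as_tuple(version: str) -> tuple[int, ...]:
    """Keep the leading numeric components of a version, e.g. '2.1.0+cu118' -> (2, 1, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # suffix such as 'rc1' or '+cu118': stop here
            break
    return tuple(parts)


def parse_requirements(text: str) -> dict[str, tuple[int, ...]]:
    """Map 'name>=x.y.z' lines to {name: (x, y, z)}; comments and other specifiers are skipped."""
    minimums = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if ">=" in line:
            name, version = line.split(">=", 1)
            minimums[name.strip()] = as_tuple(version.strip())
    return minimums


def at_least(installed: tuple[int, ...], minimum: tuple[int, ...]) -> bool:
    """Compare versions after zero-padding, so (2, 0) counts as equal to (2, 0, 0)."""
    width = max(len(installed), len(minimum))

    def pad(v: tuple[int, ...]) -> tuple[int, ...]:
        return v + (0,) * (width - len(v))

    return pad(installed) >= pad(minimum)


def check_installed(minimums: dict[str, tuple[int, ...]]) -> list[str]:
    """Return one message per package that is missing or older than its minimum."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = as_tuple(metadata.version(name))
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if not at_least(installed, minimum):
            problems.append(f"{name}: {installed} is older than {minimum}")
    return problems


if __name__ == "__main__":
    for problem in check_installed(parse_requirements(REQUIREMENTS)):
        print(problem)
```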
Usage Example

  1. Start the Gradio interface (`python main.py`).
  2. In the "Documentation Scraper" tab, enter, for example:
     • Library Name: PyTorch
     • Search Query: tensor operations
  3. Click "Scrape Documentation".
  4. Switch to the "Documentation Q&A" tab to search the stored docs.

Example output:

```markdown
### Scraped Documentation
PyTorch Tensor Operations Guide...
**Sources**: [pytorch.org/docs/stable/tensors.html]
```
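The example output is plain Markdown, so a Gradio Markdown output could display it directly. Below is a sketch of how a result in that shape could be assembled. It assumes the scraper yields a summary string and a list of source URLs. `format_result` is a hypothetical name, not taken from the project.

```python
from __future__ import annotations


def format_result(summary: str, sources: list[str]) -> str:
    """Render a scrape result in the Markdown layout of the example output (hypothetical helper)."""
    return "\n".join([
        "### Scraped Documentation",
        summary.strip(),
        # Joining several sources with commas is an assumption; the example shows only one.
        "**Sources**: [" + ", ".join(sources) + "]",
    ])


if __name__ == "__main__":
    print(format_result("PyTorch Tensor Operations Guide...",
                        ["pytorch.org/docs/stable/tensors.html"]))
```

Given the sample values from the example, this reproduces the three lines of the example output exactly.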
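How It Works

The web search agent looks for relevant pages. If it finds any, the documentation extraction agent pulls their content, which is stored in ChromaDB and later queried with semantic search. If nothing is found, the request goes to error handling.

```mermaid
graph TD
    A[User Query] --> B(Web Search Agent)
    B --> C{Found Resources?}
    C -->|Yes| D(Doc Extraction Agent)
    D --> E[ChromaDB Storage]
    E --> F[Semantic Search]
    C -->|No| G[Error Handling]
```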
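The control flow in the diagram above can be sketched with plain Python. This is a minimal illustration, not the project's code. `DocPipeline` and its methods are hypothetical. Both agents are injected as ordinary functions, a dict stands in for ChromaDB, and a keyword-overlap lookup stands in for real semantic search.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field
from typing import Callable


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class DocPipeline:
    """Hypothetical wiring of the 'How It Works' flow; both agents are plain callables here."""

    search_agent: Callable[[str], list[str]]  # query -> candidate URLs (Web Search Agent)
    extract_agent: Callable[[str], str]       # URL -> extracted text (Doc Extraction Agent)
    store: dict[str, str] = field(default_factory=dict)  # in-memory stand-in for ChromaDB

    def scrape(self, library: str, query: str) -> str:
        urls = self.search_agent(f"{library} {query}")
        if not urls:
            # "Found Resources? -> No" branch: report instead of storing anything.
            return f"Error: no resources found for '{library} {query}'"
        for url in urls:
            self.store[url] = self.extract_agent(url)
        return f"Stored {len(urls)} document(s)"

    def ask(self, question: str) -> str | None:
        # Keyword-overlap stand-in for the semantic search step: best-matching URL, or None.
        q = _words(question)
        best = max(self.store, key=lambda url: len(q & _words(self.store[url])), default=None)
        if best is None or not q & _words(self.store[best]):
            return None
        return best


if __name__ == "__main__":
    pages = {"pytorch.org/docs/stable/tensors.html": "Tensor operations such as reshape and matmul"}
    pipeline = DocPipeline(
        search_agent=lambda q: list(pages) if "PyTorch" in q else [],
        extract_agent=lambda url: pages[url],
    )
    print(pipeline.scrape("PyTorch", "tensor operations"))
    print(pipeline.scrape("UnknownLib", "anything"))
    print(pipeline.ask("How do I reshape a tensor?"))
```

In the project, these roles belong to the tool-calling Web Researcher and Documentation Extractor agents. Injecting them as callables here lets the diagram's branching run without a model, network access, or a database.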
Configuration

Add the following to a `.env` file (`HF_API_TOKEN` is optional):

```text
HF_API_TOKEN=your_huggingface_token  # Optional
CHROMA_DB_PATH=./chroma_db
```
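The README does not say how `.env` is read. The `python-dotenv` package's `load_dotenv()` is a common choice. As a dependency-free alternative, the sketch below assumes plain `KEY=value` lines and shows how the two settings above could be resolved. `load_env` and `resolve_settings` are hypothetical helpers. The environment-over-file precedence and the `./chroma_db` fallback are also assumptions made for illustration.

```python
from __future__ import annotations

import os
from typing import Mapping

DEFAULT_CHROMA_PATH = "./chroma_db"  # assumed fallback, taken from the example value


def load_env(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; blank lines and '#' comments are ignored.

    Limitation of this sketch: a '#' inside a value is treated as a comment.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip("\"'")
    return values


def resolve_settings(file_values: Mapping[str, str], environ: Mapping[str, str] = os.environ) -> dict:
    """Merge .env values with the process environment (environment wins, an assumed precedence)."""
    overrides = {k: environ[k] for k in ("HF_API_TOKEN", "CHROMA_DB_PATH") if k in environ}
    merged = {**file_values, **overrides}
    return {
        "hf_token": merged.get("HF_API_TOKEN") or None,  # optional: None when unset or empty
        "chroma_path": merged.get("CHROMA_DB_PATH") or DEFAULT_CHROMA_PATH,
    }


if __name__ == "__main__":
    if os.path.exists(".env"):
        with open(".env", encoding="utf-8") as fh:
            # Print only the DB path so the token never ends up in logs.
            print(resolve_settings(load_env(fh.read()))["chroma_path"])
```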
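Troubleshooting

  • Port conflicts: try `python main.py --port 7861`.
  • Missing dependencies: run `pip install -r requirements.txt --force-reinstall`.
  • ChromaDB errors: delete the `chroma_db` folder (or whatever `CHROMA_DB_PATH` points to) and restart. This removes all stored documentation, so it will need to be scraped again.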
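Gradio's default port is 7860, so the 7861 in the port tip is simply the next one. The README does not show how `main.py` handles `--port`. As a separate option, a standard-library check can find a free local port before launching. `first_free_port` is a hypothetical helper. Its result could then be passed to Gradio's `launch(server_port=...)`.

```python
import socket


def first_free_port(start: int = 7860, attempts: int = 20, host: str = "127.0.0.1") -> int:
    """Return the first port in [start, start + attempts) that can currently be bound on host."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # already in use, e.g. by another Gradio app
        return port  # socket is closed on leaving the with-block, freeing the port
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")


if __name__ == "__main__":
    print(first_free_port())
```

The check is advisory: another process can still claim the port between the check and the launch.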
