URL Reader Assistant is a powerful documentation analysis tool that combines web crawling capabilities with local Large Language Models (LLMs) through Ollama. The system efficiently processes web content and provides an interactive question-answering interface using Retrieval-Augmented Generation (RAG) technology.
The assistant specializes in crawling documentation websites, processing their content through local language models, and creating an intelligent knowledge base that users can query. By leveraging local LLM processing, it offers both privacy and cost-effectiveness while maintaining high-quality responses with source citations.
- Multi-threaded web crawling for efficient content gathering
- Intelligent URL filtering and domain-specific content extraction
- Automated content chunking and optimization
- Vector database storage for efficient retrieval
- Local LLM processing using Ollama
- Context-aware query processing
- Source-cited responses
- Conversation memory management
- Interactive Q&A interface
- Database inspection and management tools
- Configurable crawling parameters
- Command history navigation
- Automatic cleanup and session management
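These pieces fit together as a crawl → chunk → embed → retrieve → answer pipeline. The following is a minimal sketch of that flow using the libraries from requirements.txt; the function and variable names are illustrative, not the script's actual API:

```python
# Illustrative sketch of the crawl -> chunk -> embed -> query pipeline.
# Names here are hypothetical; the actual url-read.py may be organized differently.
import requests
from bs4 import BeautifulSoup
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_ollama import ChatOllama, OllamaEmbeddings

def fetch_page_text(url: str) -> str:
    """Download a page and reduce it to plain text."""
    html = requests.get(url, timeout=10).text
    return BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)

text = fetch_page_text("https://example.com")

# Split the content into overlapping chunks sized for retrieval.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.split_text(text)

# Embed the chunks and persist them in a local Chroma database.
store = Chroma.from_texts(chunks, OllamaEmbeddings(model="llama3.2"),
                          persist_directory="./chroma_db")

# Retrieve the most relevant chunks and answer with a local model.
question = "What are the main features?"
docs = store.similarity_search(question, k=4)
context = "\n\n".join(d.page_content for d in docs)
llm = ChatOllama(model="llama3.2")
print(llm.invoke(f"Context:\n{context}\n\nQuestion: {question}").content)
```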
- Python 3.8 or higher
- Ollama installed and running locally
- Git for repository cloning
- Virtual environment (recommended)
```bash
# Clone the repository
git clone https://github.com/AIAfterDark/AI-URL-Read.git
cd AI-URL-Read

# Create and activate virtual environment
python -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate

# Install required packages
pip install -r requirements.txt

# Ensure Ollama is running and pull the required model
ollama pull llama3.2
```
Create a `requirements.txt` file containing:

```text
langchain
langchain-community
langchain-ollama
beautifulsoup4
requests
chromadb
colorama
```

The most straightforward way to use the URL Reader Assistant is:

```bash
python url-read.py https://example.com
```

For more control over the processing:

```bash
python url-read.py https://example.com \
  --model llama3.2 \
  --max-pages 5 \
  --verbose \
  --save-db \
  --memory-size 5
```
| Argument | Description | Default |
|----------|-------------|---------|
| `url` | Target URL to analyze (required) | None |
| `--model` | Ollama model name | `llama3.2` |
| `--max-pages` | Maximum pages to crawl | 50 |
| `--verbose` | Enable detailed logging | False |
| `--save-db` | Save database snapshot | False |
| `--memory-size` | Recent interactions to remember | 5 |
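These flags map naturally onto Python's argparse; assuming the script defines its CLI that way (an assumption, not confirmed from the source), the setup would look roughly like:

```python
# Hypothetical argparse setup mirroring the documented flags and defaults.
import argparse

parser = argparse.ArgumentParser(description="URL Reader Assistant")
parser.add_argument("url", help="Target URL to analyze")
parser.add_argument("--model", default="llama3.2", help="Ollama model name")
parser.add_argument("--max-pages", type=int, default=50, help="Maximum pages to crawl")
parser.add_argument("--verbose", action="store_true", help="Enable detailed logging")
parser.add_argument("--save-db", action="store_true", help="Save a database snapshot")
parser.add_argument("--memory-size", type=int, default=5,
                    help="Recent interactions to remember")
args = parser.parse_args()  # e.g. args.max_pages, args.memory_size
```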
During the Q&A session, the following commands are available:
- `/quit` - Exit the application
- `/db info` - Display database information
- `/db inspect <id>` - Inspect a specific document chunk
- `/db save [filename]` - Save a database snapshot
Use arrow keys (↑↓) for command history navigation.
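A dispatcher for this loop could look like the sketch below; the `rag` handlers are placeholders for whatever the script actually calls internally:

```python
# Hypothetical sketch of the interactive command loop; the handler
# methods on `rag` are placeholders, not the script's real API.
def handle_command(line: str, rag) -> bool:
    """Dispatch one line of input; return False to end the session."""
    if line == "/quit":
        return False
    if line == "/db info":
        print(rag.db_info())                        # placeholder handler
    elif line.startswith("/db inspect "):
        print(rag.inspect_chunk(line.split()[-1]))  # placeholder handler
    elif line.startswith("/db save"):
        parts = line.split(maxsplit=2)
        rag.save_snapshot(parts[2] if len(parts) > 2 else None)
    else:
        print(rag.answer(line))                     # anything else is a question
    return True
```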
```text
$ python url-read.py https://docs.example.com

Database Information:
Total documents: 25
Database path: ./chroma_db

Article Overview:
[Generated content overview]

Documentation processed! Enter your questions (/quit to exit)

Question: What are the main features?

Answer:
[AI-generated response]

Sources:
- Documentation Home
  docs.example.com/home
- Features Page
  docs.example.com/features
```
- Start with a small number of pages for initial testing
- Enable verbose mode when debugging issues
- Use database snapshots for important content
- Verify source citations in responses
- Choose appropriate models based on content type
- Consider memory requirements for larger sites
- Balance speed against accuracy requirements
- Ask specific, focused questions
- Utilize conversation context for follow-ups
- Review source citations for verification
- Currently supports HTML content only
- Single domain processing per session
- Requires active Ollama installation
- Memory usage scales with content size
- Database Connection Errors
  - Verify ChromaDB installation
  - Check directory permissions
  - Ensure sufficient disk space
- Ollama Connection Issues
  - Confirm Ollama is running (see the quick check below)
  - Verify model availability
  - Check network connectivity
- Memory Problems
  - Reduce the `--max-pages` parameter
  - Adjust chunk sizes
  - Increase available system memory
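For the Ollama checks in particular, a quick probe of its local REST API (it listens on port 11434 by default and exposes `GET /api/tags` for listing pulled models) confirms both that the server is up and that your model is available:

```python
# Quick check that the local Ollama server is reachable, and list pulled models.
import requests

try:
    resp = requests.get("http://localhost:11434/api/tags", timeout=5)
    resp.raise_for_status()
    names = [m["name"] for m in resp.json().get("models", [])]
    print("Ollama is running; models available:", names)
except requests.RequestException as exc:
    print("Could not reach Ollama:", exc)
```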
We welcome contributions to the URL Reader Assistant project. Please follow these steps:
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to your branch
- Create a Pull Request
For detailed contribution guidelines, see CONTRIBUTING.md.
This project is licensed under the MIT License - see the LICENSE file for details.
- LangChain team for the foundational framework
- Ollama project for local LLM capabilities
- ChromaDB for vector storage solutions
For issues and feature requests, please use the GitHub issue tracker.