A powerful Retrieval-Augmented Generation (RAG) API built with FastAPI, ChromaDB, and async web scraping capabilities. This system allows you to build a knowledge base from web content and query it using natural language.
- FastAPI-based RESTful API with interactive Swagger documentation
- ChromaDB vector database for efficient similarity search
- Async web scraping with DuckDuckGo search integration
- Direct URL crawling for specific content ingestion
- Sentence Transformers for high-quality embeddings
- Flan-T5 model for text generation
- Apple Silicon GPU acceleration (MPS support)
- Python 3.8+
- macOS with Apple Silicon (for MPS acceleration) or any system with CPU support
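The MPS-or-CPU requirement above can be checked from Python before starting the server. The snippet below is a minimal sketch, not code from this repository: `pick_device` is a hypothetical helper, and it only assumes that PyTorch (installed in the dependency step) exposes `torch.backends.mps.is_available()`. How `rag_chroma_api.py` itself selects a device may differ.

```python
def pick_device() -> str:
    """Return "mps" if PyTorch reports a usable Metal backend, else "cpu".

    Hypothetical helper for illustration; not part of rag_chroma_api.py.
    """
    try:
        import torch  # imported lazily so the check works even before installation
    except ImportError:
        return "cpu"

    # Older PyTorch builds have no torch.backends.mps attribute at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(f"Models would run on: {pick_device()}")
```

On a machine without Apple Silicon, or without PyTorch installed, this falls back to `"cpu"`.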
- Clone the repository
  git clone https://github.com/AnanyaBanerjee01/rag-web-api.git
- Create a virtual environment
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies
  pip install chromadb sentence-transformers transformers torch tqdm duckduckgo-search beautifulsoup4 aiohttp fastapi uvicorn
- Start the API server
  python rag_chroma_api.py
- Access the interactive documentation
  Open your browser and navigate to: http://127.0.0.1:8000/docs
- Test the health check
  curl http://127.0.0.1:8000/
- GET /: Check if the API is running
  - Response:
    {"status": "healthy", "message": "RAG API is running"}
- GET /ask?query=your_question: Ask questions to the RAG system
  - Parameters:
    - query (required): Your question as a string
  - Example:
    curl "http://127.0.0.1:8000/ask?query=What%20is%20machine%20learning?"
- POST /refresh_web: Search the web and add content to the knowledge base
  - Body:
    { "query": "machine learning basics", "max_results": 5 }
  - Example:
    curl -X POST "http://127.0.0.1:8000/refresh_web" \
      -H "Content-Type: application/json" \
      -d '{"query": "artificial intelligence", "max_results": 3}'
- POST /crawl: Crawl specific URLs and add them to the knowledge base
  - Body:
    {
      "urls": [
        "https://en.wikipedia.org/wiki/Machine_learning",
        "https://example.com/article"
      ],
      "descriptions": [
        "Machine learning overview",
        "Article description"
      ]
    }
  - Example:
    curl -X POST "http://127.0.0.1:8000/crawl" \
      -H "Content-Type: application/json" \
      -d '{ "urls": ["https://en.wikipedia.org/wiki/Data_science"], "descriptions": ["Data science overview"] }'
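The same three calls can be made from Python instead of curl. The following is a minimal sketch using only the standard library. The helper names are hypothetical, the `max_results=5` default simply mirrors the body example above, and the one-description-per-URL check is an assumption drawn from the examples (the API itself may be more lenient). The functions only build requests; pass the result to `urllib.request.urlopen` to send it.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE_URL = "http://127.0.0.1:8000"  # default address used throughout this README


def build_ask_request(question: str) -> Request:
    """Build GET /ask with the question percent-encoded into the query string."""
    params = urlencode({"query": question}, quote_via=quote)
    return Request(f"{BASE_URL}/ask?{params}", method="GET")


def _build_json_post(path: str, payload: dict) -> Request:
    """Build a POST request with a JSON body and matching Content-Type header."""
    return Request(
        f"{BASE_URL}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_refresh_web_request(query: str, max_results: int = 5) -> Request:
    """Build POST /refresh_web; the default of 5 is an assumption, not a documented default."""
    return _build_json_post("/refresh_web", {"query": query, "max_results": max_results})


def build_crawl_request(urls, descriptions) -> Request:
    """Build POST /crawl from parallel lists of URLs and descriptions."""
    urls, descriptions = list(urls), list(descriptions)
    # Assumption: one description per URL, as in the README examples.
    if len(urls) != len(descriptions):
        raise ValueError("urls and descriptions must have the same length")
    return _build_json_post("/crawl", {"urls": urls, "descriptions": descriptions})


if __name__ == "__main__":
    # Nothing is sent here; this only shows what would be requested.
    print(build_ask_request("What is machine learning?").full_url)
```

To actually send a request with the server running: `urllib.request.urlopen(build_ask_request("What is Python?")).read()`.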
The system uses the following models by default:
- Embedding Model: all-MiniLM-L6-v2
- Generation Model: google/flan-t5-base
- ChromaDB stores vectors in the ./chromadb_api/ directory
- Collection Name: rag_api
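To make the roles of the embedding model and the vector store concrete, here is a toy illustration of the retrieval step: texts become vectors, and the stored text whose vector is closest to the query's is returned. This is a deliberately simplified sketch, not the project's implementation. Word counts stand in for the `all-MiniLM-L6-v2` embeddings, a plain list stands in for the ChromaDB collection, and `embed`, `cosine`, and `top_k` are hypothetical names.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, documents: list, k: int = 1) -> list:
    """Return the k documents most similar to the query (stand-in for a vector store query)."""
    query_vec = embed(query)
    ranked = sorted(documents, key=lambda doc: cosine(query_vec, embed(doc)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    docs = [
        "python is a programming language",
        "chromadb stores vectors on disk",
    ]
    print(top_k("what is python", docs))
```

In the real system the retrieved text is then passed, together with the question, to the generation model to produce the answer.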
You can modify the RAGChroma class initialization in rag_chroma_api.py:
rag = RAGChroma(
    collection_name="your_collection",
    persist_directory="./your_db_path",
    embedding_model_name="your-embedding-model",
    generation_model="your-generation-model"
)
RAG/
├── rag_chroma_api.py      # Main FastAPI application
├── web_ingestor.py        # Async web scraping utilities
├── chromadb_api/          # ChromaDB database files
│   └── chroma.sqlite3
├── __pycache__/           # Python cache files
├── venv/                  # Virtual environment
└── README.md              # This file
# Ask a simple question
curl "http://127.0.0.1:8000/ask?query=What%20is%20Python?"
# First, add some content about Python
curl -X POST "http://127.0.0.1:8000/refresh_web" \
-H "Content-Type: application/json" \
-d '{"query": "Python programming language", "max_results": 3}'
# Then ask a question about Python
curl "http://127.0.0.1:8000/ask?query=What%20are%20Python%27s%20main%20features?"
# Add specific documentation
curl -X POST "http://127.0.0.1:8000/crawl" \
-H "Content-Type: application/json" \
-d '{
"urls": [
"https://docs.python.org/3/tutorial/introduction.html",
"https://realpython.com/python-basics/"
],
"descriptions": [
"Python official tutorial",
"Python basics guide"
]
}'
- Navigate to http://127.0.0.1:8000/docs
- Expand any endpoint (e.g., /ask)
- Click "Try it out"
- Enter your parameters
- Click "Execute"
- View the response
{
  "query": "What is machine learning?",
  "answer": "Machine learning is a subset of artificial intelligence that enables computers to learn and make decisions from data without being explicitly programmed for every task."
}
{
  "message": "Successfully crawled and ingested 2 documents.",
  "urls_processed": 2,
  "docs_ingested": 2
}
{
  "error": "Error message here",
  "message": "Failed to generate answer"
}
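A script that calls `/ask` needs to tell the answer shape from the error shape shown above. The snippet below is a minimal sketch of one way to do that. `RagApiError` and `extract_answer` are hypothetical names, and the code assumes only the `answer`, `error`, and `message` keys that appear in the sample responses.

```python
import json


class RagApiError(Exception):
    """Raised when a response body has the error shape ("error" plus "message")."""


def extract_answer(response_text: str) -> str:
    """Return the "answer" field, or raise RagApiError for an error payload."""
    payload = json.loads(response_text)
    if "error" in payload:
        summary = payload.get("message", "Request failed")
        raise RagApiError(f"{summary}: {payload['error']}")
    return payload["answer"]


if __name__ == "__main__":
    sample = '{"query": "What is Python?", "answer": "A programming language."}'
    print(extract_answer(sample))
```

Raising an exception rather than returning the error text keeps a failed generation from being mistaken for an answer further down a pipeline.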
- Connection Refused Error
  - Ensure the server is running: python rag_chroma_api.py
  - Check if port 8000 is available
- Model Loading Issues
  - Ensure you have sufficient RAM (models require ~2-4GB)
  - Check internet connection for initial model downloads
- Web Scraping Failures
  - Some websites block automated requests
  - Check the server logs for detailed error messages
- Empty Responses
  - Add content to the knowledge base first using /refresh_web or /crawl
  - Ensure your query matches the ingested content topics
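For the port check mentioned under Connection Refused Error, a quick test from Python is to try opening a TCP connection to the address. This is a minimal sketch using the standard `socket` module; `is_port_in_use` is a hypothetical helper, and the host and port defaults are the ones used throughout this README.

```python
import socket


def is_port_in_use(host: str = "127.0.0.1", port: int = 8000, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if is_port_in_use():
        print("Port 8000 is in use (the API server or another process is listening).")
    else:
        print("Nothing is listening on port 8000; start the server first.")
```

A `True` result before starting the server means another process already holds the port. A `False` result while curl reports a refused connection means the server is not running.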
- Initial Setup: First run will download models (~1-2GB)
- Memory Usage: System uses ~3-4GB RAM when fully loaded
- Response Time: First query takes longer due to model initialization
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- FastAPI: Web framework for building APIs
- ChromaDB: Vector database for embeddings
- Sentence Transformers: For generating text embeddings
- Transformers: Hugging Face transformers library
- BeautifulSoup4: HTML parsing for web scraping
- aiohttp: Async HTTP client
- DuckDuckGo Search: Web search functionality
If you encounter any issues or have questions:
- Check the troubleshooting section above
- Review the server logs for detailed error messages
- Open an issue on GitHub with detailed information about your problem
- Support for additional embedding models
- PDF document ingestion
- User authentication and rate limiting
- Batch processing endpoints
- Advanced search filters
- Export/import knowledge base functionality