A lightning-fast, locally running RAG (Retrieval-Augmented Generation) implementation built on the Agno AI framework. The agent runs Llama 3.2 3B through Ollama, combining structured knowledge from financial reports (PDF URLs) with real-time web search (DuckDuckGo). This project demonstrates how to build an efficient question-answering system for PDFs using local models and embeddings.
- Lightweight Implementation: Built with Agno AI framework requiring minimal code
- Local Model Support: Uses Llama 3.2 (3B parameters) for inference and tool calling
- Local Embeddings: Implements Nomic embeddings for document processing
- PDF Processing: Loads and indexes PDFs directly from their URLs
- High Performance: Optimized for speed and efficiency
- Easy Model Switching: Flexible architecture supporting both open- and closed-source models (see the sketch after this list)
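As an illustration of the model-switching point above, swapping the local Ollama model for a hosted one should only require changing the `model` argument on the agent. This is a minimal sketch, not the repository's code; the `Ollama` and `OpenAIChat` classes follow the Agno documentation, but exact module paths and model IDs may differ across versions.

```python
from agno.agent import Agent
from agno.models.ollama import Ollama        # local, open-source model via Ollama
from agno.models.openai import OpenAIChat    # hosted, closed-source alternative

# Local, open-source setup (the default for this project)
local_agent = Agent(model=Ollama(id="llama3.2"), markdown=True)

# Switching to a closed-source model is a one-line change
# (requires OPENAI_API_KEY in the environment)
hosted_agent = Agent(model=OpenAIChat(id="gpt-4o-mini"), markdown=True)
```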
- Framework: Agno AI
- Language Model: Llama 3.2 (3B)
- Embeddings: Nomic
- Tested Hardware: MacBook M1 2021, 8GB RAM
- Model Setup: Ollama serving Llama 3.2 3B (see the model-setup sketch after this list)
- Model Performance: Of the models tested, Llama 3.2 3B gave the best balance of speed and response quality
- PDF URL: US Economics Analyst 2025
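A minimal sketch of the model-setup step, assuming the `ollama` Python client is installed and the Ollama server is running locally. `llama3.2` and `nomic-embed-text` are the standard Ollama names for the 3B Llama 3.2 chat model and the Nomic embedding model; the same can be done from a terminal with `ollama pull`.

```python
import ollama

# Pull the chat model (llama3.2 defaults to the 3B variant) and the
# Nomic embedding model into the local Ollama server.
for model in ("llama3.2", "nomic-embed-text"):
    ollama.pull(model)

# Quick sanity check: show what the local server now has available.
print(ollama.list())
```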
- Clone the repository
- Install dependencies
- Configure your local models
- Run the application (a minimal sketch of the main script is shown below)
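The main script might look roughly like the sketch below. It is an illustrative reconstruction rather than the repository's actual code: class names such as `PDFUrlKnowledgeBase`, `LanceDb`, `OllamaEmbedder`, and `DuckDuckGoTools` follow the Agno documentation but may differ across versions, and the PDF URL is a placeholder for the US Economics Analyst 2025 report.

```python
from agno.agent import Agent
from agno.embedder.ollama import OllamaEmbedder
from agno.knowledge.pdf_url import PDFUrlKnowledgeBase
from agno.models.ollama import Ollama
from agno.tools.duckduckgo import DuckDuckGoTools
from agno.vectordb.lancedb import LanceDb

# Knowledge base: the financial report PDF, chunked and embedded locally
# with Nomic embeddings, stored in a local LanceDB table.
knowledge_base = PDFUrlKnowledgeBase(
    urls=["https://example.com/us-economics-analyst-2025.pdf"],  # placeholder URL
    vector_db=LanceDb(
        uri="tmp/lancedb",
        table_name="us_economics_2025",
        embedder=OllamaEmbedder(id="nomic-embed-text", dimensions=768),
    ),
)

# Agent: local Llama 3.2 3B via Ollama, with the PDF knowledge base for
# structured answers and DuckDuckGo for real-time web context.
agent = Agent(
    model=Ollama(id="llama3.2"),
    knowledge=knowledge_base,
    tools=[DuckDuckGoTools()],
    search_knowledge=True,
    markdown=True,
)

if __name__ == "__main__":
    # Embed and index the PDF on first run; skip re-indexing afterwards.
    knowledge_base.load(recreate=False)
    agent.print_response(
        "What does the report expect for US GDP growth in 2025?",
        stream=True,
    )
```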
The implementation shows significant speed improvements over traditional RAG setups, particularly in:
- Document processing time
- Query response latency
- Memory efficiency
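Exact numbers will vary with hardware and document size, but the first two metrics are easy to check locally. A rough timing helper is sketched below; the commented usage assumes the `agent` and `knowledge_base` objects from the Getting Started example above.

```python
import time

def timed(label, fn, *args, **kwargs):
    """Run fn, print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.1f}s")
    return result

# Usage with the objects from the Getting Started sketch:
#   timed("Document processing", knowledge_base.load, recreate=True)
#   timed("Query latency", agent.run, "Summarize the report's 2025 inflation outlook.")
```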
Special thanks to:
- Ashpreet Bedi for introducing the Agno AI framework
- The Agno AI team for their excellent documentation and support
This is a side project created for educational purposes and to contribute to the developer community. Feel free to use, modify, and share!
MIT