
The Complete AI Development Framework - From Local Prototypes to Production Systems
🚀 Quick Start • 📚 Documentation • 🏗️ Architecture • 🤝 Contributing
LlamaFarm is a comprehensive, modular framework for building AI projects that run locally, collaborate, and deploy anywhere. We provide battle-tested components for RAG systems, vector databases, model management, prompt engineering, and (soon) fine-tuning, all designed to work seamlessly together or independently.
Unlike cloud-only solutions, LlamaFarm gives you complete control over your AI stack:
- 🏠 Local-First Development - Build and test entirely on your machine
- 🔧 Production-Ready Components - Battle-tested modules that scale from laptop to cluster
- 🎯 Strategy-Based Configuration - Smart defaults with infinite customization
- 🚀 Deploy Anywhere - Same code runs locally, on-premise, or in any cloud
LlamaFarm is built for:
- Developers who want to build AI applications without vendor lock-in
- Teams needing cost control and data privacy
- Enterprises requiring scalable, secure AI infrastructure
- Researchers experimenting with cutting-edge techniques
LlamaFarm is built as a modular system where each component can be used independently or orchestrated together for powerful AI applications.
The Runtime is the execution environment that orchestrates all components and manages the application lifecycle (a supervision-loop sketch follows this list).
- Process Management: Handles component initialization and shutdown
- Resource Allocation: Manages memory, CPU, and GPU resources efficiently
- Service Discovery: Automatically finds and connects components
- Health Monitoring: Tracks component status and performance metrics
- Error Recovery: Automatic restart and fallback mechanisms
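To make the lifecycle concrete, here is a minimal supervision loop; `Component` and its callbacks are hypothetical stand-ins, not LlamaFarm's actual runtime API:

```python
# Hypothetical sketch of a runtime supervision loop; the Component type
# and its callbacks are illustrative, not LlamaFarm's real interface.
import time
from dataclasses import dataclass
from typing import Callable

@dataclass
class Component:
    name: str
    start: Callable[[], None]
    is_healthy: Callable[[], bool]
    restarts: int = 0

def supervise(components: list[Component], max_restarts: int = 3) -> None:
    """Start every component, then poll health and restart failures."""
    for c in components:
        c.start()
    while True:                          # supervisors run for the process lifetime
        for c in components:
            if not c.is_healthy() and c.restarts < max_restarts:
                c.restarts += 1          # track recovery attempts
                c.start()                # simple restart-based error recovery
        time.sleep(5)                    # health-check interval
```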
The Deployment layer is a zero-configuration system that works from local development to production clusters (an environment-detection sketch follows this list).
- Environment Detection: Automatically adapts to local, Docker, or cloud environments
- Configuration Management: Handles environment variables and secrets securely
- Scaling: Horizontal and vertical scaling based on load
- Load Balancing: Distributes requests across multiple instances
- Rolling Updates: Zero-downtime deployments with automatic rollback
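Environment detection can rely on well-known runtime signals; this sketch uses two common ones (the `KUBERNETES_SERVICE_HOST` variable injected into pods and Docker's `/.dockerenv` marker file), though the real deployment layer may use different heuristics:

```python
# Illustrative environment detection based on common runtime signals.
import os
from pathlib import Path

def detect_environment() -> str:
    """Classify the runtime as 'kubernetes', 'docker', or 'local'."""
    if os.getenv("KUBERNETES_SERVICE_HOST"):   # injected by the kubelet into pods
        return "kubernetes"
    if Path("/.dockerenv").exists():           # marker file present in Docker containers
        return "docker"
    return "local"
```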
The Data Pipeline (RAG) is a complete document processing and retrieval system for building knowledge-augmented applications (a hybrid-scoring sketch follows this list).
- Document Ingestion: Parse 15+ formats (PDF, Word, Excel, HTML, Markdown, etc.)
- Smart Extraction: Extract entities, keywords, and statistics without LLMs
- Vector Storage: Integration with 8+ vector databases (Chroma, Pinecone, FAISS, etc.)
- Hybrid Search: Combine semantic, keyword, and metadata-based retrieval
- Chunking Strategies: Adaptive chunking based on document type and use case
- Incremental Updates: Efficiently update knowledge base without full reprocessing
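Hybrid search fuses semantic and keyword scores with a weighted sum. This is a minimal sketch assuming pluggable `dense_score`/`sparse_score` callables, not LlamaFarm's actual retriever interface; the 0.7/0.3 defaults match the `research` strategy shown later:

```python
# Minimal sketch of weighted dense/sparse score fusion for hybrid search.
from typing import Callable

def hybrid_search(query: str, docs: list[str],
                  dense_score: Callable[[str, str], float],
                  sparse_score: Callable[[str, str], float],
                  w_dense: float = 0.7, w_sparse: float = 0.3,
                  top_k: int = 5) -> list[str]:
    """Rank documents by a weighted sum of semantic and keyword scores."""
    scored = [
        (w_dense * dense_score(query, d) + w_sparse * sparse_score(query, d), d)
        for d in docs
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # best score first
    return [d for _, d in scored[:top_k]]
```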
The Models component is a unified interface for all LLM operations with enterprise-grade features (a failover sketch follows this list).
- Multi-Provider Support: 25+ providers (OpenAI, Anthropic, Google, Ollama, etc.)
- Automatic Failover: Seamless fallback between providers when errors occur
- Fine-Tuning Pipeline: Train custom models on your data (Coming Q2 2025)
- Cost Optimization: Route queries to the cheapest capable model
- Load Balancing: Distribute across multiple API keys and endpoints
- Response Caching: Intelligent caching to reduce API costs
- Model Configuration: Per-model temperature, token limits, and parameters
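Automatic failover is conceptually simple: try each provider in order and fall back on error. Here is a minimal sketch, assuming each provider exposes a `complete()` call (an illustrative name, not the real interface):

```python
# Illustrative provider failover chain; provider.complete() is a placeholder.
def generate_with_fallback(prompt: str, providers: list) -> str:
    """Return the first successful completion across a provider chain."""
    last_error: Exception | None = None
    for provider in providers:           # e.g. [gpt4, claude3, local_llama]
        try:
            return provider.complete(prompt)
        except Exception as err:         # rate limit, outage, timeout...
            last_error = err             # remember why this provider failed
    raise RuntimeError("all providers failed") from last_error
```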
Prompts is an enterprise prompt management system with version control and A/B testing (a template-rendering sketch follows this list).
- Template Library: 20+ pre-built templates for common use cases
- Dynamic Variables: Jinja2 templating with type validation
- Strategy Selection: Automatically choose the best template based on context
- Version Control: Track prompt changes and performance over time
- A/B Testing: Compare prompt variations with built-in analytics
- Chain-of-Thought: Built-in support for reasoning chains
- Multi-Agent: Coordinate multiple specialized prompts
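Dynamic variables build on standard Jinja2 rendering; the sketch below pairs it with a hand-rolled type check. The template text and the `EXPECTED` map are made up for illustration:

```python
# Minimal sketch of Jinja2 templating plus simple type validation.
from jinja2 import Template

EXPECTED = {"question": str, "context": str}   # hypothetical variable schema

def render_prompt(source: str, **variables) -> str:
    """Validate variable types, then render the template."""
    for name, expected_type in EXPECTED.items():
        if not isinstance(variables.get(name), expected_type):
            raise TypeError(f"{name} must be {expected_type.__name__}")
    return Template(source).render(**variables)

prompt = render_prompt(
    "Answer using only this context:\n{{ context }}\n\nQ: {{ question }}",
    question="What is RAG?",
    context="Retrieval-augmented generation combines search with LLMs.",
)
```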
- User Request → Runtime receives and validates the request
- Context Retrieval → Data Pipeline searches relevant documents
- Prompt Selection → Prompts system chooses optimal template
- Model Execution → Models component handles LLM interaction with automatic failover
- Response Delivery → Runtime returns formatted response to user
Each component is independent but designed to work seamlessly together through standardized interfaces; the sketch below stubs this flow end to end.
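Put together, the five steps might compose like this; every helper is a trivial stub standing in for the corresponding component, not LlamaFarm's real API:

```python
# Stubbed end-to-end request flow: validate -> retrieve -> prompt -> generate.
def search_index(query: str, top_k: int = 5) -> str:
    return "…retrieved context…"                    # stub data pipeline

def choose_template(query: str) -> str:
    return "Context: {context}\nQuestion: {query}"  # stub template store

def call_model_with_fallback(prompt: str) -> str:
    return "…model answer…"                         # stub model layer

def handle_request(query: str) -> str:
    if not query.strip():                           # 1. runtime validates the request
        raise ValueError("empty query")
    context = search_index(query)                   # 2. data pipeline retrieves context
    template = choose_template(query)               # 3. prompts selects a template
    prompt = template.format(context=context, query=query)
    return call_model_with_fallback(prompt)         # 4-5. models generate, runtime returns
```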
```bash
# Quick install with our script
curl -fsSL https://raw.githubusercontent.com/llama-farm/llamafarm/main/install.sh | bash

# Or clone and set up manually
git clone https://github.com/llama-farm/llamafarm.git
cd llamafarm
```
Each component can be used independently or together:
```bash
# 1. RAG System - Document Processing & Retrieval
cd rag
uv sync
uv run python setup_demo.py  # Interactive setup wizard

# 2. Models - LLM Management
cd ../models
uv sync
uv run python demos/demo_fallback.py  # See fallback in action

# 3. Prompts - Template System
cd ../prompts
uv sync
uv run python -m prompts.cli template list  # View available templates
```
```bash
# Ingest documents with smart extraction
uv run python rag/cli.py ingest samples/ \
  --extractors keywords entities statistics \
  --strategy research

# Search with advanced retrieval
uv run python rag/cli.py search \
  "What are the key findings about climate change?" \
  --top-k 5 --rerank

# Chat with automatic fallback
uv run python models/cli.py chat \
  --primary gpt-4 \
  --fallback claude-3 \
  --local-fallback llama3.2 \
  "Explain quantum entanglement"

# Use domain-specific templates
uv run python prompts/cli.py execute \
  "Analyze this medical report for anomalies" \
  --strategy medical \
  --template diagnostic_analysis
```
LlamaFarm uses a strategy-based configuration system that adapts to your use case:
```yaml
# config/strategies.yaml
strategies:
  research:
    rag:
      embedder: "sentence-transformers"
      chunk_size: 512
      overlap: 50
      retrievers:
        - type: "hybrid"
          weights: {dense: 0.7, sparse: 0.3}
    models:
      primary: "gpt-4"
      fallback: "claude-3-opus"
      temperature: 0.3
    prompts:
      template: "academic_research"
      style: "formal"
      citations: true

  customer_support:
    rag:
      embedder: "openai"
      chunk_size: 256
      retrievers:
        - type: "similarity"
          top_k: 3
    models:
      primary: "gpt-3.5-turbo"
      temperature: 0.7
    prompts:
      template: "conversational"
      style: "friendly"
      include_context: true
```
```bash
# Apply strategy across all components
export LLAMAFARM_STRATEGY=research

# Or specify per command
uv run python rag/cli.py ingest docs/ --strategy research
uv run python models/cli.py chat --strategy customer_support "Help me with my order"
```
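A plausible resolution order for the two mechanisms above is that an explicit `--strategy` flag wins over the environment variable, which wins over a default; this precedence is an assumption for illustration, not documented behavior:

```python
# Hypothetical strategy resolution: flag > env var > default.
import os

def resolve_strategy(cli_flag: str | None = None) -> str:
    """Pick the active strategy name for a command."""
    return cli_flag or os.getenv("LLAMAFARM_STRATEGY", "default")

resolve_strategy("research")   # -> "research" (explicit --strategy flag)
resolve_strategy(None)         # -> env var value, else "default"
```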
| Component | Description | Documentation |
|---|---|---|
| RAG System | Document processing, embedding, retrieval | 📚 RAG Guide |
| Models | LLM providers, management, optimization | 🤖 Models Guide |
| Prompts | Templates, strategies, evaluation | 📝 Prompts Guide |
| CLI | Command-line tools and utilities | ⚡ CLI Reference |
| API | REST API services | 🔌 API Docs |
- Building Your First RAG Application
- Setting Up Local Models with Ollama
- Advanced Prompt Engineering
- Deploying to Production
- Cost Optimization Strategies
Check out our examples/ directory for complete working applications:
- 📚 Knowledge Base Assistant
- 💬 Customer Support Bot
- 📊 Document Analysis Pipeline
- 🔍 Semantic Search Engine
- 🤖 Multi-Agent System
```bash
# Run with hot-reload
uv run python main.py --dev

# Or use Docker
docker-compose up -d
```
```yaml
# docker-compose.prod.yml
version: '3.8'
services:
  llamafarm:
    image: llamafarm/llamafarm:latest
    environment:
      - STRATEGY=production
      - WORKERS=4
    volumes:
      - ./config:/app/config
      - ./data:/app/data
    ports:
      - "8000:8000"
    deploy:
      replicas: 3
      resources:
        limits:
          memory: 4G
```
- AWS: ECS, Lambda, SageMaker
- GCP: Cloud Run, Vertex AI
- Azure: Container Instances, ML Studio
- Self-Hosted: Kubernetes, Docker Swarm
See deployment guide for detailed instructions.
```python
from llamafarm import Pipeline, RAG, Models, Prompts

# Create a complete AI pipeline (parentheses keep the chained calls valid Python)
pipeline = (
    Pipeline(strategy="research")
    .add(RAG.ingest("documents/"))
    .add(Prompts.select_template())
    .add(Models.generate())
    .add(RAG.store_results())
)

# Execute with monitoring
results = pipeline.run(
    query="What are the implications?",
    monitor=True,
    cache=True,
)
```
```python
from llamafarm.strategies import Strategy

class MedicalStrategy(Strategy):
    """Custom strategy for medical document analysis."""

    def configure_rag(self):
        return {
            "extractors": ["medical_entities", "dosages", "symptoms"],
            "embedder": "biobert",
            "chunk_size": 256,
        }

    def configure_models(self):
        return {
            "primary": "med-palm-2",
            "temperature": 0.1,
            "require_citations": True,
        }
```
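Assuming `Pipeline` accepts a `Strategy` instance the same way it accepts a strategy name (an assumption, not confirmed API), the custom strategy plugs in like this:

```python
# Hypothetical usage of the custom strategy defined above.
pipeline = Pipeline(strategy=MedicalStrategy())
report = pipeline.run(query="Flag dosage anomalies in this report")
```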
```python
from llamafarm.monitoring import Monitor

monitor = Monitor()
monitor.track_usage()
monitor.analyze_costs()
monitor.export_metrics("prometheus")
```
We welcome contributions! See our Contributing Guide for:
- 🐛 Reporting bugs
- 💡 Suggesting features
- 🔧 Submitting PRs
- 📚 Improving docs
Bobby Radford 💻 • Matt Hamann 💻 • Rob Thelen 💻 • Davon Davis 💻 • Racheal Ochalek 💻 • rachradulo 💻
- Vector DBs: ChromaDB, Pinecone, Weaviate, Qdrant, FAISS
- LLM Providers: OpenAI, Anthropic, Google, Cohere, Together, Groq
- Deployment: Docker, Kubernetes, AWS, GCP, Azure
- Monitoring: Prometheus, Grafana, DataDog, New Relic
- RAG System with 10+ parsers and 5+ extractors
- 25+ LLM provider integrations
- 20+ prompt templates with strategies
- CLI tools for all components
- Docker deployment support
- Fine-tuning pipeline (Looking for contributors with ML experience)
- Advanced caching system (Redis/Memcached integration - 40% complete)
- GraphRAG implementation (Design phase - Join discussion)
- Multi-modal support (Vision models integration - Early prototype)
- Agent orchestration (LangGraph integration planned)
- AutoML for strategy optimization (Q3 2025 - Seeking ML engineers)
- Distributed training (Q4 2025 - Partnership opportunities welcome)
- Edge deployment (Q2 2025 - IoT and mobile focus)
- Mobile SDKs (iOS/Android - Looking for mobile developers)
- Web UI dashboard (Q2 2025 - React/Vue developers needed)
We're actively looking for contributors in these areas:
- 🧠 Machine Learning: Fine-tuning, distributed training
- 📱 Mobile Development: iOS/Android SDKs
- 🎨 Frontend: Web UI dashboard
- 🔍 Search: GraphRAG and advanced retrieval
- 📚 Documentation: Tutorials and examples
See our public roadmap for details.
LlamaFarm is MIT licensed. See LICENSE for details.
LlamaFarm stands on the shoulders of giants:
- 🦜 LangChain - LLM orchestration inspiration
- 🤗 Transformers - Model implementations
- 🎯 ChromaDB - Vector database excellence
- 🚀 uv - Lightning-fast package management
See CREDITS.md for complete acknowledgments.
Join thousands of developers building with LlamaFarm
⭐ Star on GitHub • 💬 Join Discord • 📚 Read Docs • 🐦 Follow Updates
Build locally. Deploy anywhere. Own your AI.