A powerful agentic chatbot system built with FastAPI, LangGraph, and Anthropic Claude that provides an intelligent AI assistant capable of web search, Wikipedia queries, and secure code execution.
See the AI by Design Agent in action:
Click the thumbnail above to watch a full demonstration of the agent's capabilities, including web search, code execution, and multi-tool workflows.
🤖 Advanced AI Chat Interface
- Real-time streaming responses via WebSocket
- Enhanced Thinking: Claude's internal reasoning for improved response quality
- Interleaved Thinking: Better tool orchestration and multi-step workflows
- Intelligent text formatting with proper sentence spacing
- Markdown support: Automatic parsing of headers, bold, italic, and clickable hyperlinks
- Responsive design with REM-based CSS
🔧 Multi-Tool Integration via Enhanced MCP
- Web Search: Tavily API integration for current information
- Wikipedia Access: Comprehensive knowledge base queries
- Code Execution: Secure Python environment with mathematical libraries
- DateTime Tools: Automatic current date retrieval for time-sensitive queries
- Large Number Handling: Stirling's approximation for factorial calculations
- File Upload: Support for images and PDFs with vision analysis
- Vector Database: PostgreSQL + pgvector for enhanced multimodal memory
- MCP Architecture: Model Context Protocol with multiple server sessions
🛡️ Smart Content Filtering
- Prevents raw tool output from displaying to users
- Filters out "[object Object]" and JSON-like responses
- Conservative validation to maintain response quality
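The gist of the validation can be sketched in a few lines (a simplified illustration; the actual filter rules in the codebase may be stricter):

```python
import json

def looks_like_raw_tool_output(chunk: str) -> bool:
    """Heuristic: block chunks that appear to be unformatted tool results."""
    stripped = chunk.strip()
    if "[object Object]" in stripped:
        return True
    if stripped.startswith(("{", "[")):
        try:
            json.loads(stripped)  # bare JSON payloads never reach the UI
            return True
        except json.JSONDecodeError:
            pass
    return False
```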
🔄 Advanced Error Recovery
- Circuit breaker pattern with exponential backoff
- Intelligent retry logic for API failures
- Real-time error recovery monitoring
- Automatic failure trend analysis
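Conceptually, the breaker works like the minimal sketch below (illustrative only; the real logic lives in `core/error_recovery.py`):

```python
import time

class CircuitBreaker:
    """Open the circuit after repeated failures; back off exponentially."""

    def __init__(self, max_failures: int = 3, base_delay: float = 1.0):
        self.max_failures = max_failures
        self.base_delay = base_delay
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            # Cooldown doubles with each failure beyond the threshold
            cooldown = self.base_delay * 2 ** (self.failures - self.max_failures)
            if time.monotonic() - self.opened_at < cooldown:
                raise RuntimeError("circuit open; retrying later")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # (re)open the circuit
            raise
        self.failures = 0  # success closes the circuit
        self.opened_at = None
        return result
```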
📊 Intelligent Caching System
- In-memory cache with TTL support
- 60-80% reduction in redundant API calls
- Real-time cache performance monitoring
- Automatic LRU eviction
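The core mechanism is a dictionary-backed cache with per-entry expiry and least-recently-used eviction, roughly like this sketch (illustrative; the production cache is in `core/cache.py`):

```python
import time
from collections import OrderedDict

class TTLCache:
    """In-memory cache with TTL expiry and LRU eviction."""

    def __init__(self, max_entries: int = 256, ttl_seconds: float = 1800.0):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self._store: OrderedDict[str, tuple[float, object]] = OrderedDict()

    def get(self, key: str):
        item = self._store.get(key)
        if item is None:
            return None
        expires_at, value = item
        if time.monotonic() > expires_at:
            del self._store[key]      # entry expired
            return None
        self._store.move_to_end(key)  # mark as recently used
        return value

    def set(self, key: str, value: object) -> None:
        self._store[key] = (time.monotonic() + self.ttl, value)
        self._store.move_to_end(key)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used
```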
🧠 Long-term Agentic Memory
- Semantic Memory: Facts, preferences, skills, and domain knowledge
- Episodic Memory: Conversation summaries with context and outcomes
- Procedural Memory: Learned patterns and successful interaction sequences
- OpenAI Embeddings: Semantic search for memory retrieval
- Persistent Storage: Local JSON-based memory with automatic pruning
🔍 Comprehensive Monitoring
- Real-time system health dashboard
- Cache hit rate and performance metrics
- Error recovery statistics and trends
- Detailed logging with automatic rotation
💻 Modern Architecture
- FastAPI backend with WebSocket support
- LangGraph for workflow orchestration
- Anthropic Claude 4 Sonnet with enhanced thinking capabilities
- Enhanced MCP: Multiple server sessions with tool-to-session mapping
- Persistent conversation memory with vector embeddings
- Python 3.11+
- Anthropic API key
- Tavily API key
1. Clone the repository

   ```bash
   git clone https://github.com/scarnyc/agentic-workflow.git
   cd agentic-workflow
   ```

2. Install dependencies

   ```bash
   pip install -r requirements.txt
   ```

3. Install multimodal dependencies (optional), for enhanced image embeddings with CLIP:

   ```bash
   pip install torch torchvision transformers
   ```

   Note: The system works with text-only embeddings if these are not installed.

4. Set up environment variables: create a `.env` file in the project root:

   ```bash
   ANTHROPIC_API_KEY=your_anthropic_api_key_here
   TAVILY_API_KEY=your_tavily_api_key_here
   OPENAI_API_KEY=your_openai_api_key_here      # Optional: For long-term memory
   DATABASE_URL=postgresql://username:password@localhost:5432/agentic_vectors  # Optional: For PostgreSQL vector storage
   PINECONE_API_KEY=your_pinecone_api_key_here  # Alternative: For cloud vector storage
   ```

5. Run the application

   ```bash
   python main.py
   ```

6. Optional: set up a vector database. Choose one of the following for enhanced multimodal memory:

   Option A: PostgreSQL (recommended, cost-effective)

   ```bash
   # Follow detailed setup instructions
   cat database/README.md

   # Quick setup
   psql -U postgres -f database/setup_postgres.sql
   ```

   Option B: Pinecone (cloud)

   ```bash
   # Just set your API key in .env
   PINECONE_API_KEY=your_pinecone_api_key_here
   ```

   Option C: Mock database (default). No setup required; it is used automatically if neither of the above is configured.

7. Open your browser and navigate to `http://localhost:8000`
This is an agentic workflow system built with FastAPI, LangGraph, and Anthropic Claude that provides intelligent tool orchestration via Enhanced MCP (Model Context Protocol) with advanced error recovery and caching.
```
agentic-workflow/
├── main.py                        # FastAPI server with WebSocket endpoints
├── core/                          # Core system components
│   ├── app.py                     # LangGraph workflow with MCP integration
│   ├── cache.py                   # In-memory cache with TTL support
│   ├── error_recovery.py          # Circuit breaker pattern & error handling
│   ├── logging_config.py          # Comprehensive logging system
│   ├── cache_monitor.py           # Real-time cache monitoring utility
│   ├── error_recovery_monitor.py  # Error recovery monitoring & trends
│   ├── long_term_memory.py        # OpenAI embeddings-based memory store
│   ├── memory_agent.py            # Memory-enhanced agent with extraction
│   ├── postgres_vector_db.py      # PostgreSQL vector database implementation
│   ├── vector_db_factory.py       # Auto-detection of available databases
│   └── mock_vector_db.py          # Fallback mock database
├── mcp/                           # Enhanced MCP implementation
│   ├── enhanced_mcp_tools.py      # Multi-server MCP client with session management
│   ├── mcp_config.json            # Server configuration and tool mapping
│   ├── mcp_servers/               # Individual MCP server implementations
│   │   ├── code_server.py         # Python execution & mathematical tools
│   │   ├── search_server.py       # Tavily web search capabilities
│   │   ├── wiki_server.py         # Wikipedia search functionality
│   │   ├── datetime_server.py     # Time-sensitive date/time tools
│   │   └── multimodal_server.py   # Vector database & multimodal operations
│   └── MCP_IMPLEMENTATION.md      # Detailed MCP architecture documentation
├── database/                      # Database setup and migrations
│   ├── setup_postgres.sql         # PostgreSQL + pgvector setup script
│   └── README.md                  # Database setup instructions
├── tools/                         # Tool implementations (used by MCP servers)
│   ├── secure_executor.py         # Secure Python execution with sandboxing
│   ├── search_tools.py            # Tavily web search integration
│   ├── wiki_tools.py              # Wikipedia API wrapper
│   ├── datetime_tools.py          # Current date/time for context
│   ├── math_tools.py              # Mathematical calculations
│   └── prompt.py                  # System prompts and guidelines
├── test/                          # Testing infrastructure
│   ├── test_api_errors.py         # Automated API error testing
│   └── TESTING_GUIDE.md           # Comprehensive testing guide
├── static/                        # Frontend assets
│   ├── css/styles.css             # Responsive styling
│   └── js/app.js                  # WebSocket client logic
├── templates/
│   └── index.html                 # Main chat interface
├── logs/                          # Application logs (auto-created)
│   ├── app.log                    # General application logs
│   ├── error.log                  # Error-level logs
│   ├── cache.log                  # Cache operations
│   ├── error_recovery.log         # Error recovery events
│   ├── websocket.log              # WebSocket connections
│   └── api_calls.log              # API tool usage
└── memory/                        # Long-term memory storage (auto-created)
    ├── semantic_memories.json     # Facts, preferences, skills
    ├── episodic_memories.json     # Conversation summaries
    └── procedural_memories.json   # Learned patterns
```
The system uses Model Context Protocol (MCP) with multiple server sessions for robust tool orchestration:
- Multiple Client Sessions: Each tool category runs in its own MCP server process
- Tool-to-Session Mapping: Efficient routing of tool calls to appropriate servers
- Resource Management: Proper cleanup with ExitStack context manager
- Modular Design: Easy to extend with new servers and tools
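A minimal sketch of that session management and tool-to-session mapping (class and method names here are hypothetical; the actual client lives in `mcp/enhanced_mcp_tools.py`):

```python
from contextlib import ExitStack

class MCPRouter:
    """Route each tool call to the MCP server session that owns the tool."""

    def __init__(self) -> None:
        self.stack = ExitStack()  # guarantees every session is closed on exit
        self.tool_to_session: dict[str, object] = {}

    def register(self, session, tool_names: list[str]) -> None:
        # Keep the session alive for the router's lifetime
        self.stack.enter_context(session)
        for name in tool_names:
            self.tool_to_session[name] = session

    def route(self, tool_name: str):
        return self.tool_to_session[tool_name]  # O(1) lookup per tool call
```

The specialized servers behind this mapping: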
- code-server: Python execution and mathematical computations
- search-server: Web search via Tavily API with caching
- wiki-server: Wikipedia search with intelligent content processing
- datetime-server: Current date/time for time-sensitive queries
- multimodal-server: Vector database operations and multimodal memory
- Scalability: Each server runs independently, no single point of failure
- Maintainability: Clear separation of concerns between tool categories
- Performance: Direct tool-to-session mapping for fast routing
- Future-Ready: Prepared for remote MCP server deployment
- User Input → WebSocket connection established
- Memory Retrieval → Semantic search for relevant context from vector database
- Message Processing → LangGraph workflow orchestration with memory context
- MCP Tool Routing → Tool calls routed to appropriate MCP server sessions
- Tool Execution → Parallel execution across specialized MCP servers
- Response Streaming → Real-time chunks via WebSocket
- Content Filtering → Intelligent formatting and validation
- Memory Extraction → Automatic memory processing and vector storage
- UI Display → Responsive message bubbles with proper spacing
- `GET /` - Main chat interface
- `POST /api/conversations` - Create new conversation
- `GET /api/health` - System health check with cache and error recovery stats
- `GET /api/cache/stats` - Cache performance statistics
- `POST /api/cache/clear` - Clear all cache entries
- `GET /api/error-recovery/stats` - Error recovery and circuit breaker status
- `GET /api/memory/stats` - Long-term memory statistics
- `POST /api/memory/process/{conversation_id}` - Process conversation for memory extraction
- `WS /ws/{conversation_id}` - Real-time chat communication
Client to Server:

```json
{
  "type": "message",
  "content": "What's the weather today?",
  "id": "message-123"
}
```

Server to Client:

```json
{
  "type": "message_chunk",
  "content": "The weather today is..."
}
```
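For quick testing outside the browser, the protocol can be exercised with a small Python client. This is a sketch using the third-party `websockets` package; the conversation id `demo-1` is a placeholder:

```python
import asyncio
import json

import websockets  # pip install websockets

async def chat() -> None:
    async with websockets.connect("ws://localhost:8000/ws/demo-1") as ws:
        await ws.send(json.dumps({
            "type": "message",
            "content": "What's the weather today?",
            "id": "message-123",
        }))
        # Print streamed chunks until the server closes the socket
        async for raw in ws:
            event = json.loads(raw)
            if event.get("type") == "message_chunk":
                print(event["content"], end="", flush=True)

asyncio.run(chat())
```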
All tools are accessed through the Enhanced MCP (Model Context Protocol) architecture, providing robust session management and efficient routing.
- Current events and news
- Real-time information
- Market data and trends
- Product information
- Caching: 30-minute TTL for search results
- Processing: Token-optimized result formatting
Example: "What are the latest developments in AI?"
- Historical information
- Biographical data
- Scientific concepts
- General knowledge
- Security: URL encoding, input validation, query sanitization
- Caching: 24-hour TTL for stable content
Example: "Tell me about the Roman Empire"
- Mathematical calculations
- Data analysis
- Algorithm implementation
- Scientific computing with mpmath
- Security: Sandboxed execution environment
- Features: Stirling approximation for large factorials
Example: "Calculate the factorial of 100"
- Automatic current date retrieval for time-sensitive queries
- Resolves relative time references ("this week", "next week", "recently")
- Eliminates confusion from model knowledge cutoff
- Contextualizes search queries with accurate timeframes
- Tools: Current datetime, simple date format for search context
Example: "What's the weather next week in Miami?" automatically gets current date, calculates "next week", then searches with proper date context.
- Vector database operations (PostgreSQL/Pinecone/Mock)
- Text and image memory storage
- Semantic similarity search
- Database auto-detection and health monitoring
- Features: Store/search text, store/analyze images, database info
Example: Store important facts, search previous conversations, analyze uploaded images
The agent employs a sophisticated three-tier memory system using OpenAI embeddings for semantic search and retrieval:
📝 Semantic Memory
- Stores factual knowledge, user preferences, and skills
- Automatically extracts information from user statements
- Categories: facts, preferences, skills, domain knowledge
- Example: "I prefer Python programming" → stored as preference
📚 Episodic Memory
- Records conversation summaries with context
- Tracks tools used, outcomes, and emotional context
- Importance scoring for memory retention
- Example: "User asked about data science, used search tool, successful outcome"
⚙️ Procedural Memory
- Learns successful interaction patterns
- Stores trigger conditions → action sequences
- Success rate tracking and pattern optimization
- Example: "Code request → analyze requirements → generate code → explain"
```
📁 memory/
├── semantic_memories.json    # Facts, preferences, skills
├── episodic_memories.json    # Conversation summaries
└── procedural_memories.json  # Learned patterns
```
Each memory includes:
- Content: The actual memory information
- Embedding: 1536-dimensional OpenAI vector for semantic search
- Metadata: Confidence scores, timestamps, usage counts
- Context: Category, source, importance scores
- Context Retrieval: Every user message triggers semantic search
- Enhanced Prompts: Relevant memories automatically added to system prompts
- Automatic Extraction: Conversations processed for memory on disconnect
- Smart Pruning: LRU-based memory management with configurable limits
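Retrieval boils down to cosine similarity between a query embedding and the stored memory embeddings. A minimal sketch, assuming the `openai` Python SDK (the embedding model name below is an assumption; any 1536-dimensional model fits the record format above, and the real store lives in `core/long_term_memory.py`):

```python
import math

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(text: str) -> list[float]:
    # Returns a 1536-dimensional vector (model name is an assumption)
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def top_k(query: str, memories: list[dict], k: int = 3) -> list[dict]:
    """Rank memory records (dicts with an "embedding" field) by similarity."""
    q = embed(query)
    return sorted(memories, key=lambda m: cosine(q, m["embedding"]), reverse=True)[:k]
```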
- Stirling's approximation for large factorials
- Scientific notation formatting
- High-precision calculations
- Memory-efficient algorithms
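The trick for huge factorials: rather than computing n! exactly, Stirling's approximation gives its logarithm directly, ln(n!) ≈ n ln n - n + 0.5 ln(2πn), which converts straight into scientific notation. A sketch of the idea (illustrative; the project's version runs inside the code-execution server):

```python
import math

def factorial_sci(n: int) -> str:
    """Approximate n! in scientific notation via Stirling's formula."""
    ln_fact = n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)
    log10_fact = ln_fact / math.log(10)
    exponent = math.floor(log10_fact)
    mantissa = 10 ** (log10_fact - exponent)
    return f"{mantissa:.4f}e+{exponent}"

print(factorial_sci(100))  # ~9.3248e+157 (exact value starts 9.3326e+157)
```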
- Headers: `# Title`, `## Subtitle`, `### Section` (supports H1-H6)
- Bold text: `**text**` renders as bold
- Italic text: `*text*` renders as italic
- Clickable links: Automatic URL detection and formatting
- Smart parsing: Real-time markdown processing during streaming
- Custom styling: Light blue links and purple headers optimized for dark theme
- Internal Reasoning: Claude processes complex problems with enhanced thinking
- Better Tool Selection: Improved reasoning about which tools to use
- Quality Improvements: All responses benefit from internal reasoning processes
- Interleaved Thinking: Enhanced tool orchestration for multi-step workflows
- Note: Thinking content is processed internally but not displayed, due to LangChain limitations
| Variable | Description | Required |
|---|---|---|
| `ANTHROPIC_API_KEY` | Anthropic Claude API key | Yes |
| `TAVILY_API_KEY` | Tavily search API key | Yes |
| `OPENAI_API_KEY` | OpenAI API key for embeddings | No* |

*Required for long-term memory functionality
The system uses Claude 4 Sonnet with:
- Max tokens: 2,000
- Enhanced thinking: 1,024 token budget for internal reasoning
- Interleaved thinking: Beta feature for better tool orchestration
- Tool binding: All available tools
- Memory: Persistent conversation history
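In terms of the raw Messages API, that configuration looks roughly like the sketch below, using the `anthropic` Python SDK. The model id and the interleaved-thinking beta header are assumptions and should be checked against current Anthropic documentation:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # assumed model id
    max_tokens=2000,                   # must exceed the thinking budget
    thinking={"type": "enabled", "budget_tokens": 1024},
    extra_headers={"anthropic-beta": "interleaved-thinking-2025-05-14"},  # assumed header
    tools=[],  # tool schemas are bound here in the real app
    messages=[{"role": "user", "content": "Calculate the factorial of 100"}],
)
```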
```bash
# Real-time cache monitoring
python core/cache_monitor.py --monitor

# Error recovery monitoring
python core/error_recovery_monitor.py --monitor

# System health check
python core/cache_monitor.py --health
python core/error_recovery_monitor.py --health

# Analyze error trends
python core/error_recovery_monitor.py --trends

# View cache statistics
python core/cache_monitor.py
curl http://localhost:8000/api/cache/stats

# Clear cache
python core/cache_monitor.py --clear
curl -X POST http://localhost:8000/api/cache/clear

# Run cache benchmark
python core/cache_monitor.py --benchmark
```
```bash
# View memory statistics
curl http://localhost:8000/api/memory/stats

# Process conversation for memory extraction
curl -X POST http://localhost:8000/api/memory/process/{conversation_id}

# Test memory system
python test_memory.py

# Memory storage location
ls -la memory/
```
```bash
# Run automated API error tests
python test_api_errors.py

# Test memory system
python test_memory.py

# Test extended thinking functionality
python test_thinking.py

# View comprehensive testing guide
cat TESTING_GUIDE.md
```
```bash
# Monitor datetime tool usage
grep "datetime" logs/api_calls.log

# Watch time-sensitive query handling in real-time
tail -f logs/api_calls.log | grep "current date"

# Check for time-context searches
grep "Retrieved.*date" logs/api_calls.log
```
```bash
# View Wikipedia tool security analysis
cat WIKIPEDIA_SECURITY_ANALYSIS.md

# Check tool security implementations
grep -r "quote\|sanitize\|validate" tools/
```
```bash
# Test MCP client functionality
python -c "from mcp import get_enhanced_mcp_tools; tools = get_enhanced_mcp_tools(); print(f'Loaded {len(tools)} tools')"

# View MCP configuration
cat mcp/mcp_config.json

# View detailed MCP documentation
cat mcp/MCP_IMPLEMENTATION.md

# Test individual MCP server
python mcp/mcp_servers/datetime_server.py
```
- Create server file in `mcp/mcp_servers/new_server.py`
- Update configuration in `mcp/mcp_config.json`
- Add tool definitions in `mcp/enhanced_mcp_tools.py`
- Test integration with the main app
```python
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("Your Server Name")

@mcp.tool()
def your_tool(param: str) -> str:
    """Tool description"""
    return f"Result: {param}"

if __name__ == "__main__":
    mcp.run()
```
```bash
# Test PostgreSQL setup
python -c "from core.vector_db_factory import VectorDBFactory; db = VectorDBFactory.create_vector_db(); print(db.get_stats())"

# View database setup instructions
cat database/README.md

# Check which database is being used
python -c "from core.vector_db_factory import VectorDBFactory; print(VectorDBFactory.get_available_databases())"
```
```bash
# Format code
black .

# Run tests
pytest

# Type checking (if configured)
mypy .
```
- Create a new tool file in `tools/`
- Implement the tool function
- Add it to `tools/secure_executor.py` or create a new category
- Update `core/app.py` to include the tool
- Add usage guidelines to `tools/prompt.py`
The UI uses CSS custom properties for easy theming:
```css
:root {
  --bg-dark: #18191a;
  --bg-message: #292a2d;
  --accent: #7c4dff;
  --text-light: #e4e6eb;
}
```
- Sandboxed Python environment
- Temporary file cleanup
- Resource limitations
- Error handling and logging
- Input sanitization
- Output filtering
- Raw data detection
- Malicious content prevention
- CORS configuration
- WebSocket authentication
- API key protection
- Rate limiting (Anthropic-enforced)
- Wikipedia Tool: URL encoding, input validation, query length limiting
- Search Tool: API key protection, result filtering
- Code Tool: Sandboxed execution, no file system access
- Security Auditing: Regular vulnerability assessments of LangChain community tools
- Chunked delivery: Real-time message streaming
- Intelligent spacing: Sentence boundary detection
- Content filtering: Prevents UI blocking on raw data
- Auto-scrolling: Smooth user experience
- Conversation persistence: In-memory storage with cleanup
- Tool result caching: Reduced API calls
- Connection pooling: Efficient WebSocket handling
- Stirling's approximation: For large factorial calculations
- Scientific notation: Prevents UI overflow
- Precision control: Balanced accuracy and performance
This project is licensed under the MIT License - see the LICENSE file for details.
- Anthropic for Claude API and advanced reasoning capabilities
- LangChain for framework and tool integration
- Tavily for web search functionality
- FastAPI for modern web framework
- Community for inspiration and feedback
- Comprehensive error handling
- Handling stop reasons
- Caching Results: Add a simple cache for commonly requested information to reduce API calls
- Progressive Enhancement: In the frontend, show typing indicators during tool transitions for a more natural feel
- Error Recovery: Implement automatic retries for temporary API failures
- Long-term Agentic Memory (Semantic, Episodic, Procedural)
- OpenAI Embeddings for semantic search
- Automatic memory extraction and retrieval
- Vision, PDF support ✅
- Canvas
- Login screen with Google OAuth for sign-in
- MCP Servers ✅
- Support for GPT-4o for writing via MCP ✅
- File System
- Human in the loop (stop and ask for input)
- Evals (https://docs.anthropic.com/en/docs/test-and-evaluate/strengthen-guardrails/reduce-hallucinations)
- RL fine-tuning with GRPO w/ thumbs up and thumbs down user feedback
- Persist user Chat history (UI)
- Planning: research, generation, reflection
- RAG, Deep Research w/ Perplexity
- Upgraded web search with Google SerpAPI
- Enable Claude's Built-in Web Search w/ Prompt Caching
- Claude's Code Exec / Prompt Gen / Computer Use (Beta)
- Experiment with thinking budget
- Slack, LinkedIn, Gmail, NASA toolkit, Substack
- User-input OpenAI / Anthropic API Key
- Security with Cloudflare
- App optimized for security, speed & efficiency
- Generative UI
- User Feedback Loop: Add a thumbs up/down mechanism to collect feedback on answers
- chatterbox.ai voice integration
Built with ❤️ for intelligent automation