A FastAPI-based service that analyzes log files using LangGraph and AI models (Gemini and Groq). Designed for easy deployment to Google Cloud Run.
- 🔍 Intelligent Log Analysis: Identifies errors, warnings, and patterns in log files
- 🚀 Fast Processing: Optimized for logs up to 10MB
- 📊 Structured Output: Returns issues, suggestions, and diagnostic commands
- 📚 Documentation Search: Finds relevant documentation for identified issues
- 🌊 Streaming Support: Server-Sent Events for real-time analysis updates
- ☁️ Cloud Run Ready: Optimized for serverless deployment
- 💬 Interactive Mode: Q&A flow for clarification during analysis
- 💾 Memory/Persistence: Analysis history and context retention
- 🔄 Advanced Cycle Detection: Prevents infinite loops with pattern recognition
- 🛡️ Circuit Breaker: Fault tolerance for external services
- ⏱️ API Rate Limiting: Prevents quota exhaustion
- 🎯 Specialized Analyzers: Domain-specific analysis (HDFS, Security, Application)
- 🚄 Advanced Streaming: Parallel chunk processing for large logs
- 📈 Resource Tracking: Memory and CPU monitoring
- 🗄️ Intelligent Caching: Performance optimization with LRU cache
See ENHANCED_FEATURES.md for detailed documentation.
- Python 3.11+
- API Keys: Gemini (Google AI), Groq, and Tavily (see Configuration below)
- **Clone the repository**

  ```bash
  git clone https://github.com/yourusername/log-analyzer-api.git
  cd log-analyzer-api
  ```

- **Set up the environment**

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  pip install -r requirements.txt
  ```

- **Configure API keys**

  ```bash
  cp .env.example .env
  # Edit .env with your API keys
  ```

- **Run the server**

  ```bash
  uvicorn app.main:app --reload --port 8000
  ```

- **Access the API**
  - API: http://localhost:8000
  - Docs: http://localhost:8000/docs
  - Health: http://localhost:8000/health
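Once the server is running, a quick way to verify it is to hit the health endpoint:

```bash
curl http://localhost:8000/health
```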
```http
POST /api/v1/analyze
Content-Type: application/json

{
  "log_content": "2024-01-20 ERROR: Database connection failed...",
  "environment_details": {
    "os": "Ubuntu 22.04",
    "service": "PostgreSQL 14"
  },
  "application_name": "web-api",
  "analysis_type": "general"
}
```
```http
POST /api/v1/analyze/stream
Content-Type: application/json

# Same request body as /analyze
# Returns a Server-Sent Events stream
```
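A minimal Python sketch for consuming the stream is shown below; it assumes standard SSE `data:` lines, so adjust the parsing to the actual event schema:

```python
import requests

url = "http://localhost:8000/api/v1/analyze/stream"
payload = {"log_content": "ERROR: Database connection failed"}

# stream=True keeps the connection open and yields lines as they arrive
with requests.post(url, json=payload, stream=True) as response:
    response.raise_for_status()
    for line in response.iter_lines(decode_unicode=True):
        # SSE events arrive as "data: ..." lines separated by blank lines
        if line.startswith("data:"):
            print(line[len("data:"):].strip())
```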
The easiest way to deploy: fully managed by LangChain, with GitHub integration.
Quick Start:
- Push your code to GitHub
- Go to smith.langchain.com
- Click "LangGraph" → "New Deployment"
- Connect your GitHub repo
- Add your API keys
- Deploy!
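LangGraph Platform reads its build configuration from a `langgraph.json` at the repo root. A minimal sketch is shown below; the `log_analyzer` graph path is hypothetical and must point at this project's actual compiled graph:

```json
{
  "dependencies": ["."],
  "graphs": {
    "log_analyzer": "./app/graph.py:graph"
  },
  "env": ".env"
}
```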
See CLOUD_SAAS_DEPLOYMENT.md for detailed instructions.
- Google Cloud Run: See DEPLOYMENT_GUIDE.md
- Standalone Container: See LANGGRAPH_CLOUD_DEPLOYMENT.md
- **Push to GitHub**

  ```bash
  git init
  git add .
  git commit -m "Initial commit"
  git remote add origin https://github.com/yourusername/log-analyzer-api.git
  git push -u origin main
  ```

- **Deploy from Cloud Console**
  - Go to Cloud Run
  - Click "Create Service"
  - Select "Continuously deploy from a repository"
  - Connect your GitHub account and select the repository
  - Configure:
    - Service name: `log-analyzer-api`
    - Region: your preferred region
    - Authentication: allow unauthenticated invocations (or configure as needed)
    - Container port: 8080
    - Memory: 2 GiB
    - CPU: 1
    - Request timeout: 300 seconds
    - Environment variables:
      ```
      GEMINI_API_KEY=your-key
      GROQ_API_KEY=your-key
      TAVILY_API_KEY=your-key
      ```
```bash
# Install gcloud CLI if not already installed
# https://cloud.google.com/sdk/docs/install

# Authenticate
gcloud auth login
gcloud config set project YOUR_PROJECT_ID

# Deploy directly from source
gcloud run deploy log-analyzer-api \
  --source . \
  --region us-central1 \
  --allow-unauthenticated \
  --memory 2Gi \
  --timeout 300 \
  --set-env-vars "GEMINI_API_KEY=your-key,GROQ_API_KEY=your-key,TAVILY_API_KEY=your-key"
```
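After the deploy completes, you can capture the service URL and smoke-test the health endpoint (a sketch, assuming the `/health` route shown earlier):

```bash
# Grab the deployed URL and verify the service responds
SERVICE_URL=$(gcloud run services describe log-analyzer-api \
  --region us-central1 --format 'value(status.url)')
curl "$SERVICE_URL/health"
```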
| Variable | Description | Required | Default |
|----------|-------------|----------|---------|
| `GEMINI_API_KEY` | Google AI API key for the Gemini model | Yes | - |
| `GROQ_API_KEY` | Groq API key for orchestration | Yes | - |
| `TAVILY_API_KEY` | Tavily API key for documentation search | Yes | - |
| `LOG_LEVEL` | Logging level | No | `INFO` |
| `MAX_LOG_SIZE_MB` | Maximum log file size in MB | No | `10` |
| `ENABLE_STREAMING` | Enable SSE streaming endpoint | No | `true` |
| `ANALYSIS_TIMEOUT` | Analysis timeout in seconds | No | `300` |
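For local development, the same variables go in `.env` (copied from `.env.example`); the values below are placeholders:

```bash
# .env -- replace with your real keys
GEMINI_API_KEY=your-gemini-key
GROQ_API_KEY=your-groq-key
TAVILY_API_KEY=your-tavily-key
LOG_LEVEL=INFO
MAX_LOG_SIZE_MB=10
ENABLE_STREAMING=true
ANALYSIS_TIMEOUT=300
```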
```bash
# Create secrets
echo -n "your-gemini-key" | gcloud secrets create gemini-api-key --data-file=-
echo -n "your-groq-key" | gcloud secrets create groq-api-key --data-file=-
echo -n "your-tavily-key" | gcloud secrets create tavily-api-key --data-file=-

# Grant access to the Cloud Run service account
# (repeat for groq-api-key and tavily-api-key)
gcloud secrets add-iam-policy-binding gemini-api-key \
  --member="serviceAccount:YOUR_SERVICE_ACCOUNT@YOUR_PROJECT.iam.gserviceaccount.com" \
  --role="roles/secretmanager.secretAccessor"

# Update the service to use secrets
gcloud run services update log-analyzer-api \
  --update-secrets="GEMINI_API_KEY=gemini-api-key:latest,GROQ_API_KEY=groq-api-key:latest,TAVILY_API_KEY=tavily-api-key:latest"
```
```python
import requests

url = "https://your-service-url.run.app/api/v1/analyze"

data = {
    "log_content": """
2024-01-20 10:15:23 ERROR [database] Connection timeout after 30s
2024-01-20 10:15:24 ERROR [database] Failed to connect to PostgreSQL
2024-01-20 10:15:25 WARN [api] Fallback to cache due to database error
""",
    "environment_details": {
        "os": "Ubuntu 22.04",
        "postgresql_version": "14.5"
    },
    "application_name": "web-api"
}

response = requests.post(url, json=data)
response.raise_for_status()  # Fail fast on HTTP errors
result = response.json()

print(f"Found {len(result['issues'])} issues")
for issue in result['issues']:
    print(f"- {issue['severity']}: {issue['description']}")
```
```bash
curl -X POST https://your-service-url.run.app/api/v1/analyze \
  -H "Content-Type: application/json" \
  -d '{
    "log_content": "ERROR: Database connection failed",
    "environment_details": {"os": "Linux"}
  }'
```
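The streaming endpoint can be exercised the same way; curl's `-N` flag disables buffering so events print as they arrive:

```bash
curl -N -X POST https://your-service-url.run.app/api/v1/analyze/stream \
  -H "Content-Type: application/json" \
  -d '{"log_content": "ERROR: Database connection failed"}'
```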
```json
{
  "analysis_id": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2024-01-20T15:30:45.123Z",
  "status": "completed",
  "issues": [
    {
      "type": "database_error",
      "description": "PostgreSQL connection timeout",
      "severity": "critical",
      "line_number": 1,
      "timestamp": "2024-01-20 10:15:23"
    }
  ],
  "suggestions": [
    {
      "issue_type": "database_error",
      "suggestion": "Check PostgreSQL service status and network connectivity",
      "priority": "high",
      "estimated_impact": "Restore database connectivity"
    }
  ],
  "documentation_references": [
    {
      "title": "PostgreSQL Connection Troubleshooting",
      "url": "https://www.postgresql.org/docs/current/runtime-config-connection.html",
      "relevance": "high",
      "excerpt": "Connection timeout parameters..."
    }
  ],
  "diagnostic_commands": [
    {
      "command": "systemctl status postgresql",
      "description": "Check PostgreSQL service status",
      "platform": "linux"
    }
  ],
  "summary": "Critical database connectivity issue detected",
  "metrics": {
    "total_lines": 3,
    "issues_found": 1,
    "processing_time": 2.34,
    "log_size_mb": 0.001
  }
}
```
- View metrics in Cloud Console: CPU, Memory, Request count, Latency
- Set up alerts for errors or high latency
```bash
# View logs
gcloud run services logs read log-analyzer-api --limit 50

# Stream logs
gcloud run services logs tail log-analyzer-api
```
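For filtered queries, such as errors only, Cloud Logging can be searched directly; the filter below is an example:

```bash
# Show the 20 most recent ERROR-or-worse entries for this service
gcloud logging read \
  'resource.type="cloud_run_revision" AND resource.labels.service_name="log-analyzer-api" AND severity>=ERROR' \
  --limit 20
```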
- Cloud Run charges only for actual usage
- Typical costs:
- CPU: ~$0.00002400 per vCPU-second
- Memory: ~$0.00000250 per GiB-second
- Requests: ~$0.40 per million requests
- Set minimum instances to 0 for development
- Use Cloud Scheduler for warming if needed (see the sketch below)
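As a sketch, a Cloud Scheduler job that pings the health endpoint every few minutes keeps one instance warm; the job name and schedule here are placeholders:

```bash
gcloud scheduler jobs create http log-analyzer-warmer \
  --location us-central1 \
  --schedule "*/5 * * * *" \
  --uri "https://your-service-url.run.app/health" \
  --http-method GET
```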
- **API Key Errors**
  - Ensure all three API keys are set correctly
  - Check Secret Manager permissions if using secrets
- **Timeout Errors**
  - Increase the Cloud Run timeout (max 3600 seconds)
  - Reduce the log size or use the streaming endpoint
- **Memory Errors**
  - Increase the Cloud Run memory allocation
  - Current limit: 10MB per log file
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
MIT License - see LICENSE file for details