██████╗██╗ ██╗██████╗ ███████╗██████╗
██╔════╝╚██╗ ██╔╝██╔══██╗██╔════╝██╔══██╗
██║ ╚████╔╝ ██████╔╝█████╗ ██████╔╝
██║ ╚██╔╝ ██╔══██╗██╔══╝ ██╔══██╗
╚██████╗ ██║ ██████╔╝███████╗██║ ██║
╚═════╝ ╚═╝ ╚═════╝ ╚══════╝╚═╝ ╚═╝
█████╗ ██╗ ██╗████████╗ ██████╗ █████╗ ██████╗ ███████╗███╗ ██╗████████╗
██╔══██╗██║ ██║╚══██╔══╝██╔═══██╗██╔══██╗██╔════╝ ██╔════╝████╗ ██║╚══██╔══╝
███████║██║ ██║ ██║ ██║ ██║███████║██║ ███╗█████╗ ██╔██╗ ██║ ██║
██╔══██║██║ ██║ ██║ ██║ ██║██╔══██║██║ ██║██╔══╝ ██║╚██╗██║ ██║
██║ ██║╚██████╔╝ ██║ ╚██████╔╝██║ ██║╚██████╔╝███████╗██║ ╚████║ ██║
╚═╝ ╚═╝ ╚═════╝ ╚═╝ ╚═════╝ ╚═╝ ╚═╝ ╚═════╝ ╚══════╝╚═╝ ╚═══╝ ╚═╝
[!] EXPERIMENTAL SOFTWARE - USE ONLY IN AUTHORIZED, SAFE, SANDBOXED ENVIRONMENTS [!]
Cyber-AutoAgent is a proactive security assessment tool that autonomously conducts intelligent penetration testing using natural language reasoning, dynamic tool selection, and evidence collection. It is built on the Strands agent framework and runs against AWS Bedrock or local Ollama models.
- Important Disclaimer
- Features
- Architecture
- Model Providers
- Installation & Deployment
- Quick Start
- Development & Testing
- Troubleshooting
- Contributing
- License
# Using Docker (Recommended)
docker run --rm \
-v ~/.aws:/home/cyberagent/.aws:ro \
-v $(pwd)/evidence:/app/evidence \
cyber-autoagent \
--target "http://testphp.vulnweb.com" \
--objective "Identify SQL injection vulnerabilities"
# Using Python
git clone https://github.com/cyber-autoagent/cyber-autoagent.git
cd cyber-autoagent
pip install -e .
python src/cyberautoagent.py --target "192.168.1.100" --objective "Comprehensive security assessment"
THIS TOOL IS FOR EDUCATIONAL AND AUTHORIZED SECURITY TESTING PURPOSES ONLY.
- [+] Use only on systems you own or have explicit written permission to test
- [+] Deploy in safe, sandboxed environments isolated from production systems
- [+] Ensure compliance with all applicable laws and regulations
- [-] Never use on unauthorized systems or networks
- [-] Users are fully responsible for legal and ethical use
- Autonomous Operation: Conducts security assessments with minimal human intervention
- Intelligent Tool Selection: Automatically chooses appropriate security tools (nmap, sqlmap, nikto, etc.)
- Natural Language Reasoning: Uses Strands framework with metacognitive architecture
- Evidence Collection: Automatically stores findings with Mem0 memory (category="finding")
- Meta-Tool Creation: Dynamically creates custom exploitation tools when needed
- Adaptive Execution: Metacognitive assessment guides strategy based on confidence levels
- Professional Reporting: Generates comprehensive assessment reports
- Swarm Intelligence: Deploy parallel agents with shared memory for complex tasks
graph LR
A[User Input] --> B[Cyber-AutoAgent]
B --> C[AI Model]
B --> D[Security Tools]
B --> E[Evidence Storage]
C --> B
D --> E
E --> F[Final Report]
style A fill:#e3f2fd
style F fill:#e8f5e8
style B fill:#f3e5f5
style C fill:#fff3e0
Key Components:
- User provides target and objectives via command line
- Agent orchestrates assessment using AI reasoning
- Security tools execute scans and exploits
- Evidence system stores and analyzes findings
sequenceDiagram
participant U as User
participant A as Agent
participant M as AI Model
participant T as Tools
participant E as Evidence
U->>A: Start Assessment
A->>E: Initialize Storage
loop Assessment Steps
A->>M: Analyze Situation
M-->>A: Next Action
A->>T: Execute Tool
T-->>A: Results
A->>E: Store Findings
alt Critical Discovery
A->>T: Exploit Immediately
T-->>A: Access Gained
A->>E: Store Evidence
end
A->>A: Check Progress
alt Success
break Complete
A->>U: Report Success
end
end
end
A->>M: Generate Report
M-->>A: Final Analysis
A->>U: Deliver Report
Execution Pattern:
- Agent continuously analyzes situation and selects appropriate tools
- Critical discoveries trigger immediate exploitation attempts
- All findings stored as evidence for final analysis
- Assessment completes when objectives met or budget exhausted
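The execution pattern above boils down to a single budget-bounded loop. The sketch below is illustrative pseudocode of that loop, not the project's actual implementation; every name in it (Finding, run_assessment, the injected callables) is hypothetical.

```python
from dataclasses import dataclass

# Illustrative sketch of the assessment loop shown in the sequence diagram.
# In the real agent these decisions are made by the LLM and Strands tools;
# here they are stand-in callables so the control flow is easy to follow.

@dataclass
class Finding:
    description: str
    critical: bool = False

def run_assessment(select_action, execute, objective_met, max_steps=100):
    """select_action, execute, and objective_met stand in for model
    reasoning, tool execution, and objective checking respectively."""
    evidence: list[Finding] = []
    for _ in range(max_steps):                     # budget-bounded loop
        action = select_action(evidence)           # "Analyze Situation" -> next action
        finding = execute(action)                  # run the chosen tool
        evidence.append(finding)                   # store findings as evidence
        if finding.critical:                       # critical discovery: exploit immediately
            evidence.append(execute(f"exploit:{action}"))
        if objective_met(evidence):                # stop once the objective is met
            break
    return evidence                                # feeds the final report
```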
flowchart TD
A[Think: Analyze Current State] --> B{Select Tool Type}
B --> |Basic Task| C[Shell Commands]
B --> |Security Task| D[Cyber Tools via Shell]
B --> |Complex Task| E[Create Meta-Tool]
B --> |Parallel Task| P[Swarm Orchestration]
C --> F[Reflect: Evaluate Results]
D --> F
E --> F
P --> F
F --> G{Findings?}
G --> |Critical| H[Exploit Immediately]
G --> |Informational| I[Store & Continue]
G --> |None| J[Try Different Approach]
H --> K[Document Evidence]
I --> L{Objective Met?}
J --> A
K --> L
L --> |Yes| M[Complete Assessment]
L --> |No| A
style A fill:#e3f2fd
style C fill:#e8f5e8
style D fill:#fff3e0
style E fill:#f3e5f5
style P fill:#fce4ec
style H fill:#ffcdd2
Metacognitive Process:
Design Philosophy: Meta-Everything Architecture
At the core of Cyber-AutoAgent is a "meta-everything" design philosophy that enables dynamic adaptation and scaling:
- Meta-Agent: The swarm capability deploys dynamic agents as tools, each tailored for specific subtasks with their own reasoning loops
- Meta-Tooling: Through the editor and load_tool capabilities, the agent can create, modify, and deploy new tools at runtime to address novel challenges
- Meta-Learning: Continuous memory storage and retrieval enables cross-session learning, building expertise over time
- Meta-Cognition: Self-reflection and confidence assessment drives strategic decisions about tool selection and approach (Note: This aspect is still being expanded for deeper reasoning capabilities)
This meta-architecture allows the system to transcend static tool limitations and evolve its capabilities during execution.
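As a concrete illustration of meta-tooling, the following is roughly what a runtime-generated tool module could look like: a small Python file the agent writes with its editor capability and then registers with load_tool. The @tool decorator and import path follow Strands SDK conventions and are assumptions, not an excerpt from this project's code.

```python
# exploit_custom.py - hypothetical meta-tool the agent might generate at
# runtime. The @tool decorator and "strands" import path are assumptions
# based on the Strands SDK, not verified project code.
import subprocess

from strands import tool

@tool
def http_header_probe(url: str) -> str:
    """Fetch response headers from a target URL for quick fingerprinting."""
    result = subprocess.run(
        ["curl", "-sI", "--max-time", "10", url],
        capture_output=True, text=True, check=False,
    )
    return result.stdout or result.stderr
```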
Process Flow:
- Assess Confidence: Evaluate current knowledge and confidence level (High >80%, Medium 50-80%, Low <50%)
- Adaptive Strategy:
  - High confidence → Use specialized tools directly
  - Medium confidence → Deploy swarm for parallel exploration
  - Low confidence → Gather more information, try alternatives
- Execute: Tool hierarchy based on confidence:
  - Professional security tools for known vulnerabilities (sqlmap, nikto, nmap)
  - Swarm deployment when multiple approaches needed (with memory access)
  - Parallel shell for rapid reconnaissance (up to 7 commands)
  - Meta-tool creation only when no existing tool suffices
- Learn & Store: Store findings with category="finding" for memory persistence
Tool Selection Hierarchy (Confidence-Based), illustrated in the sketch after this list:
- Specialized cyber tools (sqlmap, nikto, metasploit) - when vulnerability type is known
- Swarm deployment - when confidence <70% or need multiple perspectives (includes memory)
- Parallel shell execution - for rapid multi-command reconnaissance
- Meta-tool creation - only for novel exploits when existing tools fail
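The hierarchy above is decided by LLM reasoning at runtime, but as a rough mental model it can be expressed as a simple threshold function. This is an illustrative sketch only; the function name and exact thresholds are hypothetical, not the project's actual selection logic.

```python
# Illustrative mapping of confidence to the tool hierarchy above. Thresholds
# loosely mirror the documented bands (>80% high, 50-80% medium, <50% low);
# the real agent reasons about this in natural language instead.
def choose_strategy(confidence: float, vulnerability_known: bool) -> str:
    if confidence > 0.8 and vulnerability_known:
        return "specialized tool (sqlmap / nikto / nmap)"
    if confidence >= 0.5:
        return "swarm deployment with shared memory"
    return "parallel shell reconnaissance or meta-tool creation"

print(choose_strategy(0.9, True))    # -> specialized tool
print(choose_strategy(0.6, False))   # -> swarm deployment
print(choose_strategy(0.3, False))   # -> reconnaissance / meta-tool
```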
Cyber-AutoAgent supports two model providers for maximum flexibility:
Remote (AWS Bedrock)
- Best for: Production use, high-quality results, no local GPU requirements
- Requirements: AWS account with Bedrock access
- Default Model: Claude Sonnet 4 (us.anthropic.claude-sonnet-4-20250514-v1:0)
- Benefits: Latest models, reliable performance, managed infrastructure
Local (Ollama)
- Best for: Privacy, offline use, cost control, local development
- Requirements: Local Ollama installation
- Default Models: llama3.2:3b (LLM), mxbai-embed-large (embeddings)
- Alternative Models: llama3.1:8b (better reasoning), qwen2.5:7b (more efficient)
- Benefits: No cloud dependencies, complete privacy, no API costs
Feature | Remote (AWS Bedrock) | Local (Ollama) |
---|---|---|
Cost | Pay per API call | One-time setup |
Performance | High (managed) | Depends on hardware |
Offline Use | No | Yes |
Setup Complexity | Moderate | Higher |
Model Quality | Highest | Low |
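To relate the two providers to the --server flag, here is a hedged sketch of how provider selection could be expressed with the Strands SDK model classes. The class names, import paths, and constructor arguments are assumptions drawn from Strands documentation, not this project's verified code.

```python
# Hypothetical provider selection. BedrockModel/OllamaModel class names,
# import paths, and parameters are assumptions and may differ from this
# project's actual wiring.
from strands import Agent
from strands.models import BedrockModel
from strands.models.ollama import OllamaModel

def build_model(server: str = "remote"):
    if server == "remote":                      # AWS Bedrock (default)
        return BedrockModel(
            model_id="us.anthropic.claude-sonnet-4-20250514-v1:0",
            region_name="us-east-1",
        )
    return OllamaModel(                          # local Ollama
        host="http://localhost:11434",
        model_id="llama3.2:3b",
    )

agent = Agent(model=build_model("local"))
```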
Remote Mode (AWS Bedrock)
# Configure AWS credentials
aws configure
# Or set environment variables:
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_REGION=your_region
Local Mode (Ollama)
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Start service and pull models
ollama serve
ollama pull llama3.2:3b
ollama pull mxbai-embed-large
# Clone repository
git clone https://github.com/cyber-autoagent/cyber-autoagent.git
cd cyber-autoagent
# Build image
docker build -t cyber-autoagent .
# Run with AWS credentials (using volume mount)
docker run --rm \
-v ~/.aws:/home/cyberagent/.aws:ro \
-v $(pwd)/evidence:/app/evidence \
-v $(pwd)/logs:/app/logs \
cyber-autoagent \
--target "http://testphp.vulnweb.com" \
--objective "Identify vulnerabilities"
# Using environment variables
docker run --rm \
-e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
-e AWS_SESSION_TOKEN=${AWS_SESSION_TOKEN} \
-e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
-e AWS_REGION=${AWS_REGION:-us-east-1} \
-v $(pwd)/evidence:/app/evidence \
-v $(pwd)/logs:/app/logs \
cyber-autoagent \
--target "http://localhost" \
--objective "Identify vulnerabilities and document" \
--iterations 4
# Clone repository
git clone https://github.com/cyber-autoagent/cyber-autoagent.git
cd cyber-autoagent
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -e .
# Optional: Install security tools
sudo apt install nmap nikto sqlmap gobuster # Debian/Ubuntu
brew install nmap nikto sqlmap gobuster # macOS
# Run
python src/cyberautoagent.py \
--target "http://testphp.vulnweb.com" \
--objective "Comprehensive security assessment"
Data Type | Location |
---|---|
Evidence | ./evidence/evidence_OP_* |
Logs | ./logs/cyber_operations.log |
Reports | ./evidence/evidence_OP_*/ |
Directories are created automatically on first run.
Required Arguments:
- --objective: Security assessment objective
- --target: Target system/network to assess (ensure you have permission!)
Optional Arguments:
- --server: Model provider - remote (AWS Bedrock) or local (Ollama), default: remote
- --iterations: Maximum tool executions before stopping, default: 100
- --model: Model ID to use (default: remote=claude-sonnet, local=llama3.2:3b)
- --region: AWS region for Bedrock, default: us-east-1
- --verbose: Enable verbose output with detailed debug logging
- --confirmations: Enable tool confirmation prompts (default: disabled)
- --memory-path: Path to existing FAISS memory store to load past memories
- --keep-memory: Keep memory data after operation completes (default: remove)
# Local Mode (Ollama)
python src/cyberautoagent.py \
--server local \
--target "192.168.1.100" \
--objective "Web vulnerability assessment"
# With custom model and region
python src/cyberautoagent.py \
--server remote \
--target "example.com" \
--objective "Find SQL injection vulnerabilities" \
--model "us.anthropic.claude-sonnet-4-20250514-v1:0" \
--region "us-west-2"
# AWS Bedrock (Remote Mode)
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_REGION=us-east-1
# Ollama (Local Mode)
export OLLAMA_HOST=http://localhost:11434 # Optional
# Memory Storage (Optional)
export MEM0_API_KEY=your_key # Mem0 Platform
export OPENSEARCH_HOST=your-host.com # OpenSearch
This project uses uv for dependency management and testing:
# Run all tests
uv run pytest
# Run specific test file
uv run pytest tests/test_agent.py
# Run tests with verbose output
uv run pytest -v
# Run tests with coverage
uv run pytest --cov=src
cyber-autoagent/
|- src/
| |- cyberautoagent.py # Main entry point
| |- modules/
| |- __init__.py # Module initialization
| |- utils.py # UI utilities and analysis functions
| |- environment.py # Environment setup and tool discovery
| |- system_prompts.py # System prompt templates
| |- agent_handlers.py # Core agent callback handlers
| |- agent.py # Agent creation and configuration
|- pyproject.toml # Project configuration
|- README.md # This file
|- LICENSE # MIT License
# Configure AWS CLI
aws configure
# Or set environment variables
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_REGION=us-east-1
# Request model access in AWS Console
# Navigate to: Amazon Bedrock > Model access > Request model access
# For local FAISS backend (default)
pip install faiss-cpu # or faiss-gpu for CUDA
# For Mem0 Platform
export MEM0_API_KEY=your_api_key
# For OpenSearch backend
export OPENSEARCH_HOST=your_host
export AWS_REGION=your_region
# Check memory storage location
ls -la ./mem0_faiss_OP_*/
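For reference, here is a minimal sketch of how a finding might be written to and read back from the memory store using the Mem0 Python API. The operation ID, metadata fields, and default configuration are illustrative; the agent's actual memory wrapper may configure the backend differently.

```python
# Hedged sketch of storing and retrieving a finding with Mem0. The default
# Memory() configuration is used here for brevity; real deployments would
# pass a FAISS/OpenSearch/Platform config as described above.
from mem0 import Memory

memory = Memory()

memory.add(
    "SQL injection confirmed on /login via sqlmap (parameter: username)",
    user_id="OP_20240101_120000",                  # one ID per operation (illustrative)
    metadata={"category": "finding", "severity": "high"},
)

results = memory.search("sql injection", user_id="OP_20240101_120000")
hits = results["results"] if isinstance(results, dict) else results  # API shape varies by version
for hit in hits:
    print(hit)
```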
# Install missing security tools
sudo apt install nmap nikto sqlmap gobuster # Debian/Ubuntu
brew install nmap nikto sqlmap gobuster # macOS
Ollama Server Not Running
# Start Ollama service
ollama serve
# Check if running
curl http://localhost:11434/api/version
Required Models Missing
# Pull required models
ollama pull llama3.2:3b
ollama pull mxbai-embed-large
# List available models
ollama list
Connection Errors
# Check Ollama is accessible
curl -X POST http://localhost:11434/api/generate \
-H "Content-Type: application/json" \
-d '{"model": "llama3.2:3b", "prompt": "test", "stream": false}'
Docker Networking (Local Mode)
Cyber-AutoAgent automatically detects the correct Ollama host for your environment:
# Ensure Ollama is running on your host
ollama serve
# Test connection from host
curl http://localhost:11434/api/version
Performance Issues
# Monitor resource usage
htop # Check CPU/Memory during execution
# For better performance, consider:
# - Using smaller models (e.g., llama3.1:8b instead of 70b)
# - Allocating more RAM to Ollama
# - Using GPU acceleration if available
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
This tool is provided for educational and authorized security testing purposes only. Users are solely responsible for ensuring they have proper authorization before testing any systems. The authors assume no liability for misuse or any damages that may result from using this software.
- Strands Framework - Agent orchestration & swarm intelligence
- AWS Bedrock - Foundation model access
- Ollama - Local model inference
- Mem0 - Advanced memory management with FAISS/OpenSearch/Platform backends
Remember: With great power comes great responsibility. Use this tool ethically and legally.