GitHub - westonbrown/Cyber-AutoAgent: AI cybersecurity agent for automated penetration testing and vulnerability assessment

 ██████╗██╗   ██╗██████╗ ███████╗██████╗ 
██╔════╝╚██╗ ██╔╝██╔══██╗██╔════╝██╔══██╗
██║      ╚████╔╝ ██████╔╝█████╗  ██████╔╝
██║       ╚██╔╝  ██╔══██╗██╔══╝  ██╔══██╗
╚██████╗   ██║   ██████╔╝███████╗██║  ██║
 ╚═════╝   ╚═╝   ╚═════╝ ╚══════╝╚═╝  ╚═╝

█████╗ ██╗   ██╗████████╗ ██████╗  █████╗  ██████╗ ███████╗███╗   ██╗████████╗
██╔══██╗██║   ██║╚══██╔══╝██╔═══██╗██╔══██╗██╔════╝ ██╔════╝████╗  ██║╚══██╔══╝
███████║██║   ██║   ██║   ██║   ██║███████║██║  ███╗█████╗  ██╔██╗ ██║   ██║   
██╔══██║██║   ██║   ██║   ██║   ██║██╔══██║██║   ██║██╔══╝  ██║╚██╗██║   ██║   
██║  ██║╚██████╔╝   ██║   ╚██████╔╝██║  ██║╚██████╔╝███████╗██║ ╚████║   ██║   
╚═╝  ╚═╝ ╚═════╝    ╚═╝    ╚═════╝ ╚═╝  ╚═╝ ╚═════╝ ╚══════╝╚═╝  ╚═══╝   ╚═╝

[!] EXPERIMENTAL SOFTWARE - USE ONLY IN AUTHORIZED, SAFE, SANDBOXED ENVIRONMENTS [!]

Proactive Cybersecurity Autonomous Agent Powered by AI

Cyber-AutoAgent is a proactive security assessment tool that autonomously conducts penetration tests. It combines natural language reasoning, dynamic tool selection, and evidence collection, and runs on AWS Bedrock or local Ollama models through the Strands framework.

Cyber-AutoAgent in action - Autonomous security assessment with AI reasoning

Quick Start

# Using Docker (Recommended). Build the image first: docker build -t cyber-autoagent .
docker run --rm \
  -v ~/.aws:/home/cyberagent/.aws:ro \
  -v $(pwd)/evidence:/app/evidence \
  cyber-autoagent \
  --target "http://testphp.vulnweb.com" \
  --objective "Identify SQL injection vulnerabilities"

# Using Python
git clone https://github.com/cyber-autoagent/cyber-autoagent.git
cd cyber-autoagent
pip install -e .
python src/cyberautoagent.py --target "192.168.1.100" --objective "Comprehensive security assessment"

Important Disclaimer

THIS TOOL IS FOR EDUCATIONAL AND AUTHORIZED SECURITY TESTING PURPOSES ONLY.

[+] Use only on systems you own or have explicit written permission to test
[+] Deploy in safe, sandboxed environments isolated from production systems
[+] Ensure compliance with all applicable laws and regulations
[-] Never use on unauthorized systems or networks
[-] Users are fully responsible for legal and ethical use

Features

Autonomous Operation: Conducts security assessments with minimal human intervention
Intelligent Tool Selection: Automatically chooses appropriate security tools (nmap, sqlmap, nikto, etc.)
Natural Language Reasoning: Uses Strands framework with metacognitive architecture
Evidence Collection: Automatically stores findings with Mem0 memory (category="finding")
Meta-Tool Creation: Dynamically creates custom exploitation tools when needed
Adaptive Execution: Metacognitive assessment guides strategy based on confidence levels
Professional Reporting: Generates comprehensive assessment reports
Swarm Intelligence: Deploy parallel agents with shared memory for complex tasks

Architecture

System Architecture

graph LR
    A[User Input] --> B[Cyber-AutoAgent]
    B --> C[AI Model]
    B --> D[Security Tools]
    B --> E[Evidence Storage]
    
    C --> B
    D --> E
    E --> F[Final Report]
    
    style A fill:#e3f2fd
    style F fill:#e8f5e8
    style B fill:#f3e5f5
    style C fill:#fff3e0

Key Components:

User provides target and objectives via command line
Agent orchestrates assessment using AI reasoning
Security tools execute scans and exploits
Evidence system stores and analyzes findings

Assessment Execution Flow

sequenceDiagram
    participant U as User
    participant A as Agent
    participant M as AI Model
    participant T as Tools
    participant E as Evidence

    U->>A: Start Assessment
    A->>E: Initialize Storage
    
    loop Assessment Steps
        A->>M: Analyze Situation
        M-->>A: Next Action
        A->>T: Execute Tool
        T-->>A: Results
        A->>E: Store Findings
        
        alt Critical Discovery
            A->>T: Exploit Immediately
            T-->>A: Access Gained
            A->>E: Store Evidence
        end
        
        A->>A: Check Progress
        
        alt Success
            break Complete
                A->>U: Report Success
            end
        end
    end
    
    A->>M: Generate Report
    M-->>A: Final Analysis
    A->>U: Deliver Report

Execution Pattern:

The agent continuously analyzes the situation and selects appropriate tools
Critical discoveries trigger immediate exploitation attempts
All findings are stored as evidence for the final analysis
The assessment completes when objectives are met or the tool-execution budget (--iterations) is exhausted
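
The execution pattern above can be read as a budgeted loop. The following is a minimal Python sketch of that idea, not the project's actual code: `Finding`, `Assessment`, `choose_action`, `run_tool`, and `objective_met` are hypothetical stand-ins for the evidence store, the AI model, and the security tools.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Finding:
    """Hypothetical evidence record (the real project stores findings via Mem0)."""
    description: str
    critical: bool = False


@dataclass
class Assessment:
    objective: str
    max_iterations: int = 100  # mirrors the documented --iterations default
    evidence: List[Finding] = field(default_factory=list)


def run_assessment(
    assessment: Assessment,
    choose_action: Callable[[List[Finding]], str],
    run_tool: Callable[[str], Optional[Finding]],
    objective_met: Callable[[List[Finding]], bool],
) -> List[Finding]:
    """Loop until the objective is met or the iteration budget is exhausted."""
    for _ in range(assessment.max_iterations):
        action = choose_action(assessment.evidence)  # "Analyze Situation"
        finding = run_tool(action)                   # "Execute Tool"
        if finding is not None:
            assessment.evidence.append(finding)      # "Store Findings"
            if finding.critical:
                # Critical discovery: attempt exploitation right away.
                # In this sketch the follow-up is not counted against the budget.
                follow_up = run_tool(f"exploit:{action}")
                if follow_up is not None:
                    assessment.evidence.append(follow_up)
        if objective_met(assessment.evidence):
            break
    return assessment.evidence
```

Passing the model, tools, and success check in as callables keeps the loop testable without a real target or model.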

Metacognitive Assessment Cycle

flowchart TD
    A[Think: Analyze Current State] --> B{Select Tool Type}
    
    B --> |Basic Task| C[Shell Commands]
    B --> |Security Task| D[Cyber Tools via Shell]
    B --> |Complex Task| E[Create Meta-Tool]
    B --> |Parallel Task| P[Swarm Orchestration]
    
    C --> F[Reflect: Evaluate Results]
    D --> F
    E --> F
    P --> F
    
    F --> G{Findings?}
    
    G --> |Critical| H[Exploit Immediately]
    G --> |Informational| I[Store & Continue]
    G --> |None| J[Try Different Approach]
    
    H --> K[Document Evidence]
    I --> L{Objective Met?}
    J --> A
    K --> L
    
    L --> |Yes| M[Complete Assessment]
    L --> |No| A
    
    style A fill:#e3f2fd
    style C fill:#e8f5e8
    style D fill:#fff3e0
    style E fill:#f3e5f5
    style P fill:#fce4ec
    style H fill:#ffcdd2

Metacognitive Process:

Design Philosophy: Meta-Everything Architecture

At the core of Cyber-AutoAgent is a "meta-everything" design philosophy that enables dynamic adaptation and scaling:

Meta-Agent: The swarm capability deploys dynamic agents as tools, each tailored for specific subtasks with their own reasoning loops
Meta-Tooling: Through the editor and load_tool capabilities, the agent can create, modify, and deploy new tools at runtime to address novel challenges
Meta-Learning: Continuous memory storage and retrieval enable cross-session learning, building expertise over time
Meta-Cognition: Self-reflection and confidence assessment drive strategic decisions about tool selection and approach (Note: This aspect is still being expanded for deeper reasoning capabilities)

This meta-architecture allows the system to transcend static tool limitations and evolve its capabilities during execution.
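
To make the meta-tooling idea concrete, here is a minimal sketch of loading a freshly written tool at runtime using only Python's standard `importlib`. It is an illustration under assumptions: it does not use the Strands `editor` or `load_tool` capabilities, and `load_tool_from_source`, `GENERATED_TOOL`, and `header_check` are hypothetical names.

```python
import importlib.util
import tempfile
from pathlib import Path
from types import ModuleType


def load_tool_from_source(name: str, source: str, directory: Path) -> ModuleType:
    """Write generated source to disk and import it as a module at runtime."""
    path = directory / f"{name}.py"
    path.write_text(source, encoding="utf-8")
    spec = importlib.util.spec_from_file_location(name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load tool {name!r} from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


# Hypothetical tool body an agent might generate for a novel task.
GENERATED_TOOL = '''
def header_check(headers):
    """Return the expected security headers missing from a response."""
    expected = ["Content-Security-Policy", "X-Frame-Options"]
    return [h for h in expected if h not in headers]
'''

with tempfile.TemporaryDirectory() as tmp:
    tool = load_tool_from_source("header_check_tool", GENERATED_TOOL, Path(tmp))
    missing = tool.header_check({"X-Frame-Options": "DENY"})
```

Executing generated code is inherently risky, which is one more reason to run the agent only in a sandboxed environment.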

Process Flow:

Assess Confidence: Evaluate current knowledge and confidence level (High >80%, Medium 50-80%, Low <50%)
Adaptive Strategy:
- High confidence → Use specialized tools directly
- Medium confidence → Deploy swarm for parallel exploration
- Low confidence → Gather more information, try alternatives
Execute: Tool hierarchy based on confidence:
- Professional security tools for known vulnerabilities (sqlmap, nikto, nmap)
- Swarm deployment when multiple approaches needed (with memory access)
- Parallel shell for rapid reconnaissance (up to 7 commands)
- Meta-tool creation only when no existing tool suffices
Learn & Store: Store findings with category="finding" for memory persistence
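
The confidence bands in the process flow can be sketched as a small lookup. This is an illustrative Python sketch, not the agent's implementation: `Strategy` and `select_strategy` are hypothetical names, and the boundaries follow the High >80%, Medium 50-80%, Low <50% bands stated above (the Tool Selection Hierarchy elsewhere mentions a 70% threshold for swarm deployment, which this sketch does not model).

```python
from enum import Enum


class Strategy(Enum):
    SPECIALIZED_TOOLS = "use specialized tools directly"
    SWARM = "deploy swarm for parallel exploration"
    GATHER_INFO = "gather more information, try alternatives"


def select_strategy(confidence: float) -> Strategy:
    """Map a confidence percentage (0-100) to the documented strategy bands."""
    if not 0 <= confidence <= 100:
        raise ValueError("confidence must be between 0 and 100")
    if confidence > 80:
        return Strategy.SPECIALIZED_TOOLS  # High confidence
    if confidence >= 50:
        return Strategy.SWARM              # Medium confidence
    return Strategy.GATHER_INFO            # Low confidence
```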

Tool Selection Hierarchy (Confidence-Based):

Specialized cyber tools (sqlmap, nikto, metasploit) - when vulnerability type is known
Swarm deployment - when confidence <70% or need multiple perspectives (includes memory)
Parallel shell execution - for rapid multi-command reconnaissance
Meta-tool creation - only for novel exploits when existing tools fail

Model Providers

Cyber-AutoAgent supports two model providers for maximum flexibility:

Remote Mode (AWS Bedrock)

Best for: Production use, high-quality results, no local GPU requirements
Requirements: AWS account with Bedrock access
Default Model: Claude Sonnet 4 (us.anthropic.claude-sonnet-4-20250514-v1:0)
Benefits: Latest models, reliable performance, managed infrastructure

Local Mode (Ollama)

Best for: Privacy, offline use, cost control, local development
Requirements: Local Ollama installation
Default Models: llama3.2:3b (LLM), mxbai-embed-large (embeddings)
Alternative Models: llama3.1:8b (better reasoning), qwen2.5:7b (more efficient)
Benefits: No cloud dependencies, complete privacy, no API costs

Comparison

Feature	Remote (AWS Bedrock)	Local (Ollama)
Cost	Pay per API call	One-time setup
Performance	High (managed)	Depends on hardware
Offline Use	No	Yes
Setup Complexity	Moderate	Higher
Model Quality	Highest	Low

Installation & Deployment

Prerequisites

Remote Mode (AWS Bedrock)

# Configure AWS credentials
aws configure
# Or set environment variables:
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_REGION=your_region

Local Mode (Ollama)

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Start the service (keep it running in a separate terminal), then pull models
ollama serve
ollama pull llama3.2:3b
ollama pull mxbai-embed-large

Docker Deployment (Recommended)

# Clone repository
git clone https://github.com/cyber-autoagent/cyber-autoagent.git
cd cyber-autoagent

# Build image
docker build -t cyber-autoagent .

# Run with AWS credentials (using volume mount)
docker run --rm \
  -v ~/.aws:/home/cyberagent/.aws:ro \
  -v $(pwd)/evidence:/app/evidence \
  -v $(pwd)/logs:/app/logs \
  cyber-autoagent \
  --target "http://testphp.vulnweb.com" \
  --objective "Identify vulnerabilities"

# Using environment variables
docker run --rm \
  -e AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID} \
  -e AWS_SESSION_TOKEN=${AWS_SESSION_TOKEN} \
  -e AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY} \
  -e AWS_REGION=${AWS_REGION:-us-east-1} \
  -v $(pwd)/evidence:/app/evidence \
  -v $(pwd)/logs:/app/logs \
  cyber-autoagent \
  --target "http://localhost" \
  --objective "Identify vulnerabilities and document" \
  --iterations 4

Local Installation

# Clone repository
git clone https://github.com/cyber-autoagent/cyber-autoagent.git
cd cyber-autoagent

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -e .

# Optional: Install security tools
sudo apt install nmap nikto sqlmap gobuster  # Debian/Ubuntu
brew install nmap nikto sqlmap gobuster      # macOS

# Run
python src/cyberautoagent.py \
  --target "http://testphp.vulnweb.com" \
  --objective "Comprehensive security assessment"

Data Storage

Data Type	Location
Evidence	`./evidence/evidence_OP_*`
Logs	`./logs/cyber_operations.log`
Reports	`./evidence/evidence_OP_*/`

Directories are created automatically on first run.
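
For scripting around these locations, a short Python helper can enumerate past operations. This is a sketch that assumes only the `evidence_OP_*` naming shown in the table; `list_operation_dirs` is a hypothetical helper, not part of the project.

```python
from pathlib import Path
from typing import List


def list_operation_dirs(evidence_root: Path) -> List[Path]:
    """Return evidence_OP_* directories under evidence_root, newest first."""
    dirs = [p for p in evidence_root.glob("evidence_OP_*") if p.is_dir()]
    return sorted(dirs, key=lambda p: p.stat().st_mtime, reverse=True)
```

For example, `list_operation_dirs(Path("./evidence"))` would list the operation directories from the default evidence location, most recently modified first.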

Command-Line Arguments

Required Arguments:

--objective: Security assessment objective
--target: Target system/network to assess (ensure you have permission!)

Optional Arguments:

--server: Model provider - remote (AWS Bedrock) or local (Ollama), default: remote
--iterations: Maximum tool executions before stopping, default: 100
--model: Model ID to use (default: remote=Claude Sonnet 4, us.anthropic.claude-sonnet-4-20250514-v1:0; local=llama3.2:3b)
--region: AWS region for Bedrock, default: us-east-1
--verbose: Enable verbose output with detailed debug logging
--confirmations: Enable tool confirmation prompts (default: disabled)
--memory-path: Path to existing FAISS memory store to load past memories
--keep-memory: Keep memory data after operation completes (default: remove)
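
As a reading aid, the documented flags can be mirrored with Python's `argparse`. This is a sketch built only from the list above, not the project's actual parser: the argument types, the use of `store_true` for the boolean flags, and the `None` defaults for `--model` and `--memory-path` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented command-line arguments."""
    parser = argparse.ArgumentParser(prog="cyberautoagent.py")
    # Required arguments
    parser.add_argument("--objective", required=True, help="Security assessment objective")
    parser.add_argument("--target", required=True, help="Target system/network to assess")
    # Optional arguments with their documented defaults
    parser.add_argument("--server", choices=["remote", "local"], default="remote")
    parser.add_argument("--iterations", type=int, default=100)
    parser.add_argument("--model", default=None, help="Model ID; provider default when omitted")
    parser.add_argument("--region", default="us-east-1")
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("--confirmations", action="store_true")
    parser.add_argument("--memory-path", default=None)
    parser.add_argument("--keep-memory", action="store_true")
    return parser


args = build_parser().parse_args(
    ["--target", "192.168.1.100", "--objective", "Web vulnerability assessment", "--server", "local"]
)
```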

Usage Examples

# Local Mode (Ollama)
python src/cyberautoagent.py \
  --server local \
  --target "192.168.1.100" \
  --objective "Web vulnerability assessment"

# With custom model and region
python src/cyberautoagent.py \
  --server remote \
  --target "example.com" \
  --objective "Find SQL injection vulnerabilities" \
  --model "us.anthropic.claude-sonnet-4-20250514-v1:0" \
  --region "us-west-2"

Configuration

Environment Variables

# AWS Bedrock (Remote Mode)
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_REGION=us-east-1

# Ollama (Local Mode)
export OLLAMA_HOST=http://localhost:11434  # Optional

# Memory Storage (Optional)
export MEM0_API_KEY=your_key               # Mem0 Platform
export OPENSEARCH_HOST=your-host.com       # OpenSearch
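
A quick way to see which memory backend a given environment would point to is sketched below. The three backends (Mem0 Platform, OpenSearch, and the local FAISS default) come from this README; the precedence order and the function name `detect_memory_backend` are assumptions, and the project's real selection logic may differ.

```python
import os
from typing import Mapping


def detect_memory_backend(env: Mapping[str, str] = os.environ) -> str:
    """Guess the memory backend from environment variables (assumed precedence)."""
    if env.get("MEM0_API_KEY"):
        return "mem0_platform"
    if env.get("OPENSEARCH_HOST"):
        return "opensearch"
    return "faiss"  # local FAISS store, described as the default backend
```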

Development & Testing

Running Tests

This project uses uv for dependency management and testing:

# Run all tests
uv run pytest

# Run specific test file
uv run pytest tests/test_agent.py

# Run tests with verbose output
uv run pytest -v

# Run tests with coverage
uv run pytest --cov=src

Project Structure

cyber-autoagent/
|- src/
|  |- cyberautoagent.py       # Main entry point
|  |- modules/
|     |- __init__.py         # Module initialization
|     |- utils.py            # UI utilities and analysis functions
|     |- environment.py      # Environment setup and tool discovery
|     |- system_prompts.py   # System prompt templates 
|     |- agent_handlers.py   # Core agent callback handlers
|     |- agent.py            # Agent creation and configuration
|- pyproject.toml              # Project configuration
|- README.md                   # This file
|- LICENSE                     # MIT License

Troubleshooting

Common Issues

AWS Credentials Not Found

# Configure AWS CLI
aws configure

# Or set environment variables
export AWS_ACCESS_KEY_ID=your_key
export AWS_SECRET_ACCESS_KEY=your_secret
export AWS_REGION=us-east-1

Model Access Denied

# Request model access in AWS Console
# Navigate to: Amazon Bedrock > Model access > Request model access

Memory System Errors

# For local FAISS backend (default)
pip install faiss-cpu  # or faiss-gpu for CUDA

# For Mem0 Platform
export MEM0_API_KEY=your_api_key

# For OpenSearch backend
export OPENSEARCH_HOST=your_host
export AWS_REGION=your_region

# Check memory storage location
ls -la ./mem0_faiss_OP_*/

Tool Not Found Errors

# Install missing security tools
sudo apt install nmap nikto sqlmap gobuster  # Debian/Ubuntu
brew install nmap nikto sqlmap gobuster      # macOS

Ollama Issues (Local Mode)

Ollama Server Not Running

# Start Ollama service
ollama serve

# Check if running
curl http://localhost:11434/api/version

Required Models Missing

# Pull required models
ollama pull llama3.2:3b
ollama pull mxbai-embed-large

# List available models
ollama list

Connection Errors

# Check Ollama is accessible
curl -X POST http://localhost:11434/api/generate \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2:3b", "prompt": "test", "stream": false}'
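
The same reachability check can be done from Python before starting an assessment. This sketch uses only the standard library and the `/api/version` endpoint shown above; `ollama_version` is a hypothetical helper, and falling back to `OLLAMA_HOST` and then `http://localhost:11434` is an assumption based on the environment variables documented in this README.

```python
import json
import os
import urllib.error
import urllib.request
from typing import Optional


def ollama_version(host: Optional[str] = None, timeout: float = 3.0) -> Optional[str]:
    """Return the Ollama server version, or None if the server is unreachable."""
    base = (host or os.environ.get("OLLAMA_HOST", "http://localhost:11434")).rstrip("/")
    try:
        with urllib.request.urlopen(f"{base}/api/version", timeout=timeout) as resp:
            return json.load(resp).get("version")
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, malformed URL, or invalid JSON.
        return None
```

A `None` result suggests the service is not running or the host is wrong; a version string means the server answered.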

Docker Networking (Local Mode)

Cyber-AutoAgent automatically detects the correct Ollama host for your environment:

# Ensure Ollama is running on your host
ollama serve

# Test connection from host
curl http://localhost:11434/api/version

Performance Issues

# Monitor resource usage
htop  # Check CPU/Memory during execution

# For better performance, consider:
# - Using smaller models (e.g., llama3.1:8b instead of 70b)
# - Allocating more RAM to Ollama
# - Using GPU acceleration if available

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Legal Notice

This tool is provided for educational and authorized security testing purposes only. Users are solely responsible for ensuring they have proper authorization before testing any systems. The authors assume no liability for misuse or any damages that may result from using this software.

Acknowledgments

Strands Framework - Agent orchestration & swarm intelligence
AWS Bedrock - Foundation model access
Ollama - Local model inference
Mem0 - Advanced memory management with FAISS/OpenSearch/Platform backends

Remember: With great power comes great responsibility. Use this tool ethically and legally.
