
Agent Orchestration System

TypeScript framework for building autonomous, collaborative AI agents

Key capabilities:

  • Autonomous Agents: Agents gather information via tools, making independent decisions without massive context dumps
  • Deep Reasoning: Multi-provider thinking support (Claude, OpenAI, OpenRouter) for complex planning and problem-solving
  • Agent Collaboration: Agents delegate to specialized sub-agents, forming dynamic teams for complex tasks
  • Multi-Provider Support: Switch between Anthropic, OpenAI, OpenRouter, or custom providers with simple configuration
  • Production-Ready: Built-in security, retry logic, session persistence, and comprehensive monitoring
  • Cost Efficient: Smart caching delivers up to 90% cost savings on multi-agent workflows

📦 Installation

Install from npm (no authentication required):

# Install core library
npm install @nielspeter/agent-orchestration-core

# Install CLI globally
npm install -g @nielspeter/agent-orchestration-cli

# Or install both
npm install @nielspeter/agent-orchestration-core @nielspeter/agent-orchestration-cli

🎯 Architecture Highlights

Clean Middleware Pipeline

type Middleware = (ctx: MiddlewareContext, next: () => Promise<void>) => Promise<void>;

The monolithic 500-line AgentExecutor has been refactored into a clean pipeline of focused middleware:

  • ErrorHandlerMiddleware - Global error boundary
  • AgentLoaderMiddleware - Loads agents and filters tools
  • ThinkingMiddleware - Validates and normalizes thinking configuration
  • ContextSetupMiddleware - Manages conversation context
  • ProviderSelectionMiddleware - Selects LLM provider (Anthropic, OpenRouter, etc.)
  • SafetyChecksMiddleware - Enforces limits (depth, iterations, tokens)
  • SmartRetryMiddleware - Retries on rate limits (429) with exponential backoff
  • LLMCallMiddleware - Handles LLM communication
  • ToolExecutionMiddleware - Orchestrates tool execution
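
For illustration, middleware with this signature compose Koa-style into a single runner. The sketch below shows only the general pattern, with a simplified MiddlewareContext and two toy middleware; it is not the framework's actual Pipeline implementation:

type MiddlewareContext = { agentName: string; [key: string]: unknown };
type Middleware = (ctx: MiddlewareContext, next: () => Promise<void>) => Promise<void>;

// Compose an ordered list of middleware into one function (Koa/Express style).
function compose(middleware: Middleware[]) {
  return async (ctx: MiddlewareContext): Promise<void> => {
    const dispatch = async (i: number): Promise<void> => {
      if (i === middleware.length) return;             // end of chain
      await middleware[i](ctx, () => dispatch(i + 1));
    };
    await dispatch(0);
  };
}

// Two toy middleware standing in for the real ones listed above.
const errorBoundary: Middleware = async (ctx, next) => {
  try {
    await next();
  } catch (err) {
    console.error(`[${ctx.agentName}] failed:`, err);
  }
};

const logger: Middleware = async (ctx, next) => {
  console.log(`→ ${ctx.agentName}`);
  await next();
  console.log(`← ${ctx.agentName}`);
};

await compose([errorBoundary, logger])({ agentName: 'orchestrator' });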

Everything is an Agent

  • No special orchestrator class - all agents use the same pipeline
  • Agents are defined as markdown files with YAML frontmatter
  • Orchestration emerges through delegation via the Delegate tool

Pull Architecture with Caching

When agent A delegates to agent B:

  1. B receives minimal context (~5-500 tokens) - just the task prompt
  2. B uses tools (Read, Write, List, Grep, Delegate) to pull information it needs
  3. Anthropic's cache makes "redundant" reads efficient (up to 90% cost savings)
  4. Clean separation - each agent has independent context

🔄 Core Patterns

The Agentic Loop (ReAct Pattern)

Each agent automatically implements the Reason → Act → Observe loop:

  1. Reason: Agent analyzes prompt and decides what to do
  2. Act: Agent calls tools to gather information or take action
  3. Observe: Agent processes tool results
  4. Repeat: Continue until task is complete (no more tool calls)
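
Conceptually, the loop looks like the sketch below. It is simplified pseudocode in TypeScript, not the framework's internals; callLLM and runTool are hypothetical stubs standing in for the provider and tool layers.

interface ToolCall { name: string; input: unknown }
interface LLMResponse { text: string; toolCalls: ToolCall[] }

declare function callLLM(messages: unknown[]): Promise<LLMResponse>;
declare function runTool(call: ToolCall): Promise<string>;

// Simplified sketch of the Reason → Act → Observe loop.
async function agenticLoop(prompt: string, maxIterations = 100): Promise<string> {
  const messages: unknown[] = [{ role: 'user', content: prompt }];

  for (let i = 0; i < maxIterations; i++) {
    const response = await callLLM(messages);                   // Reason
    messages.push({ role: 'assistant', content: response.text });

    if (response.toolCalls.length === 0) return response.text;  // Done: no more tool calls

    for (const call of response.toolCalls) {                    // Act
      const result = await runTool(call);
      messages.push({ role: 'tool', content: result });         // Observe
    }
  }
  throw new Error('MAX_ITERATIONS reached');
}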

This iterative refinement allows agents to:

  • Build understanding incrementally
  • Correct mistakes
  • Ground responses in actual data
  • Avoid hallucinating file contents

See Agentic Loop Pattern for details.

Iteration vs Delegation

  • Iteration: Same agent refining its response (limited by MAX_ITERATIONS)
  • Delegation: Calling another agent via Delegate tool (limited by MAX_DEPTH)

🚀 Quick Start

# Install dependencies
npm install

# Set up API keys (at least one required)
cp .env.example .env
# Edit .env with your ANTHROPIC_API_KEY or OPENROUTER_API_KEY

# Optional: Configure providers
cp providers-config.example.json providers-config.json

# Build the project
npm run build

# Run tests
npm test              # Run all tests
npm run test:unit     # Unit tests only (no API)
npm run test:integration # Integration tests (requires API key)

# Use CLI
npm run cli -- -p "Hello, world!"       # CLI tool
echo "Analyze this" | npm run cli       # stdin support

# Run examples
npx tsx packages/examples/quickstart.ts          # Simple quickstart
npx tsx packages/examples/orchestration.ts       # Agent orchestration
npx tsx packages/examples/configuration.ts       # Config file usage
npx tsx packages/examples/code-first-config.ts   # Code-first configuration (no files)
npx tsx packages/examples/logging.ts             # Logging features
npx tsx packages/examples/mcp-integration.ts     # MCP server support
npx tsx packages/examples/werewolf-game.ts       # Autonomous multi-agent game
npx tsx packages/examples/coding-team.ts         # Collaborative coding agents

🎮 Examples

Basic Agent Execution (quickstart.ts)

Simple demonstration of agent execution with file operations.

Agent Orchestration (orchestration.ts)

Shows how agents delegate tasks to specialized sub-agents using the Delegate tool.

Configuration Files (configuration.ts)

Demonstrates loading agent system configuration from JSON files.

Code-First Configuration (code-first-config.ts)

Shows programmatic configuration without config files. Includes 5 examples:

  • Basic code-first configuration with .withProvidersConfig() and .withAPIKeys()
  • Secret manager integration (simulated AWS Secrets Manager)
  • Testing configuration (no file dependencies)
  • Dynamic configuration based on runtime conditions
  • API key precedence demonstration

npx tsx packages/examples/code-first-config.ts

Ideal for testing, CI/CD, and production deployments where config files aren't practical.

Werewolf Game - Autonomous Agents (werewolf-game.ts)

A complex multi-agent game demonstrating true agent autonomy:

  • Game-master agent orchestrates the entire game independently
  • Role agents (werewolf, seer, villager) make strategic decisions
  • Evidence-based gameplay with alibis, deductions, and voting
  • No hardcoded logic - all game rules exist in agent prompts

This example showcases how agents can be truly autonomous entities that receive high-level requests ("run a game") and handle all implementation details themselves.

# Run the werewolf game
npx tsx packages/examples/werewolf-game.ts

Coding Team - Collaborative Development (coding-team/)

Demonstrates how specialized agents collaborate to implement software features:

  • Driver agent orchestrates the development process and tracks progress
  • Implementer agent writes production code following existing patterns
  • Test-writer agent creates comprehensive test suites
  • Shell tool integration enables running tests and type checking
  • TodoWrite tracking provides real-time progress visibility

This example shows the practical application of the pull architecture, where each agent independently discovers what it needs rather than receiving massive context dumps.

# Set up the sample project
cd packages/examples/coding-team/sample-project && npm install && cd -

# Run the coding team
npx tsx packages/examples/coding-team.ts

💻 Command-Line Interface

The @nielspeter/agent-orchestration-cli package provides a production-ready CLI tool with dual modes:

Installation

# Install globally
npm install -g @nielspeter/agent-orchestration-cli

# Or use from workspace
npm run cli

Features

  • Dual Interface: CLI mode for terminal use, Web UI mode for browser interface
  • Unix-friendly: stdin/stdout support, proper exit codes, EPIPE handling
  • Security: 10MB input limit, 30s timeout, signal handling (SIGINT/SIGTERM)
  • Output modes: clean (default), verbose, json
  • Flexible: Use -p flag or pipe from stdin

Usage Examples

CLI Mode (Run agents from terminal):

# Basic usage
agent -p "Hello, world!"

# Read from stdin (Unix-style)
echo "Analyze this code" | agent
cat file.txt | agent

# JSON output for scripting
agent -p "List 3 colors" --json | jq '.result'

# Custom agent
agent -p "Review code" -a code-reviewer

# List available
agent --list-agents
agent --list-tools

Web UI Mode (Start server):

# Start web server
agent serve --open

# Custom port and host
agent serve --port 8080 --host 0.0.0.0

# Set working directory (agents, logs, file operations)
agent serve --working-dir ~/my-project --open

# Or use convenience script
npm run cli:serve

For complete CLI documentation, see packages/cli/README.md.

🎨 Agent Behavior Configuration

Agents can specify behavioral characteristics through presets that control temperature and top_p:

# In agent markdown frontmatter
---
name: validator
behavior: deterministic  # Uses preset for consistency
---

Available presets (temperature/top_p; catalog in providers-config.json, defaults in agent-config.json):

  • deterministic (0.1/0.5): Validation, routing, business logic
  • precise (0.2/0.6): Code analysis, verification, structured outputs
  • balanced (0.5/0.85): Default - orchestration, tool use, reasoning
  • creative (0.7/0.95): Storytelling, game mastering, creative content
  • exploratory (0.9/0.98): Research, brainstorming, alternatives

🧠 Extended Thinking & Reasoning

Agents can use extended thinking to reason deeply before responding - significantly improving performance on complex tasks like planning, code design, and problem-solving.

Quick Start

# In agent markdown frontmatter
---
name: orchestrator
tools: ["delegate", "todowrite"]
thinking:
  type: enabled
  budget_tokens: 16000
---

You are a project orchestrator. Before delegating tasks, think through:
- What is the end goal?
- What order makes sense?
- What could go wrong?

How It Works

When thinking is enabled, agents:

  1. Think internally before responding (shown in the output with the 🧠 emoji)
  2. Plan their approach step-by-step
  3. Consider alternatives and edge cases
  4. Generate better responses based on reasoning

Multi-Provider Support

The same configuration works across all providers:

  • Anthropic: Extended thinking (Claude 3.7) & Interleaved thinking (Claude 4+)
  • OpenRouter: Reasoning tokens (available on 200+ models)
  • OpenAI: Automatic reasoning (o1, o3 series)

Token Budget Guide

Recommended budget_tokens by task complexity:

  • Simple (2,000-5,000): Basic analysis, routing
  • Moderate (5,000-10,000): Code implementation, planning
  • Complex (10,000-16,000): Multi-agent orchestration, code review
  • Very Complex (16,000-24,000): Deep analysis, complex problem solving

Example: Thinking in Action

🧠 Agent Thinking:
Let me analyze this request step by step:

1. The user wants to implement a factorial function
2. I need to consider edge cases: 0!, negative numbers
3. I should delegate to the implementer agent
4. The implementer will need the project path and requirements
5. After implementation, tests should verify correctness

Plan: First explore project structure, then delegate with clear
      requirements including edge case handling.

[Agent then executes the planned approach]

For complete documentation, see Extended Thinking Guide.

📁 Project Structure

agent-orchestration-system/
├── packages/                 # Workspace packages
│   ├── core/                # Core agent system (@nielspeter/agent-orchestration-core)
│   │   ├── src/             # Source code
│   │   │   ├── config/      # Configuration system
│   │   │   ├── middleware/  # Middleware pipeline
│   │   │   ├── agents/      # Agent domain
│   │   │   ├── tools/       # Tool domain
│   │   │   ├── providers/   # LLM providers
│   │   │   ├── logging/     # Logging
│   │   │   └── lib/         # Utilities
│   │   └── tests/           # Test suite
│   ├── cli/                 # CLI tool (@agent-system/cli)
│   │   ├── src/
│   │   │   ├── index.ts     # CLI entry point with stdin support
│   │   │   └── output.ts    # Output formatting utilities
│   │   ├── tests/           # CLI tests
│   │   └── README.md        # CLI documentation
│   ├── examples/            # Example scripts (@agent-system/examples)
│   │   ├── coding-team/     # Collaborative coding example
│   │   ├── thinking/        # Extended thinking demos
│   │   ├── udbud/           # Tender analysis example
│   │   └── *.ts             # Various example scripts
│   └── web/                 # Web UI (@agent-system/web)
│       ├── src/             # React frontend
│       └── server/          # Express backend
├── agents/                   # Shared agent definitions
└── docs/                     # Documentation

🏗️ Middleware Architecture Benefits

Clean Separation of Concerns

  • Each middleware ~60 lines (was 500+ in monolith)
  • Single responsibility per middleware
  • Easy to test, modify, and extend

Type Safety

  • Full TypeScript types throughout
  • No any types in critical paths
  • Compile-time safety

Error Resilience

  • Global error boundaries
  • Graceful degradation
  • User-friendly error messages

POC Stability

  • Fixed race conditions in pipeline
  • 5-minute execution timeout
  • Proper concurrency handling

📊 Performance & Efficiency

Model Selection

Models must be specified with their provider prefix:

// Format: provider/model[:modifier]

// Direct to provider APIs
.withModel('anthropic/claude-haiku-4-5')
.withModel('openai/gpt-4-turbo')

// Via OpenRouter (supports :nitro and :floor modifiers)
.withModel('openrouter/meta-llama/llama-3.1-70b-instruct')        // Default routing
.withModel('openrouter/meta-llama/llama-3.1-70b-instruct:nitro')  // Fast throughput
.withModel('openrouter/meta-llama/llama-3.1-70b-instruct:floor')  // Lowest price

Caching Metrics

  • 90% reduction in token costs for repeated context
  • 2000x efficiency for multi-agent workflows
  • 5-minute cache window perfect for interactive sessions
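
As a rough illustration of where those savings come from (the prices and the 10%-of-input cache-read rate below are assumptions, and the cache-write premium is ignored, so treat it as back-of-the-envelope only):

// Back-of-the-envelope only; rates are illustrative assumptions, not current pricing.
const baseInputPerMTok = 0.8;                      // $ per million input tokens (assumed)
const cacheReadPerMTok = baseInputPerMTok * 0.1;   // assumed ~10% of the base input rate

const sharedContextTokens = 50_000;                // project files every agent re-reads
const agents = 10;

const withoutCache = (sharedContextTokens * agents / 1e6) * baseInputPerMTok;
const withCache =
  (sharedContextTokens / 1e6) * baseInputPerMTok +                   // first, uncached read
  (sharedContextTokens * (agents - 1) / 1e6) * cacheReadPerMTok;     // subsequent cached reads

console.log({ withoutCache, withCache });          // ≈ $0.40 vs ≈ $0.08; approaches 90% savings as more agents share the context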

Execution Strategy

  • Parallel execution for read-only tools (up to 10 concurrent)
  • Sequential execution for write operations
  • Smart batching based on tool safety
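
A minimal sketch of that grouping strategy (simplified; not the actual ToolExecutionMiddleware, and runTool is a hypothetical stub):

interface ToolCall { name: string; input: unknown }

const READ_ONLY = new Set(['read', 'list', 'grep']);
const MAX_CONCURRENT = 10;

declare function runTool(call: ToolCall): Promise<string>;

async function executeToolCalls(calls: ToolCall[]): Promise<string[]> {
  const safe = calls.filter((c) => READ_ONLY.has(c.name));
  const unsafe = calls.filter((c) => !READ_ONLY.has(c.name));
  const results: string[] = [];

  // Read-only tools run in parallel, in batches of up to 10.
  for (let i = 0; i < safe.length; i += MAX_CONCURRENT) {
    const batch = safe.slice(i, i + MAX_CONCURRENT);
    results.push(...(await Promise.all(batch.map((c) => runTool(c)))));
  }

  // Write and delegate tools run strictly one at a time.
  for (const call of unsafe) {
    results.push(await runTool(call));
  }
  return results;
}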

🧪 Creating New Agents

Create a markdown file in agents/ directory:

---
name: my-specialist
tools: ["read", "list"]  # or "*" for all tools
---

# My Specialist Agent

You are a specialist agent that focuses on...
[Define the agent's role and capabilities]
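
Once the file is in agents/, the agent can be run by name. A minimal sketch based on the builder and executor API shown in the configuration section below:

import { AgentSystemBuilder } from '@nielspeter/agent-orchestration-core';

const { executor, cleanup } = await AgentSystemBuilder.default()
  .withModel('anthropic/claude-haiku-4-5')
  .build();

try {
  // 'my-specialist' matches the name declared in the frontmatter above.
  const result = await executor.execute('my-specialist', 'Summarize the files in src/');
  console.log(result);
} finally {
  await cleanup();
}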

⚙️ Configuration System

AgentSystemBuilder provides a fluent API for configuring the system:

import { AgentSystemBuilder } from './src/config/system-builder';

// Minimal configuration
const minimal = await AgentSystemBuilder.minimal().build();

// Default with file tools
const withTools = await AgentSystemBuilder.default()
  .withModel('anthropic/claude-haiku-4-5')
  .withSessionId('my-session')
  .build();

// Full configuration with MCP support
const full = await AgentSystemBuilder.default()
  .withMCPServers({
    'time': {
      command: 'uvx',
      args: ['mcp-server-time'],
      description: 'Time utilities'
    }
  })
  .withSafetyLimits({ maxIterations: 100 })
  .withLogging({ verbose: true })
  .build();

// From config file
const fromFile = await AgentSystemBuilder
  .fromConfigFile('./agent-config.json')
  .build();

// Always cleanup when done
await full.cleanup();

Code-First Configuration (No Files Required)

The system supports fully programmatic configuration, making config files optional. This is ideal for:

  • Testing: Inject controlled configuration without file dependencies
  • Secret Managers: Load API keys from AWS Secrets Manager, Vault, etc.
  • Library Usage: Embed the agent system in other applications
  • Dynamic Configuration: Build configuration at runtime

import { AgentSystemBuilder, type ProvidersConfig } from '@nielspeter/agent-orchestration-core';

// Define providers config programmatically
const providersConfig: ProvidersConfig = {
  providers: {
    anthropic: {
      type: 'native',
      apiKeyEnv: 'ANTHROPIC_API_KEY',
      models: [
        {
          id: 'claude-haiku-4-5',
          contextLength: 200000,
          maxOutputTokens: 8192,
        },
      ],
    },
    openrouter: {
      type: 'openai-compatible',
      baseURL: 'https://openrouter.ai/api/v1',
      apiKeyEnv: 'OPENROUTER_API_KEY',
    },
  },
  behaviorPresets: {
    balanced: { temperature: 0.5, top_p: 0.85 },
    precise: { temperature: 0.2, top_p: 0.6 },
  },
};

// Load API keys from your secret manager
const apiKeys = {
  ANTHROPIC_API_KEY: await secretManager.get('anthropic-api-key'),
  OPENROUTER_API_KEY: await secretManager.get('openrouter-api-key'),
};

// Build the system with programmatic configuration
const { executor, cleanup } = await AgentSystemBuilder.default()
  .withModel('anthropic/claude-haiku-4-5')
  .withProvidersConfig(providersConfig)
  .withAPIKeys(apiKeys)
  .build();

try {
  const result = await executor.execute('orchestrator', 'Your task here');
  console.log(result);
} finally {
  await cleanup();
}

Key Points:

  • No files needed: System works entirely from code
  • API key precedence: Programmatic keys override environment variables
  • Type safety: Full TypeScript support for configuration objects
  • Fallback behavior: Still falls back to process.env if keys not provided
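
For instance, a key passed programmatically takes precedence over the same variable in the environment (a small sketch of the rule above, reusing the providersConfig object from the previous example):

// ANTHROPIC_API_KEY may also exist in process.env; the programmatic value wins.
const system = await AgentSystemBuilder.default()
  .withProvidersConfig(providersConfig)
  .withAPIKeys({ ANTHROPIC_API_KEY: 'key-from-secret-manager' }) // used instead of process.env
  .build();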

Minimal Example (Testing):

// Minimal configuration for testing
const { executor, cleanup } = await AgentSystemBuilder.minimal()
  .withAPIKeys({
    ANTHROPIC_API_KEY: 'test-key',
  })
  .build();

🔧 Adding Custom Middleware

import { Middleware } from './middleware/middleware-types';

export function createCustomMiddleware(): Middleware {
  return async (ctx, next) => {
    // Pre-processing
    console.log(`Processing: ${ctx.agentName}`);
    
    // Call next middleware
    await next();
    
    // Post-processing
    console.log(`Completed: ${ctx.agentName}`);
  };
}

🎯 Key Design Decisions

Pull Architecture

Unlike traditional systems that pass full context to child agents, we implement a "pull, don't push" architecture:

  • Minimal Context: Child agents receive only the task prompt (~5-500 tokens)
  • Tool-Based Discovery: Agents use Read, Grep, List to gather what they need
  • No Confusion: No mixed contexts or role confusion
  • Cache Efficiency: Anthropic's cache makes "redundant" reads ~90% cheaper

// Traditional (problematic)
parentMessages: ctx.messages.slice() // 10,000+ tokens of confusion

// Our approach (pull architecture)
parentMessages: []  // Clean slate, agent pulls what it needs

Why Middleware?

  • Composable: Easy to add/remove/reorder functionality
  • Testable: Each piece can be tested in isolation
  • Maintainable: Clear boundaries and responsibilities
  • Familiar: Express.js-like pattern widely understood

Why Anthropic?

  • Caching is essential: Architecture depends on context reuse
  • OpenAI lacks caching: Would make delegation prohibitively expensive
  • Anthropic's ephemeral cache: Makes the architecture economically viable

🧪 Testing

The project includes comprehensive test coverage with separate unit and integration tests:

Unit Tests

npm run test:unit

  • No API calls required
  • Tests system structure and configuration
  • Fast execution (~1 second)
  • 100% reliable

Integration Tests

npm run test:integration

  • Requires real API key (Anthropic or OpenRouter)
  • Tests actual agent orchestration
  • Tests caching behavior
  • Tests parallel execution
  • Note: May hit rate limits if run too frequently

Test Configuration

Create .env.test for test-specific settings:

ANTHROPIC_API_KEY=your-test-key
MODEL=claude-haiku-4-5
LOG_DIR=./test-logs
MAX_ITERATIONS=10
MAX_DEPTH=3

🔌 MCP (Model Context Protocol) Support

The system supports MCP servers for extending functionality with external tools:

Configuration

{
  "mcpServers": {
    "time": {
      "command": "uvx",
      "args": ["mcp-server-time"],
      "description": "Time and timezone utilities"
    },
    "weather": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-weather"],
      "description": "Weather information"
    }
  }
}

Usage

const builder = await AgentSystemBuilder
  .fromConfigFile('./agent-config.json')
  .build();

// MCP tools are automatically registered with server prefix
// e.g., "time.get_current_time", "weather.get_forecast"

📈 Example Workflow

User Request
  ↓
Middleware Pipeline
  ├─ Error Handler (catches all errors)
  ├─ Agent Loader (loads agent definition)
  ├─ Context Setup (prepares messages)
  ├─ Safety Checks (enforces limits)
  ├─ LLM Call (gets response)
  └─ Tool Execution
      ├─ Parallel batch (read operations)
      ├─ Sequential batch (write operations)
      └─ Delegation (recursive with context)

🔄 Pipeline Flow Diagrams

Middleware Pipeline Sequence

sequenceDiagram
    participant User
    participant AgentExecutor
    participant Pipeline
    participant ErrorHandler
    participant AgentLoader
    participant ContextSetup
    participant SafetyChecks
    participant LLMCall
    participant ToolExecution
    
    User->>AgentExecutor: execute(agent, prompt, context)
    AgentExecutor->>Pipeline: execute(middlewareContext)
    
    Note over Pipeline: Start middleware chain
    
    Pipeline->>ErrorHandler: middleware(ctx, next)
    activate ErrorHandler
    Note over ErrorHandler: Wrap in try-catch
    
    ErrorHandler->>AgentLoader: next()
    activate AgentLoader
    Note over AgentLoader: Load agent & filter tools
    
    AgentLoader->>ContextSetup: next()
    activate ContextSetup
    Note over ContextSetup: Setup messages & context
    
    ContextSetup->>SafetyChecks: next()
    activate SafetyChecks
    Note over SafetyChecks: Check limits & safety
    
    SafetyChecks->>LLMCall: next()
    activate LLMCall
    Note over LLMCall: Call Anthropic API
    
    LLMCall->>ToolExecution: next()
    activate ToolExecution
    Note over ToolExecution: Execute tool calls
    
    ToolExecution-->>LLMCall: return
    deactivate ToolExecution
    
    LLMCall-->>SafetyChecks: return
    deactivate LLMCall
    
    SafetyChecks-->>ContextSetup: return
    deactivate SafetyChecks
    
    ContextSetup-->>AgentLoader: return
    deactivate ContextSetup
    
    AgentLoader-->>ErrorHandler: return
    deactivate AgentLoader
    
    ErrorHandler-->>Pipeline: return (or handle error)
    deactivate ErrorHandler
    
    Pipeline-->>AgentExecutor: complete
    AgentExecutor-->>User: result

Detailed Middleware Flow

flowchart TD
    Start([User Request]) --> Executor[AgentExecutor.execute]
    
    Executor --> Context[Create MiddlewareContext]
    Context --> Loop{Iteration < maxIterations?}
    
    Loop -->|Yes| Pipeline[Pipeline.execute]
    Loop -->|No| Result[Return Result]
    
    Pipeline --> M1[ErrorHandlerMiddleware]
    M1 --> M1A{Try Block}
    M1A -->|Success| M2[AgentLoaderMiddleware]
    M1A -->|Error| M1B[Handle Error]
    M1B --> Result
    
    M2 --> M2A[Load Agent Definition]
    M2A --> M2B[Filter Tools by Permissions]
    M2B --> M3[ContextSetupMiddleware]
    
    M3 --> M3A[Setup Messages Array]
    M3A --> M3B[Add Parent Context if Exists]
    M3B --> M3C[Add System Prompt]
    M3C --> M4[SafetyChecksMiddleware]
    
    M4 --> M4A{Check Depth Limit}
    M4A -->|OK| M4B{Check Token Estimate}
    M4A -->|Exceeded| M4D[Set Error & Return]
    M4B -->|OK| M4C{Check Iteration Warning}
    M4B -->|Exceeded| M4D
    M4C -->|Warn| M4E[Log Warning]
    M4C -->|OK| M5[LLMCallMiddleware]
    M4E --> M5
    M4D --> Result
    
    M5 --> M5A[Call Anthropic API]
    M5A --> M5B{Has Tool Calls?}
    M5B -->|Yes| M6[ToolExecutionMiddleware]
    M5B -->|No| M5C[Set Result]
    M5C --> Check[Check shouldContinue]
    
    M6 --> M6A[Group Tools by Safety]
    M6A --> M6B[Execute Safe Tools in Parallel]
    M6B --> M6C[Execute Unsafe Tools Sequentially]
    M6C --> M6D{Has Delegate Tool?}
    M6D -->|Yes| M6E[Recursive Delegation]
    M6D -->|No| M6F[Add Results to Messages]
    M6E --> M6F
    M6F --> Check
    
    Check -->|Continue| Loop
    Check -->|Stop| Result
    
    Result --> End([Return to User])
    
    style M1 fill:#ffebee
    style M2 fill:#e3f2fd
    style M3 fill:#f3e5f5
    style M4 fill:#fff3e0
    style M5 fill:#e8f5e9
    style M6 fill:#e0f2f1

Tool Execution Strategy

flowchart LR
    subgraph "Tool Grouping"
        Tools[Tool Calls] --> Group{Group by Safety}
        Group --> Safe[Safe Tools<br/>Read, List, Grep]
        Group --> Unsafe[Unsafe Tools<br/>Write, Edit, Delegate]
    end
    
    subgraph "Execution"
        Safe --> Parallel[Parallel Execution<br/>Up to 10 concurrent]
        Unsafe --> Sequential[Sequential Execution<br/>One at a time]
        Sequential --> Delegate{Is Delegate Tool?}
        Delegate -->|Yes| Recursive[Recursive Agent Call<br/>Minimal context only]
        Delegate -->|No| Direct[Direct Execution]
    end
    
    Parallel --> Results[Collect Results]
    Direct --> Results
    Recursive --> Results
    
    Results --> Messages[Add to Message History]

🚦 Safety Features

  • Max depth: Prevents infinite delegation chains
  • Max iterations: Limits execution loops (default: 100)
  • Token estimation: Prevents context overflow
  • Execution timeout: 5-minute maximum per request
  • Error boundaries: Graceful error handling

📝 Example Test Scripts

# Structure test (no API calls)
npm run example:structure

# Full orchestration test
npm run example:orchestration

# Parallel execution test
npm run example:parallel

# Caching demonstration
npm run example:cache