TypeScript framework for building autonomous, collaborative AI agents
Key capabilities:
- Autonomous Agents: Agents gather information via tools, making independent decisions without massive context dumps
- Deep Reasoning: Multi-provider thinking support (Claude, OpenAI, OpenRouter) for complex planning and problem-solving
- Agent Collaboration: Agents delegate to specialized sub-agents, forming dynamic teams for complex tasks
- Multi-Provider Support: Switch between Anthropic, OpenAI, OpenRouter, or custom providers with simple configuration
- Production-Ready: Built-in security, retry logic, session persistence, and comprehensive monitoring
- Cost Efficient: Smart caching delivers up to 90% cost savings on multi-agent workflows
Install from npm (no authentication required):
# Install core library
npm install @nielspeter/agent-orchestration-core
# Install CLI globally
npm install -g @nielspeter/agent-orchestration-cli
# Or install both
npm install @nielspeter/agent-orchestration-core @nielspeter/agent-orchestration-cli
Package URLs:
- Core: https://www.npmjs.com/package/@nielspeter/agent-orchestration-core
- CLI: https://www.npmjs.com/package/@nielspeter/agent-orchestration-cli
type Middleware = (ctx: MiddlewareContext, next: () => Promise<void>) => Promise<void>;
The monolithic 500-line AgentExecutor has been refactored into a clean pipeline of focused middleware:
- ErrorHandlerMiddleware - Global error boundary
- AgentLoaderMiddleware - Loads agents and filters tools
- ThinkingMiddleware - Validates and normalizes thinking configuration
- ContextSetupMiddleware - Manages conversation context
- ProviderSelectionMiddleware - Selects LLM provider (Anthropic, OpenRouter, etc.)
- SafetyChecksMiddleware - Enforces limits (depth, iterations, tokens)
- SmartRetryMiddleware - Retries on rate limits (429) with exponential backoff
- LLMCallMiddleware - Handles LLM communication
- ToolExecutionMiddleware - Orchestrates tool execution
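The pipeline composes Express-style: each middleware receives the shared context and a next() callback, and decides what to do before and after handing off. A minimal composition sketch, with MiddlewareContext simplified for illustration:
type MiddlewareContext = { agentName: string; messages: unknown[] }; // simplified for illustration
type Middleware = (ctx: MiddlewareContext, next: () => Promise<void>) => Promise<void>;

function composePipeline(middlewares: Middleware[]) {
  return async (ctx: MiddlewareContext): Promise<void> => {
    let lastIndex = -1;
    const dispatch = async (i: number): Promise<void> => {
      if (i <= lastIndex) throw new Error('next() called multiple times');
      lastIndex = i;
      const mw = middlewares[i];
      if (!mw) return; // end of chain
      await mw(ctx, () => dispatch(i + 1)); // each middleware decides when to call next()
    };
    await dispatch(0);
  };
}

// Usage sketch: every agent runs through the same composed pipeline
// const run = composePipeline([errorHandler, agentLoader, contextSetup, safetyChecks, llmCall, toolExecution]);
// await run({ agentName: 'orchestrator', messages: [] });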
- No special orchestrator class - all agents use the same pipeline
- Agents are defined as markdown files with YAML frontmatter
- Orchestration emerges through delegation via the Delegate tool
When agent A delegates to agent B:
- B receives minimal context (~5-500 tokens) - just the task prompt
- B uses tools (Read, Write, List, Grep, Delegate) to pull information it needs
- Anthropic's cache makes "redundant" reads efficient (90% cost savings)
- Clean separation - each agent has independent context
Each agent automatically implements the Reason → Act → Observe loop:
- Reason: Agent analyzes prompt and decides what to do
- Act: Agent calls tools to gather information or take action
- Observe: Agent processes tool results
- Repeat: Continue until task is complete (no more tool calls)
This iterative refinement allows agents to:
- Build understanding incrementally
- Correct mistakes
- Ground responses in actual data
- Never hallucinate file contents
See Agentic Loop Pattern for details.
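A minimal sketch of that loop, assuming hypothetical callLLM and executeTools helpers in place of the real middleware pipeline (which also enforces the limits below):
type Message = { role: 'user' | 'assistant' | 'tool'; content: string };
type LLMResponse = { text: string; toolCalls: { name: string; input: unknown }[] };

declare function callLLM(agent: string, messages: Message[]): Promise<LLMResponse>;
declare function executeTools(calls: LLMResponse['toolCalls']): Promise<Message[]>;

async function agenticLoop(agent: string, prompt: string, maxIterations = 100): Promise<string> {
  const messages: Message[] = [{ role: 'user', content: prompt }];
  for (let i = 0; i < maxIterations; i++) {
    const response = await callLLM(agent, messages);           // Reason: model decides what to do
    messages.push({ role: 'assistant', content: response.text });
    if (response.toolCalls.length === 0) return response.text; // Done: no more tool calls
    const results = await executeTools(response.toolCalls);    // Act: run the requested tools
    messages.push(...results);                                  // Observe: feed results back in
  }
  throw new Error('MAX_ITERATIONS reached');
}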
- Iteration: Same agent refining its response (limited by MAX_ITERATIONS)
- Delegation: Calling another agent via Delegate tool (limited by MAX_DEPTH)
# Install dependencies
npm install
# Set up API keys (at least one required)
cp .env.example .env
# Edit .env with your ANTHROPIC_API_KEY or OPENROUTER_API_KEY
# Optional: Configure providers
cp providers-config.example.json providers-config.json
# Build the project
npm run build
# Run tests
npm test # Run all tests
npm run test:unit # Unit tests only (no API)
npm run test:integration # Integration tests (requires API key)
# Use CLI
npm run cli -- -p "Hello, world!" # CLI tool
echo "Analyze this" | npm run cli # stdin support
# Run examples
npx tsx packages/examples/quickstart.ts # Simple quickstart
npx tsx packages/examples/orchestration.ts # Agent orchestration
npx tsx packages/examples/configuration.ts # Config file usage
npx tsx packages/examples/code-first-config.ts # Code-first configuration (no files)
npx tsx packages/examples/logging.ts # Logging features
npx tsx packages/examples/mcp-integration.ts # MCP server support
npx tsx packages/examples/werewolf-game.ts # Autonomous multi-agent game
npx tsx packages/examples/coding-team.ts # Collaborative coding agents
Simple demonstration of agent execution with file operations.
Shows how agents delegate tasks to specialized sub-agents using the Delegate tool.
Demonstrates loading agent system configuration from JSON files.
Shows programmatic configuration without config files. Includes 5 examples:
- Basic code-first configuration with .withProvidersConfig() and .withAPIKeys()
- Secret manager integration (simulated AWS Secrets Manager)
- Testing configuration (no file dependencies)
- Dynamic configuration based on runtime conditions
- API key precedence demonstration
npx tsx packages/examples/code-first-config.ts
Ideal for testing, CI/CD, and production deployments where config files aren't practical.
A complex multi-agent game demonstrating true agent autonomy:
- Game-master agent orchestrates the entire game independently
- Role agents (werewolf, seer, villager) make strategic decisions
- Evidence-based gameplay with alibis, deductions, and voting
- No hardcoded logic - all game rules exist in agent prompts
This example showcases how agents can be truly autonomous entities that receive high-level requests ("run a game") and handle all implementation details themselves.
# Run the werewolf game
npx tsx packages/examples/werewolf-game.ts
Demonstrates how specialized agents collaborate to implement software features:
- Driver agent orchestrates the development process and tracks progress
- Implementer agent writes production code following existing patterns
- Test-writer agent creates comprehensive test suites
- Shell tool integration enables running tests and type checking
- TodoWrite tracking provides real-time progress visibility
This example shows the practical application of the pull architecture, where each agent independently discovers what it needs rather than receiving massive context dumps.
# Set up the sample project
cd packages/examples/coding-team/sample-project && npm install && cd -
# Run the coding team
npx tsx packages/examples/coding-team.ts
The @nielspeter/agent-orchestration-cli package provides a production-ready CLI tool with dual modes:
# Install globally
npm install -g @nielspeter/agent-orchestration-cli
# Or use from workspace
npm run cli
- Dual Interface: CLI mode for terminal use, Web UI mode for browser interface
- Unix-friendly: stdin/stdout support, proper exit codes, EPIPE handling
- Security: 10MB input limit, 30s timeout, signal handling (SIGINT/SIGTERM)
- Output modes: clean (default), verbose, json
- Flexible: Use -p flag or pipe from stdin
CLI Mode (Run agents from terminal):
# Basic usage
agent -p "Hello, world!"
# Read from stdin (Unix-style)
echo "Analyze this code" | agent
cat file.txt | agent
# JSON output for scripting
agent -p "List 3 colors" --json | jq '.result'
# Custom agent
agent -p "Review code" -a code-reviewer
# List available
agent --list-agents
agent --list-tools
Web UI Mode (Start server):
# Start web server
agent serve --open
# Custom port and host
agent serve --port 8080 --host 0.0.0.0
# Set working directory (agents, logs, file operations)
agent serve --working-dir ~/my-project --open
# Or use convenience script
npm run cli:serve
For complete CLI documentation, see packages/cli/README.md.
Agents can specify behavioral characteristics through presets that control temperature and top_p:
# In agent markdown frontmatter
---
name: validator
behavior: deterministic # Uses preset for consistency
---
Available presets (catalog in providers-config.json, defaults in agent-config.json):
- deterministic (0.1/0.5): Validation, routing, business logic
- precise (0.2/0.6): Code analysis, verification, structured outputs
- balanced (0.5/0.85): Default - orchestration, tool use, reasoning
- creative (0.7/0.95): Storytelling, game mastering, creative content
- exploratory (0.9/0.98): Research, brainstorming, alternatives
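Expressed in code-first form, the catalog above maps onto the behaviorPresets block of ProvidersConfig (shown in full later in this README); the values below simply mirror the list above:
const behaviorPresets = {
  deterministic: { temperature: 0.1, top_p: 0.5 },
  precise: { temperature: 0.2, top_p: 0.6 },
  balanced: { temperature: 0.5, top_p: 0.85 },
  creative: { temperature: 0.7, top_p: 0.95 },
  exploratory: { temperature: 0.9, top_p: 0.98 },
};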
Agents can use extended thinking to reason deeply before responding - significantly improving performance on complex tasks like planning, code design, and problem-solving.
# In agent markdown frontmatter
---
name: orchestrator
tools: ["delegate", "todowrite"]
thinking:
  type: enabled
  budget_tokens: 16000
---
You are a project orchestrator. Before delegating tasks, think through:
- What is the end goal?
- What order makes sense?
- What could go wrong?
When thinking is enabled, agents:
- Think internally before responding (you see this process with 🧠 emoji)
- Plan their approach step-by-step
- Consider alternatives and edge cases
- Generate better responses based on reasoning
The same configuration works across all providers:
- Anthropic: Extended thinking (Claude 3.7) & Interleaved thinking (Claude 4+)
- OpenRouter: Reasoning tokens (available on 200+ models)
- OpenAI: Automatic reasoning (o1, o3 series)
| Task Complexity | Budget (tokens) | Use Case |
|---|---|---|
| Simple | 2,000-5,000 | Basic analysis, routing |
| Moderate | 5,000-10,000 | Code implementation, planning |
| Complex | 10,000-16,000 | Multi-agent orchestration, code review |
| Very Complex | 16,000-24,000 | Deep analysis, complex problem solving |
🧠 Agent Thinking:
Let me analyze this request step by step:
1. The user wants to implement a factorial function
2. I need to consider edge cases: 0!, negative numbers
3. I should delegate to the implementer agent
4. The implementer will need the project path and requirements
5. After implementation, tests should verify correctness
Plan: First explore project structure, then delegate with clear
requirements including edge case handling.
[Agent then executes the planned approach]
For complete documentation, see Extended Thinking Guide.
agent-orchestration-system/
├── packages/ # Workspace packages
│ ├── core/ # Core agent system (@nielspeter/agent-orchestration-core)
│ │ ├── src/ # Source code
│ │ │ ├── config/ # Configuration system
│ │ │ ├── middleware/ # Middleware pipeline
│ │ │ ├── agents/ # Agent domain
│ │ │ ├── tools/ # Tool domain
│ │ │ ├── providers/ # LLM providers
│ │ │ ├── logging/ # Logging
│ │ │ └── lib/ # Utilities
│ │ └── tests/ # Test suite
│ ├── cli/ # CLI tool (@agent-system/cli)
│ │ ├── src/
│ │ │ ├── index.ts # CLI entry point with stdin support
│ │ │ └── output.ts # Output formatting utilities
│ │ ├── tests/ # CLI tests
│ │ └── README.md # CLI documentation
│ ├── examples/ # Example scripts (@agent-system/examples)
│ │ ├── coding-team/ # Collaborative coding example
│ │ ├── thinking/ # Extended thinking demos
│ │ ├── udbud/ # Tender analysis example
│ │ └── *.ts # Various example scripts
│ └── web/ # Web UI (@agent-system/web)
│ ├── src/ # React frontend
│ └── server/ # Express backend
├── agents/ # Shared agent definitions
└── docs/ # Documentation
- Each middleware ~60 lines (was 500+ in monolith)
- Single responsibility per middleware
- Easy to test, modify, and extend
- Full TypeScript types throughout
- No any types in critical paths
- Compile-time safety
- Global error boundaries
- Graceful degradation
- User-friendly error messages
- Fixed race conditions in pipeline
- 5-minute execution timeout
- Proper concurrency handling
Models must be specified with their provider prefix:
// Format: provider/model[:modifier]
// Direct to provider APIs
.withModel('anthropic/claude-haiku-4-5')
.withModel('openai/gpt-4-turbo')
// Via OpenRouter (supports :nitro and :floor modifiers)
.withModel('openrouter/meta-llama/llama-3.1-70b-instruct') // Default routing
.withModel('openrouter/meta-llama/llama-3.1-70b-instruct:nitro') // Fast throughput
.withModel('openrouter/meta-llama/llama-3.1-70b-instruct:floor') // Lowest price
- 90% reduction in token costs for repeated context
- 2000x efficiency for multi-agent workflows
- 5-minute cache window perfect for interactive sessions
- Parallel execution for read-only tools (up to 10 concurrent)
- Sequential execution for write operations
- Smart batching based on tool safety
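A rough sketch of what safety-based batching looks like; runTool and the read-only tool set are illustrative stand-ins, and the real ToolExecutionMiddleware additionally caps parallelism at 10 concurrent calls:
type ToolCall = { name: string; input: unknown };
declare function runTool(call: ToolCall): Promise<string>;

const READ_ONLY_TOOLS = new Set(['read', 'list', 'grep']); // illustrative tool names

async function executeToolBatch(calls: ToolCall[]): Promise<string[]> {
  const safe = calls.filter((c) => READ_ONLY_TOOLS.has(c.name));
  const unsafe = calls.filter((c) => !READ_ONLY_TOOLS.has(c.name));

  // Read-only tools have no side effects, so they can run concurrently
  const safeResults = await Promise.all(safe.map((c) => runTool(c)));

  // Writes, edits, and delegation run one at a time
  const unsafeResults: string[] = [];
  for (const call of unsafe) {
    unsafeResults.push(await runTool(call));
  }
  return [...safeResults, ...unsafeResults];
}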
Create a markdown file in the agents/ directory:
---
name: my-specialist
tools: ["read", "list"] # or "*" for all tools
---
# My Specialist Agent
You are a specialist agent that focuses on...
[Define the agent's role and capabilities]
The new AgentSystemBuilder provides a fluent API for configuring the system:
import { AgentSystemBuilder } from './src/config/system-builder';
// Minimal configuration
const minimal = await AgentSystemBuilder.minimal().build();
// Default with file tools
const withTools = await AgentSystemBuilder.default()
.withModel('anthropic/claude-haiku-4-5')
.withSessionId('my-session')
.build();
// Full configuration with MCP support
const full = await AgentSystemBuilder.default()
.withMCPServers({
'time': {
command: 'uvx',
args: ['mcp-server-time'],
description: 'Time utilities'
}
})
.withSafetyLimits({ maxIterations: 100 })
.withLogging({ verbose: true })
.build();
// From config file
const fromFile = await AgentSystemBuilder
.fromConfigFile('./agent-config.json')
.build();
// Always cleanup when done
await full.cleanup();
The system supports fully programmatic configuration, making config files optional. This is ideal for:
- Testing: Inject controlled configuration without file dependencies
- Secret Managers: Load API keys from AWS Secrets Manager, Vault, etc.
- Library Usage: Embed the agent system in other applications
- Dynamic Configuration: Build configuration at runtime
import { AgentSystemBuilder, type ProvidersConfig } from '@nielspeter/agent-orchestration-core';
// Define providers config programmatically
const providersConfig: ProvidersConfig = {
  providers: {
    anthropic: {
      type: 'native',
      apiKeyEnv: 'ANTHROPIC_API_KEY',
      models: [
        {
          id: 'claude-haiku-4-5',
          contextLength: 200000,
          maxOutputTokens: 8192,
        },
      ],
    },
    openrouter: {
      type: 'openai-compatible',
      baseURL: 'https://openrouter.ai/api/v1',
      apiKeyEnv: 'OPENROUTER_API_KEY',
    },
  },
  behaviorPresets: {
    balanced: { temperature: 0.5, top_p: 0.85 },
    precise: { temperature: 0.2, top_p: 0.6 },
  },
};
// Load API keys from your secret manager
const apiKeys = {
ANTHROPIC_API_KEY: await secretManager.get('anthropic-api-key'),
OPENROUTER_API_KEY: await secretManager.get('openrouter-api-key'),
};
// Build the system with programmatic configuration
const { executor, cleanup } = await AgentSystemBuilder.default()
.withModel('anthropic/claude-haiku-4-5')
.withProvidersConfig(providersConfig)
.withAPIKeys(apiKeys)
.build();
try {
  const result = await executor.execute('orchestrator', 'Your task here');
  console.log(result);
} finally {
  await cleanup();
}
Key Points:
- No files needed: System works entirely from code
- API key precedence: Programmatic keys override environment variables
- Type safety: Full TypeScript support for configuration objects
- Fallback behavior: Still falls back to process.env if keys are not provided
Minimal Example (Testing):
// Minimal configuration for testing
const { executor, cleanup } = await AgentSystemBuilder.minimal()
.withAPIKeys({
ANTHROPIC_API_KEY: 'test-key',
})
.build();
import { Middleware } from './middleware/middleware-types';
export function createCustomMiddleware(): Middleware {
  return async (ctx, next) => {
    // Pre-processing
    console.log(`Processing: ${ctx.agentName}`);
    // Call next middleware
    await next();
    // Post-processing
    console.log(`Completed: ${ctx.agentName}`);
  };
}
Unlike traditional systems that pass full context to child agents, we implement a "pull, don't push" architecture:
- Minimal Context: Child agents receive only the task prompt (~5-500 tokens)
- Tool-Based Discovery: Agents use Read, Grep, List to gather what they need
- No Confusion: No mixed contexts or role confusion
- Cache Efficiency: Anthropic's cache makes "redundant" reads ~90% cheaper
// Traditional (problematic)
parentMessages: ctx.messages.slice() // 10,000+ tokens of confusion
// Our approach (pull architecture)
parentMessages: [] // Clean slate, agent pulls what it needs
- Composable: Easy to add/remove/reorder functionality
- Testable: Each piece can be tested in isolation
- Maintainable: Clear boundaries and responsibilities
- Familiar: Express.js-like pattern widely understood
- Caching is essential: Architecture depends on context reuse
- OpenAI lacks caching: Would make delegation prohibitively expensive
- Anthropic's ephemeral cache: Makes the architecture economically viable
The project includes comprehensive test coverage with separate unit and integration tests:
npm run test:unit
- No API calls required
- Tests system structure and configuration
- Fast execution (~1 second)
- 100% reliable
npm run test:integration
- Requires real API key (Anthropic or OpenRouter)
- Tests actual agent orchestration
- Tests caching behavior
- Tests parallel execution
- Note: May hit rate limits if run too frequently
Create .env.test for test-specific settings:
ANTHROPIC_API_KEY=your-test-key
MODEL=claude-haiku-4-5
LOG_DIR=./test-logs
MAX_ITERATIONS=10
MAX_DEPTH=3
The system supports MCP servers for extending functionality with external tools:
{
  "mcpServers": {
    "time": {
      "command": "uvx",
      "args": ["mcp-server-time"],
      "description": "Time and timezone utilities"
    },
    "weather": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-weather"],
      "description": "Weather information"
    }
  }
}
const builder = await AgentSystemBuilder
.fromConfigFile('./agent-config.json')
.build();
// MCP tools are automatically registered with server prefix
// e.g., "time.get_current_time", "weather.get_forecast"User Request
↓
Middleware Pipeline
├─ Error Handler (catches all errors)
├─ Agent Loader (loads agent definition)
├─ Context Setup (prepares messages)
├─ Safety Checks (enforces limits)
├─ LLM Call (gets response)
└─ Tool Execution
├─ Parallel batch (read operations)
├─ Sequential batch (write operations)
└─ Delegation (recursive with context)
sequenceDiagram
participant User
participant AgentExecutor
participant Pipeline
participant ErrorHandler
participant AgentLoader
participant ContextSetup
participant SafetyChecks
participant LLMCall
participant ToolExecution
User->>AgentExecutor: execute(agent, prompt, context)
AgentExecutor->>Pipeline: execute(middlewareContext)
Note over Pipeline: Start middleware chain
Pipeline->>ErrorHandler: middleware(ctx, next)
activate ErrorHandler
Note over ErrorHandler: Wrap in try-catch
ErrorHandler->>AgentLoader: next()
activate AgentLoader
Note over AgentLoader: Load agent & filter tools
AgentLoader->>ContextSetup: next()
activate ContextSetup
Note over ContextSetup: Setup messages & context
ContextSetup->>SafetyChecks: next()
activate SafetyChecks
Note over SafetyChecks: Check limits & safety
SafetyChecks->>LLMCall: next()
activate LLMCall
Note over LLMCall: Call Anthropic API
LLMCall->>ToolExecution: next()
activate ToolExecution
Note over ToolExecution: Execute tool calls
ToolExecution-->>LLMCall: return
deactivate ToolExecution
LLMCall-->>SafetyChecks: return
deactivate LLMCall
SafetyChecks-->>ContextSetup: return
deactivate SafetyChecks
ContextSetup-->>AgentLoader: return
deactivate ContextSetup
AgentLoader-->>ErrorHandler: return
deactivate AgentLoader
ErrorHandler-->>Pipeline: return (or handle error)
deactivate ErrorHandler
Pipeline-->>AgentExecutor: complete
AgentExecutor-->>User: result
flowchart TD
Start([User Request]) --> Executor[AgentExecutor.execute]
Executor --> Context[Create MiddlewareContext]
Context --> Loop{Iteration < maxIterations?}
Loop -->|Yes| Pipeline[Pipeline.execute]
Loop -->|No| Result[Return Result]
Pipeline --> M1[ErrorHandlerMiddleware]
M1 --> M1A{Try Block}
M1A -->|Success| M2[AgentLoaderMiddleware]
M1A -->|Error| M1B[Handle Error]
M1B --> Result
M2 --> M2A[Load Agent Definition]
M2A --> M2B[Filter Tools by Permissions]
M2B --> M3[ContextSetupMiddleware]
M3 --> M3A[Setup Messages Array]
M3A --> M3B[Add Parent Context if Exists]
M3B --> M3C[Add System Prompt]
M3C --> M4[SafetyChecksMiddleware]
M4 --> M4A{Check Depth Limit}
M4A -->|OK| M4B{Check Token Estimate}
M4A -->|Exceeded| M4D[Set Error & Return]
M4B -->|OK| M4C{Check Iteration Warning}
M4B -->|Exceeded| M4D
M4C -->|Warn| M4E[Log Warning]
M4C -->|OK| M5[LLMCallMiddleware]
M4E --> M5
M4D --> Result
M5 --> M5A[Call Anthropic API]
M5A --> M5B{Has Tool Calls?}
M5B -->|Yes| M6[ToolExecutionMiddleware]
M5B -->|No| M5C[Set Result]
M5C --> Check[Check shouldContinue]
M6 --> M6A[Group Tools by Safety]
M6A --> M6B[Execute Safe Tools in Parallel]
M6B --> M6C[Execute Unsafe Tools Sequentially]
M6C --> M6D{Has Delegate Tool?}
M6D -->|Yes| M6E[Recursive Delegation]
M6D -->|No| M6F[Add Results to Messages]
M6E --> M6F
M6F --> Check
Check -->|Continue| Loop
Check -->|Stop| Result
Result --> End([Return to User])
style M1 fill:#ffebee
style M2 fill:#e3f2fd
style M3 fill:#f3e5f5
style M4 fill:#fff3e0
style M5 fill:#e8f5e9
style M6 fill:#e0f2f1
flowchart LR
subgraph "Tool Grouping"
Tools[Tool Calls] --> Group{Group by Safety}
Group --> Safe[Safe Tools<br/>Read, List, Grep]
Group --> Unsafe[Unsafe Tools<br/>Write, Edit, Delegate]
end
subgraph "Execution"
Safe --> Parallel[Parallel Execution<br/>Up to 10 concurrent]
Unsafe --> Sequential[Sequential Execution<br/>One at a time]
Sequential --> Delegate{Is Delegate Tool?}
Delegate -->|Yes| Recursive[Recursive Agent Call<br/>Minimal context only]
Delegate -->|No| Direct[Direct Execution]
end
Parallel --> Results[Collect Results]
Direct --> Results
Recursive --> Results
Results --> Messages[Add to Message History]
- Max depth: Prevents infinite delegation chains
- Max iterations: Limits execution loops (default: 100)
- Token estimation: Prevents context overflow
- Execution timeout: 5-minute maximum per request
- Error boundaries: Graceful error handling
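These limits are adjustable through the builder; withSafetyLimits({ maxIterations }) appears in the builder example earlier in this README, while the maxDepth option name below is an assumption:
import { AgentSystemBuilder } from '@nielspeter/agent-orchestration-core';

const { executor, cleanup } = await AgentSystemBuilder.default()
  .withSafetyLimits({
    maxIterations: 50, // cap the Reason → Act → Observe loop per agent
    maxDepth: 3,       // assumed option name: cap the delegation chain
  })
  .build();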
# Structure test (no API calls)
npm run example:structure
# Full orchestration test
npm run example:orchestration
# Parallel execution test
npm run example:parallel
# Caching demonstration
npm run example:cache