# 🎯 RL-Powered T-Shirt Ad Campaign Optimizer

An intelligent reinforcement learning system that automates and optimizes advertising campaigns for e-commerce t-shirt businesses across multiple platforms, maximizing profitability through continuous learning.
Note: The code has been modularized into separate files for clarity and maintainability. See the Project Structure section below for details and updated commands.
- Overview
- Key Features
- Architecture
- Quick Start
- Installation
- Usage
- Configuration
- API Reference
- How It Works
- Performance Metrics
- Examples
- Testing
- Deployment
- Troubleshooting
- Roadmap
- Contributing
- License
- Acknowledgments
This repo uses a modular TypeScript structure:
src/
agent/
base.ts # RLAgent abstract base
dqnAgent.ts # DQNAgent implementation
environment/
simulator.ts # Ad environment simulator
observers/
types.ts # TrainingObserver interface
consoleLogger.ts # Console logger observer
metricsCollector.ts # Metrics collector observer
platforms/
base.ts # AdPlatformAPI abstract base
factory.ts # Platform factory
mockTikTok.ts # TikTok mock API
mockInstagram.ts # Instagram mock API
types.ts # Shared types (state/actions/metrics)
index.ts # Barrel exports
main.ts # CLI/entry point (training demo)
Build and run commands:
- Start (dev): npm start (runs ts-node src/main.ts)
- Build: npm run build (outputs to dist/)
- Start (prod): npm run start:prod (builds, then runs node dist/main.js)
This project implements a Deep Q-Learning (DQN) agent that learns to optimize advertising campaigns for t-shirt businesses across TikTok and Instagram. By continuously learning from campaign performance data, the system automatically adjusts budgets, targeting parameters, creative strategies, and platform allocation to maximize profit.
Episode 1 | Total Reward: -2.34 | Profit: -$234
Episode 50 | Total Reward: 8.92 | Profit: $892
→ 380% improvement through learning!
- API Integration Spec: docs/api_spec.md
- Production Integration Guide: docs/real_integration.md
- Low-Spend Rollout Guide: docs/low_spend_rollout.md
- PoC Launch Checklist: docs/poc_checklist.md
- Mathematical Primer: docs/math_primer.md
- Torch.js DQN Refactor Tutorial: docs/torchjs_dqn_refactor.md
If you are migrating from the current tabular approach in src/agent/dqnAgent.ts to a true DQN, start here:
- Replace the Q-table with a Q-network (state → Q-values).
- Encode states and index actions consistently.
- Add replay buffer and target network.
- Train with TD targets and MSE loss.
- Schedule ε and the learning rate; persist and evaluate.
See docs/torchjs_dqn_refactor.md for a concise, step-by-step guide.
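As a rough companion to these steps, here is a minimal Q-network and TD-target training step using @tensorflow/tfjs. Layer sizes, hyperparameters, and the helper names (buildQNetwork, trainStep) are illustrative assumptions, not the repo's actual code.

import * as tf from "@tensorflow/tfjs";

const stateSize = 15;   // length of the encoded feature vector (assumed)
const numActions = 120; // size of the discrete action grid (assumed)

// Q-network: state vector in, one Q-value per action out.
function buildQNetwork(): tf.Sequential {
  const model = tf.sequential();
  model.add(tf.layers.dense({ inputShape: [stateSize], units: 128, activation: "relu" }));
  model.add(tf.layers.dense({ units: 64, activation: "relu" }));
  model.add(tf.layers.dense({ units: numActions })); // linear output layer
  model.compile({ optimizer: tf.train.adam(0.0005), loss: "meanSquaredError" });
  return model;
}

const online = buildQNetwork();
const target = buildQNetwork();
target.setWeights(online.getWeights()); // target network starts as a copy

// One TD-style update on a sampled minibatch (states: [batch, stateSize]).
async function trainStep(
  states: tf.Tensor2D, actions: number[], rewards: number[],
  nextStates: tf.Tensor2D, dones: boolean[], gamma = 0.97
): Promise<void> {
  const maxNextQ = (await (target.predict(nextStates) as tf.Tensor2D).max(1).array()) as number[];
  const qValues = (await (online.predict(states) as tf.Tensor2D).array()) as number[][];
  actions.forEach((a, i) => {
    qValues[i][a] = rewards[i] + (dones[i] ? 0 : gamma * maxNextQ[i]); // TD target
  });
  await online.fit(states, tf.tensor2d(qValues), { epochs: 1, verbose: 0 });
}

Epsilon scheduling, the replay buffer, and periodic target-network syncs wrap around this core update.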
- Tabular baseline (Q-table): npm start
- DQN neural agent (Torch.js-style over TF.js): npm start -- --agent=nn
- Useful flags: --episodes, --batchSize, --gamma, --lr, --trainFreq, --targetSync, --replayCap, --epsilonStart, --epsilonMin, --epsilonDecay
Example:
npm start -- \
--agent=nn \
--episodes=200 \
--batchSize=64 \
--gamma=0.97 \
--lr=0.0005 \
--targetSync=500 \
--replayCap=20000
Notes
- Backend: uses @tensorflow/tfjs by default for portability; consider @tensorflow/tfjs-node for faster training.
- Encoding/Actions: see src/agent/encoding.ts for deterministic feature mapping and the action grid.
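For readers unfamiliar with the encoding step, the sketch below shows what deterministic feature mapping and action indexing typically look like. The field names, category lists, and normalization constants are assumptions for illustration; the actual logic is in src/agent/encoding.ts.

const AGE_GROUPS = ["18-24", "25-34", "35-44", "45+"];
const CREATIVES = ["lifestyle", "product", "ugc", "discount"];
const PLATFORMS = ["tiktok", "instagram"];

// Map a state object to a fixed-length numeric vector, always in the same order.
function encodeState(s: {
  dayOfWeek: number; hourOfDay: number; currentBudget: number;
  historicalCTR: number; historicalCVR: number;
  targetAgeGroup: string; creativeType: string; platform: string;
}): number[] {
  return [
    s.dayOfWeek / 6,
    s.hourOfDay / 23,
    s.currentBudget / 1000,                                   // crude scale normalization
    s.historicalCTR,
    s.historicalCVR,
    AGE_GROUPS.indexOf(s.targetAgeGroup) / (AGE_GROUPS.length - 1),
    CREATIVES.indexOf(s.creativeType) / (CREATIVES.length - 1),
    PLATFORMS.indexOf(s.platform) / (PLATFORMS.length - 1),
  ];
}

// Flatten a (platform, creative, ageGroup) choice into a single discrete action index.
function actionIndex(platform: number, creative: number, ageGroup: number): number {
  return (platform * CREATIVES.length + creative) * AGE_GROUPS.length + ageGroup;
}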
Run the real-runner skeleton in shadow mode with a strict $30/day cap and peak hours (adapters to be implemented before going live):
npm run run:real -- \
--mode=shadow \
--daily-budget-target=30 \
--peak-hours=18-22 \
--delta-max=0.10 \
--lambda-spend=0.25 \
--lagrange-step=0.05 \
--canary-list="tiktok:ADSET_ID,instagram:ADSET_ID"
Then review logs and the PoC checklist before enabling --mode=pilot.
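To make the flags concrete, here is a hedged sketch of how the delta clamp, spend penalty, and Lagrange-style λ update could interact; the real logic lives in src/execution/guardrails.ts and the runner, and may differ in detail.

// Never move a budget by more than ±deltaMax per step (maps to --delta-max).
function clampBudgetChange(current: number, proposed: number, deltaMax = 0.10): number {
  const lo = current * (1 - deltaMax);
  const hi = current * (1 + deltaMax);
  return Math.min(hi, Math.max(lo, proposed));
}

// Cost-sensitive reward: profit minus λ times overspend against the hourly target
// derived from --daily-budget-target (illustrative shaping, not the exact formula).
function shapedReward(profit: number, hourlySpend: number, dailyTarget = 30, lambda = 0.25): number {
  const hourlyCap = dailyTarget / 24;
  return profit - lambda * Math.max(0, hourlySpend - hourlyCap);
}

// Lagrangian-style update (maps to --lagrange-step): raise λ while spend exceeds the
// target, lower it (never below zero) when under budget; avgOverspend may be negative.
function updateLambda(lambda: number, avgOverspend: number, step = 0.05): number {
  return Math.max(0, lambda + step * avgOverspend);
}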
Example of a tidy console panel you can print during training:
┌─────────────────────────────────────────────┐
│ Training Progress (847/1000)                │
├─────────────────────────────────────────────┤
│ Progress: █████████████████████░░░░  84.7%  │
│ Current Reward: 7.23                        │
│ Avg Reward (last 100): 6.85                 │
│ Best Reward: 9.42                           │
│ Epsilon: 0.03                               │
│ Learning Rate: 0.001                        │
│ Platform: TikTok 62% | IG 38%               │
│ Top Creative: UGC (34%)                     │
│ Top Age Group: 18-24 (41%)                  │
└─────────────────────────────────────────────┘
Traditional rule-based ad optimization fails to capture complex, non-linear relationships between:
- Temporal patterns (day/hour performance variations)
- Platform dynamics (TikTok vs Instagram audiences)
- Creative fatigue (performance decay over time)
- Competitive landscapes (bidding wars, market saturation)
RL agents discover optimal strategies through exploration and exploitation, continuously adapting to market changes.
- Self-learning optimization without manual rules
- Multi-platform orchestration across TikTok, Instagram, and Shopify
- 24/7 autonomous operation with safety guardrails
- Deep Q-Network (DQN) with experience replay
- Real-time adaptation to market conditions
- A/B testing integration for policy validation
- Multi-objective optimization (profit, ROAS, CPA)
- SOLID principles and Gang of Four patterns
- Modular design for easy platform additions
- Observable training with metrics collection
- Production-ready logging and monitoring
- TypeScript for type safety
- Mock APIs for development/testing
- Comprehensive testing suite
- Detailed documentation and examples
graph TB
subgraph "RL Agent Core"
A[DQN Agent] --> B[Q-Network]
A --> C[Experience Replay]
A --> D[Action Selection]
end
subgraph "Environment"
E[Ad Environment Simulator]
E --> F[State Manager]
E --> G[Reward Calculator]
end
subgraph "Platform APIs"
H[TikTok API]
I[Instagram API]
J[Shopify API]
end
subgraph "Training Pipeline"
K[Training Controller]
K --> L[Episode Manager]
K --> M[Metrics Collector]
end
A <--> E
E <--> H
E <--> I
E <--> J
K --> A
K --> E
M --> N[Monitoring Dashboard]
src/
βββ core/
β βββ interfaces/ # TypeScript interfaces
β β βββ IAdEnvironment.ts
β β βββ IAgent.ts
β β βββ IPlatformAPI.ts
β βββ agents/ # RL Agent implementations
β β βββ DQNAgent.ts # Deep Q-Learning
β β βββ PPOAgent.ts # Proximal Policy Optimization
β β βββ A2CAgent.ts # Advantage Actor-Critic
β βββ environment/ # Environment logic
β βββ AdEnvironment.ts
β βββ StateManager.ts
β βββ RewardCalculator.ts
βββ platforms/ # Platform integrations
β βββ tiktok/
β β βββ TikTokAPI.ts
β β βββ TikTokSimulator.ts
β βββ instagram/
β β βββ InstagramAPI.ts
β β βββ InstagramSimulator.ts
β βββ factory/
β βββ PlatformFactory.ts
βββ training/ # Training pipeline
β βββ TrainingPipeline.ts
β βββ observers/
β β βββ ConsoleLogger.ts
β β βββ MetricsCollector.ts
β β βββ TensorBoard.ts
β βββ replay/
β βββ ExperienceReplay.ts
βββ utils/ # Utilities
β βββ config/
β β βββ Configuration.ts
β βββ logging/
β β βββ Logger.ts
β βββ metrics/
β βββ MetricsCalculator.ts
βββ main.ts # Entry point
Pattern | Implementation | Purpose |
---|---|---|
Strategy | RLAgent base class | Swap RL algorithms (DQN → PPO) |
Factory | PlatformFactory | Create platform-specific APIs |
Observer | TrainingObserver | Monitor training progress |
Adapter | EnvironmentSimulator | Unified platform interface |
Command | TrainingPipeline | Encapsulate training operations |
Singleton | Configuration | Global settings management |
Template Method | BaseAgent.train() | Standardize training loop |
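As a small example of the Observer pattern in this context, the sketch below tracks the best episode reward. The method name and signature are assumptions; check src/observers/types.ts for the actual TrainingObserver contract before implementing.

// Assumed observer shape, for illustration only.
interface TrainingObserverLike {
  onEpisodeEnd(episode: number, totalReward: number): void;
}

class BestRewardTracker implements TrainingObserverLike {
  private best = -Infinity;

  onEpisodeEnd(episode: number, totalReward: number): void {
    if (totalReward > this.best) {
      this.best = totalReward;
      console.log(`New best reward ${totalReward.toFixed(2)} at episode ${episode}`);
    }
  }
}

// Usage (assuming the pipeline exposes addObserver): pipeline.addObserver(new BestRewardTracker());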
# Clone the repository
git clone https://github.com/yourusername/rl-tshirt-ads.git
cd rl-tshirt-ads
# Install dependencies
npm install
# Run with default configuration
npm start
# Watch training progress
tail -f logs/training.log
- Node.js 18.0+ (Download)
- TypeScript 4.9+ (npm install -g typescript)
- Git 2.0+
- Clone the repository:
  git clone https://github.com/yourusername/rl-tshirt-ads.git
  cd rl-tshirt-ads
- Install dependencies:
  npm install
- Configure environment:
  cp .env.example .env  # Edit .env with your settings
- Build the project:
  npm run build
- Run tests:
  npm test
- Start training:
  npm run train
# Dockerfile
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY . .
RUN npm run build
CMD ["npm", "start"]
# Build and run with Docker
docker build -t rl-tshirt-ads .
docker run -it --rm rl-tshirt-ads
import { DQNAgent, AdEnvironmentSimulator, TrainingPipeline } from "./src";
// Initialize components
const agent = new DQNAgent({
learningRate: 0.001,
discountFactor: 0.95,
epsilon: 1.0,
epsilonDecay: 0.995,
});
const environment = new AdEnvironmentSimulator({
platforms: ["tiktok", "instagram"],
initialBudget: 500,
episodeLength: 24, // hours
});
const pipeline = new TrainingPipeline(agent, environment);
// Train the agent
await pipeline.train({
episodes: 1000,
saveInterval: 100,
logInterval: 10,
});
// Custom reward shaping
const customRewardCalculator = new RewardCalculator({
profitWeight: 0.7,
roasWeight: 0.2,
conversionWeight: 0.1,
penalizeBudgetOverspend: true,
});
// Multi-objective optimization
const agent = new DQNAgent({
rewardCalculator: customRewardCalculator,
actionSpace: {
budgetRange: [0.5, 2.0],
platforms: ["tiktok", "instagram", "facebook"],
creativeTypes: ["lifestyle", "product", "ugc", "discount"],
ageGroups: ["18-24", "25-34", "35-44", "45+"],
},
});
// Add custom observers
pipeline.addObserver(new TensorBoardLogger());
pipeline.addObserver(
new SlackNotifier({
webhookUrl: process.env.SLACK_WEBHOOK,
notifyOn: ["episode_complete", "milestone_reached"],
})
);
// Load pre-trained model
const agent = new DQNAgent();
await agent.load("./models/production_model.json");
// Set to exploitation mode (no exploration)
agent.setEpsilon(0);
// Run in production with safety constraints
const productionEnv = new AdEnvironmentSimulator({
mode: "production",
constraints: {
maxDailyBudget: 10000,
minROAS: 1.5,
maxBudgetChangePercent: 30,
},
});
// Execute optimizations
const controller = new ProductionController(agent, productionEnv);
await controller.run({
interval: "1h", // Run every hour
dryRun: false, // Apply changes to real campaigns
monitoring: true, // Enable performance monitoring
});
# Training Configuration
EPISODES=1000
BATCH_SIZE=32
LEARNING_RATE=0.001
DISCOUNT_FACTOR=0.95
EPSILON_START=1.0
EPSILON_DECAY=0.995
EPSILON_MIN=0.01
# Platform Configuration
TIKTOK_API_KEY=mock_key_123
INSTAGRAM_API_KEY=mock_key_456
SHOPIFY_API_KEY=mock_key_789
# Monitoring
ENABLE_TENSORBOARD=true
TENSORBOARD_PORT=6006
LOG_LEVEL=info
METRICS_EXPORT_PATH=./metrics
# Safety Constraints
MAX_DAILY_BUDGET=10000
MIN_ROAS_THRESHOLD=1.0
MAX_BUDGET_CHANGE_PERCENT=50
# Real-world Constraints
# Pricing and costs
TSHIRT_PRICE=29.99 # or PRODUCT_PRICE
PRINTFUL_COGS=15.00 # or COGS_PER_UNIT
# Platform availability
ALLOWED_PLATFORMS=tiktok # comma-separated: e.g., "tiktok,instagram"
DISABLE_INSTAGRAM=true # optional convenience flag
# Creative constraints
LOCKED_CREATIVE_TYPE=ugc # lock to a single creative type
# Budgeting
DAILY_BUDGET_TARGET=30 # shapes hourly spend penalty in reward
These environment variables adjust the simulator to better reflect real operating constraints:
- TSHIRT_PRICE/PRODUCT_PRICE: Revenue per unit sold.
- PRINTFUL_COGS/COGS_PER_UNIT: Cost of goods per unit (used to compute net profit).
- ALLOWED_PLATFORMS or DISABLE_INSTAGRAM: Restrict simulator to platforms you can actually run.
- LOCKED_CREATIVE_TYPE: Force a single creative type when you only have one asset.
- DAILY_BUDGET_TARGET: Sets the hourly cap used for overspend penalties in reward shaping.
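A minimal sketch of how these variables might be read at startup (the simulator's actual parsing may differ; defaults here mirror the examples above):

const price = parseFloat(process.env.TSHIRT_PRICE ?? process.env.PRODUCT_PRICE ?? "29.99");
const cogs = parseFloat(process.env.PRINTFUL_COGS ?? process.env.COGS_PER_UNIT ?? "15.00");
const allowedPlatforms = (process.env.ALLOWED_PLATFORMS ?? "tiktok,instagram")
  .split(",").map((p) => p.trim()).filter(Boolean);
const lockedCreative = process.env.LOCKED_CREATIVE_TYPE; // undefined means all creatives allowed
const dailyBudgetTarget = parseFloat(process.env.DAILY_BUDGET_TARGET ?? "30");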
This repo now includes scaffolding to run a shadow-mode loop that composes real TikTok ad spend and Shopify revenue while never writing changes back to platforms.
- Real Shopify data source: src/datasources/shopify.ts (stubbed)
- Real TikTok API adapter: src/platforms/realTikTok.ts (stubbed)
- Real shadow environment: src/environment/realShadow.ts
- Runner: src/run/shadowTraining.ts
Usage:
- Set env vars for constraints and credentials (if wiring real APIs):
  PRINTFUL_COGS=15, TSHIRT_PRICE=29.99, DAILY_BUDGET_TARGET=30
  SHOPIFY_API_KEY=..., SHOPIFY_STORE_DOMAIN=...
  TIKTOK_API_KEY=...
- Run:
npm run build && node dist/run/shadowTraining.js --episodes=50
Notes:
- The stubs return zero metrics by default; replace TODOs with real HTTP calls.
- Reward shaping uses margin-based ROAS, (revenue - COGS) / adSpend; thresholds on margin ROAS drive bonuses, not gross ROAS.
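A tiny worked example of the margin-based metric (the threshold values here are hypothetical):

// Margin-based ROAS as described above.
function marginROAS(revenue: number, cogs: number, adSpend: number): number {
  return adSpend > 0 ? (revenue - cogs) / adSpend : 0;
}

// $120 revenue, $60 COGS, $40 ad spend: gross ROAS is 3.0, margin ROAS is 1.5.
const m = marginROAS(120, 60, 40);
const bonus = m >= 1.5 ? 1 : m >= 1.0 ? 0.5 : 0; // hypothetical threshold bonus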
{
"agent": {
"type": "DQN",
"network": {
"hidden_layers": [128, 64, 32],
"activation": "relu",
"optimizer": "adam"
},
"memory": {
"capacity": 10000,
"batch_size": 32
}
},
"environment": {
"state_space": {
"dimensions": 15,
"normalization": true
},
"action_space": {
"type": "discrete",
"size": 120
},
"reward": {
"type": "profit",
"normalization_factor": 1000
}
},
"training": {
"episodes": 1000,
"max_steps_per_episode": 24,
"save_interval": 100,
"evaluation_interval": 50
}
}
class DQNAgent extends RLAgent {
constructor(config?: AgentConfig);
// Core methods
selectAction(state: AdEnvironmentState): AdAction;
update(
state: AdEnvironmentState,
action: AdAction,
reward: number,
nextState: AdEnvironmentState
): void;
// Model persistence
save(filepath: string): Promise<void>;
load(filepath: string): Promise<void>;
// Configuration
setEpsilon(value: number): void;
setLearningRate(value: number): void;
}
class AdEnvironmentSimulator {
constructor(config?: EnvironmentConfig);
// Environment control
reset(): AdEnvironmentState;
step(action: AdAction): [AdEnvironmentState, number, boolean];
// State management
getCurrentState(): AdEnvironmentState;
setState(state: AdEnvironmentState): void;
// Platform management
addPlatform(name: string, api: AdPlatformAPI): void;
removePlatform(name: string): void;
}
class TrainingPipeline {
constructor(agent: RLAgent, environment: AdEnvironmentSimulator);
// Training control
train(config: TrainingConfig): Promise<TrainingResults>;
pause(): void;
resume(): void;
stop(): void;
// Observation
addObserver(observer: TrainingObserver): void;
removeObserver(observer: TrainingObserver): void;
// Metrics
getMetrics(): TrainingMetrics;
exportMetrics(filepath: string): Promise<void>;
}
interface AdEnvironmentState {
// Temporal features
dayOfWeek: number; // 0-6
hourOfDay: number; // 0-23
// Campaign parameters
currentBudget: number;
targetAgeGroup: string;
targetInterests: string[];
creativeType: string;
platform: string;
// Performance metrics
historicalCTR: number;
historicalCVR: number;
// Market conditions
competitorActivity: number; // 0-1
seasonality: number; // 0-1
}
interface AdAction {
budgetAdjustment: number; // Multiplier (0.5-2.0)
targetAgeGroup: string;
targetInterests: string[];
creativeType: string;
bidStrategy: "CPC" | "CPM" | "CPA";
platform: "tiktok" | "instagram" | "shopify";
}
The agent observes the current state of all ad campaigns:
- Temporal context: Day of week, hour of day
- Campaign settings: Budget, targeting, creative
- Performance history: CTR, CVR, ROAS
- Market dynamics: Competition, seasonality
Using an ε-greedy strategy:
- Exploration (ε): Try random actions to discover new strategies
- Exploitation (1-ε): Choose the best-known action for the current state
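A generic ε-greedy sketch (not the repo's exact implementation):

// Pick a random action with probability ε, otherwise the greedy (argmax-Q) action.
function epsilonGreedy(qValues: number[], epsilon: number): number {
  if (Math.random() < epsilon) {
    return Math.floor(Math.random() * qValues.length); // explore
  }
  return qValues.indexOf(Math.max(...qValues));        // exploit
}

// Decay ε toward a floor after each episode (matches EPSILON_DECAY / EPSILON_MIN above).
const decayEpsilon = (eps: number, decay = 0.995, min = 0.01) => Math.max(min, eps * decay);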
The environment simulates campaign performance:
Action → API Call → Performance Metrics → Reward Signal
Q-learning formula:
Q(s,a) ← Q(s,a) + α [r + γ max_a' Q(s',a') - Q(s,a)]
Where:
- Q(s,a): Expected value of action a in state s
- α: Learning rate
- r: Immediate reward
- γ: Discount factor
- s': Next state
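In code, the tabular form of this update looks like the following generic sketch (the repo's agent may structure it differently):

// One Q-learning update for a (state, action, reward, nextState) transition.
function qUpdate(
  q: Map<string, number[]>, s: string, a: number, r: number, sNext: string,
  alpha = 0.001, gamma = 0.95, numActions = 120
): void {
  const row = q.get(s) ?? new Array(numActions).fill(0);
  const nextRow = q.get(sNext) ?? new Array(numActions).fill(0);
  const tdTarget = r + gamma * Math.max(...nextRow); // r + γ max_a' Q(s',a')
  row[a] += alpha * (tdTarget - row[a]);             // move Q(s,a) toward the TD target
  q.set(s, row);
}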
Store experiences and learn from random batches:
- Breaks correlation between sequential experiences
- Improves sample efficiency
- Stabilizes learning
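A minimal ring-buffer replay memory with uniform random sampling, for illustration (the repo's buffer may differ):

interface Transition<S, A> {
  state: S; action: A; reward: number; nextState: S; done: boolean;
}

class ReplayBuffer<S, A> {
  private buffer: Transition<S, A>[] = [];
  constructor(private capacity = 10000) {}

  push(t: Transition<S, A>): void {
    if (this.buffer.length >= this.capacity) this.buffer.shift(); // drop the oldest
    this.buffer.push(t);
  }

  sample(batchSize = 32): Transition<S, A>[] {
    const out: Transition<S, A>[] = [];
    const n = Math.min(batchSize, this.buffer.length);
    for (let i = 0; i < n; i++) {
      out.push(this.buffer[Math.floor(Math.random() * this.buffer.length)]); // sample with replacement
    }
    return out;
  }
}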
Metric | Description | Target |
---|---|---|
Average Episode Reward | Mean reward over last 100 episodes | > 5.0 |
Convergence Rate | Episodes to stable performance | < 500 |
Exploration Efficiency | Unique state-actions discovered | > 80% |
Learning Stability | Reward variance over time | < 0.5 |

Metric | Formula | Target |
---|---|---|
Profit | Revenue - Ad Spend | Maximize |
ROAS | Revenue / Ad Spend | > 3.0 |
CPA | Ad Spend / Conversions | < $15 |
CTR | Clicks / Impressions | > 2% |
CVR | Conversions / Clicks | > 3% |
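These formulas translate directly into code; a straightforward sketch:

const profit = (revenue: number, adSpend: number) => revenue - adSpend;
const roas = (revenue: number, adSpend: number) => (adSpend > 0 ? revenue / adSpend : 0);
const cpa = (adSpend: number, conversions: number) => (conversions > 0 ? adSpend / conversions : Infinity);
const ctr = (clicks: number, impressions: number) => (impressions > 0 ? clicks / impressions : 0);
const cvr = (conversions: number, clicks: number) => (clicks > 0 ? conversions / clicks : 0);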
┌───────────────────────────────────────────┐
│  Training Progress                        │
├───────────────────────────────────────────┤
│  Episode: 847/1000                        │
│  ████████████████████░░░░  84.7%          │
│                                           │
│  Current Reward: 7.23                     │
│  Avg Reward (100 ep): 6.85                │
│  Best Reward: 9.42                        │
│                                           │
│  Epsilon: 0.03                            │
│  Learning Rate: 0.001                     │
│                                           │
│  Platform Distribution:                   │
│    TikTok:    62%                         │
│    Instagram: 38%                         │
│                                           │
│  Top Creative: UGC (34%)                  │
│  Top Age Group: 18-24 (41%)               │
└───────────────────────────────────────────┘
// train.ts
import { createDefaultPipeline } from "./src/factory";
async function main() {
// Create pipeline with defaults
const pipeline = createDefaultPipeline();
// Train for 100 episodes
const results = await pipeline.train({ episodes: 100 });
// Print results
console.log("Training Complete!");
console.log(`Final Avg Reward: ${results.avgReward}`);
console.log(`Best Episode: ${results.bestEpisode}`);
console.log(`Total Profit: $${results.totalProfit}`);
}
main().catch(console.error);
// custom-platform.ts
import { AdPlatformAPI, AdPlatformFactory } from "./src/platforms";
class CustomPlatformAPI extends AdPlatformAPI {
async updateCampaign(campaignId: string, params: any): Promise<any> {
// Your custom API logic
return { success: true };
}
simulatePerformance(
state: AdEnvironmentState,
action: AdAction
): RewardMetrics {
// Custom performance simulation
return {
revenue: Math.random() * 1000,
adSpend: action.budgetAdjustment * state.currentBudget,
profit: 0,
roas: 0,
conversions: 0,
};
}
}
// Register the new platform
AdPlatformFactory.registerPlatform("custom", new CustomPlatformAPI());
// realtime.ts
import { ProductionController } from "./src/production";
const controller = new ProductionController({
modelPath: "./models/trained_model.json",
platforms: ["tiktok", "instagram"],
updateInterval: "30m",
dryRun: false,
});
// Start real-time optimization
controller.start();
// Monitor performance
controller.on("optimization", (result) => {
console.log(`Optimization at ${new Date()}`);
console.log(`Platform: ${result.platform}`);
console.log(`Budget Change: ${result.budgetChange}%`);
console.log(`Expected Profit: $${result.expectedProfit}`);
});
npm test
# Unit tests
npm run test:unit
# Integration tests
npm run test:integration
# E2E tests
npm run test:e2e
# Performance tests
npm run test:performance
npm run test:coverage
// tests/agent.test.ts
import { DQNAgent } from "../src/agents/DQNAgent";
describe("DQNAgent", () => {
let agent: DQNAgent;
beforeEach(() => {
agent = new DQNAgent({ epsilon: 0.5 });
});
test("should select random action during exploration", () => {
const state = createMockState();
const actions = new Set();
// Collect 100 actions
for (let i = 0; i < 100; i++) {
const action = agent.selectAction(state);
actions.add(JSON.stringify(action));
}
// Should have multiple different actions
expect(actions.size).toBeGreaterThan(1);
});
test("should improve performance through learning", () => {
const env = createMockEnvironment();
const initialReward = evaluateAgent(agent, env);
// Train for 100 episodes
trainAgent(agent, env, 100);
const finalReward = evaluateAgent(agent, env);
expect(finalReward).toBeGreaterThan(initialReward);
});
});
npm run dev
npm run deploy:staging
# Install PM2
npm install -g pm2
# Start application
pm2 start ecosystem.config.js --env production
# Monitor
pm2 monit
# Logs
pm2 logs
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: rl-tshirt-ads
spec:
replicas: 3
selector:
matchLabels:
app: rl-tshirt-ads
template:
metadata:
labels:
app: rl-tshirt-ads
spec:
containers:
- name: app
image: your-registry/rl-tshirt-ads:latest
ports:
- containerPort: 3000
env:
- name: NODE_ENV
value: "production"
resources:
requests:
memory: "512Mi"
cpu: "500m"
limits:
memory: "1Gi"
cpu: "1000m"
Problem: Agent is not learning (rewards stay flat or diverge)
Solution:
- Decrease learning rate: agent.setLearningRate(0.0001)
- Increase exploration: agent.setEpsilon(1.0)
- Check reward normalization
Problem: Training runs out of memory
Solution:
- Reduce replay buffer size
- Decrease batch size
- Enable memory profiling: NODE_OPTIONS="--max-old-space-size=4096"
Problem: Poor performance on a specific platform
Solution:
- Apply platform-specific reward shaping
- Increase training episodes for that platform
- Check platform API simulator accuracy
# Enable verbose logging
DEBUG=* npm start
# Profile memory usage
npm run profile:memory
# Analyze performance
npm run profile:cpu
- ✅ Basic DQN implementation
- ✅ Mock platform APIs
- ✅ Training pipeline
- ✅ Metrics collection
- ⬜ Proximal Policy Optimization (PPO)
- ⬜ Multi-agent competition
- ⬜ Continuous action spaces
- ⬜ Hierarchical RL for campaign strategies
- ⬜ Real API integrations
- ⬜ A/B testing framework
- ⬜ AutoML for hyperparameter tuning
- ⬜ Real-time streaming data pipeline
- ⬜ Distributed training (Ray/RLlib)
- ⬜ Transfer learning between businesses
- ⬜ Natural language strategy descriptions
- ⬜ Automated creative generation
- ⬜ Google Ads integration
- ⬜ Amazon Advertising
- ⬜ LinkedIn Ads
- ⬜ Cross-platform budget optimization
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
- Follow TypeScript best practices
- Use ESLint and Prettier
- Write tests for new features
- Update documentation
type(scope): description
[optional body]
[optional footer]
Types: feat, fix, docs, style, refactor, test, chore
This project is licensed under the MIT License - see the LICENSE file for details.
MIT License
Copyright (c) 2024 Your Company
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
[Full license text...]
- OpenAI Gym - Inspiration for environment design
- Stable Baselines3 - Reference implementations
- TensorFlow.js - Neural network capabilities
- The RL Community - Continuous learning and support
- Documentation: https://docs.example.com
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: support@example.com
- Discord: Join our server
Maximizing profits through intelligent automation
- Modularized codebase with separate modules for agent, environment, platforms, observers, training, and a barrel export in src/index.ts.
- Added simulator improvements (realistic spend/revenue logic, reward shaping, peak-hour boosts).
- Added real runner skeleton: npm run run:real with flags for --mode, --daily-budget-target, --peak-hours, --delta-max, --lambda-spend, --lagrange-step, --canary-list.
- Cost-sensitive objective (λ-spend) to minimize spend while maximizing profit.
- Safety guardrails starter (src/execution/guardrails.ts) to enforce daily cap, delta clamp, peak hours, and freeze conditions.
- Documentation added: docs/api_spec.md, docs/real_integration.md, docs/low_spend_rollout.md, docs/poc_checklist.md