
🎯 RL-Powered T-Shirt Ad Campaign Optimizer

TypeScript License Node Status Platforms

An intelligent reinforcement learning system that automates and optimizes advertising campaigns for e-commerce t-shirt businesses across multiple platforms, maximizing profitability through continuous learning.

Note: The code has been modularized into separate files for clarity and maintainability. See the Project Structure section below for details and updated commands.


🧱 Project Structure (Modularized)

This repo uses a modular TypeScript structure:

src/
  agent/
    base.ts            # RLAgent abstract base
    dqnAgent.ts        # DQNAgent implementation
  environment/
    simulator.ts       # Ad environment simulator
  observers/
    types.ts           # TrainingObserver interface
    consoleLogger.ts   # Console logger observer
    metricsCollector.ts  # Metrics collector observer
  platforms/
    base.ts            # AdPlatformAPI abstract base
    factory.ts         # Platform factory
    mockTikTok.ts      # TikTok mock API
    mockInstagram.ts   # Instagram mock API
  types.ts             # Shared types (state/actions/metrics)
  index.ts             # Barrel exports
  main.ts              # CLI/entry point (training demo)

Build and run commands:

  • Start (dev): npm start (ts-node src/main.ts)
  • Build: npm run build (outputs to dist/)
  • Start (prod): npm run start:prod (builds then runs node dist/main.js)

🌟 Overview

This project implements a Deep Q-Learning (DQN) agent that learns to optimize advertising campaigns for t-shirt businesses across TikTok and Instagram. By continuously learning from campaign performance data, the system automatically adjusts budgets, targeting parameters, creative strategies, and platform allocation to maximize profit.

🎬 Demo

Episode 1 | Total Reward: -2.34 | Profit: -$234
Episode 50 | Total Reward: 8.92 | Profit: $892
         ↑ from -$234 to +$892 through learning!

📚 Documentation

  • API Integration Spec: docs/api_spec.md
  • Production Integration Guide: docs/real_integration.md
  • Low-Spend Rollout Guide: docs/low_spend_rollout.md
  • PoC Launch Checklist: docs/poc_checklist.md
  • Mathematical Primer: docs/math_primer.md
  • Torch.js DQN Refactor Tutorial: docs/torchjs_dqn_refactor.md

Q-Learning → DQN (Torch.js)

If you are migrating from the current tabular approach in src/agent/dqnAgent.ts to a true DQN, start here:

  • Replace Q-table with a Q-network (state → Q-values).
  • Encode states and index actions consistently.
  • Add replay buffer and target network.
  • Train with TD targets and MSE loss.
  • Schedule ε and LR; persist and evaluate.

See docs/torchjs_dqn_refactor.md for a concise, step-by-step guide.
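As a rough sketch of those steps, assuming @tensorflow/tfjs and illustrative layer sizes (the tutorial above is the authoritative walkthrough):

import * as tf from "@tensorflow/tfjs";

const STATE_DIM = 15;    // encoded feature vector length (illustrative)
const NUM_ACTIONS = 120; // size of the discrete action grid (illustrative)

// Q-network: maps an encoded state to one Q-value per discrete action.
function buildQNet(): tf.Sequential {
  const model = tf.sequential();
  model.add(tf.layers.dense({ inputShape: [STATE_DIM], units: 64, activation: "relu" }));
  model.add(tf.layers.dense({ units: 64, activation: "relu" }));
  model.add(tf.layers.dense({ units: NUM_ACTIONS })); // linear output head
  model.compile({ optimizer: tf.train.adam(0.0005), loss: "meanSquaredError" });
  return model;
}

const online = buildQNet();
const target = buildQNet();
target.setWeights(online.getWeights()); // start the target network in sync

type Transition = { s: number[]; a: number; r: number; s2: number[]; done: boolean };

// One step on a sampled minibatch: fit Q(s,a) toward r + γ·max_a' Q_target(s',a').
async function trainStep(batch: Transition[], gamma = 0.97): Promise<number> {
  const states = tf.tensor2d(batch.map((t) => t.s));
  const nextStates = tf.tensor2d(batch.map((t) => t.s2));
  const nextQ = target.predict(nextStates) as tf.Tensor;
  const maxNextT = nextQ.max(1);
  const maxNext = (await maxNextT.array()) as number[];
  const qPredT = online.predict(states) as tf.Tensor;
  const labels = (await qPredT.array()) as number[][];
  // Only the taken action's entry is replaced by the TD target,
  // so the MSE gradient flows through that action alone.
  batch.forEach((t, i) => {
    labels[i][t.a] = t.done ? t.r : t.r + gamma * maxNext[i];
  });
  const labelsT = tf.tensor2d(labels);
  const loss = (await online.trainOnBatch(states, labelsT)) as number;
  tf.dispose([states, nextStates, nextQ, maxNextT, qPredT, labelsT]);
  return loss;
}

// Every --targetSync steps: target.setWeights(online.getWeights());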

🤖 Agent Selection (Tabular vs NN)

  • Tabular baseline (Q-table):
    • npm start
  • DQN neural agent (Torch.js-style over TF.js):
    • npm start -- --agent=nn
    • Useful flags: --episodes, --batchSize, --gamma, --lr, --trainFreq, --targetSync, --replayCap, --epsilonStart, --epsilonMin, --epsilonDecay

Example:

npm start -- \
  --agent=nn \
  --episodes=200 \
  --batchSize=64 \
  --gamma=0.97 \
  --lr=0.0005 \
  --targetSync=500 \
  --replayCap=20000

Notes

  • Backend: uses @tensorflow/tfjs by default for portability; consider @tensorflow/tfjs-node for faster training.
  • Encoding/Actions: see src/agent/encoding.ts for deterministic feature mapping and the action grid.
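For intuition only, deterministic feature mapping and an action grid might look like the following sketch; field names and sizes here are illustrative, and src/agent/encoding.ts is the source of truth.

// Hypothetical sketch of deterministic state encoding and an action grid.
const CREATIVES = ["lifestyle", "product", "ugc", "discount"] as const;
const AGE_GROUPS = ["18-24", "25-34", "35-44", "45+"] as const;
const BUDGET_STEPS = [0.5, 1.0, 1.5, 2.0];

// Fixed-order numeric features: the same state always encodes identically.
function encodeState(s: {
  dayOfWeek: number; hourOfDay: number; currentBudget: number; historicalCTR: number;
}): number[] {
  return [
    s.dayOfWeek / 6,
    s.hourOfDay / 23,
    Math.min(s.currentBudget / 1000, 1), // clamp-normalize the budget
    s.historicalCTR,
  ];
}

// Every (creative, ageGroup, budgetStep) combination gets a stable integer
// index, so the agent's action outputs always line up with concrete actions.
function actionFromIndex(i: number) {
  const b = i % BUDGET_STEPS.length;
  const a = Math.floor(i / BUDGET_STEPS.length) % AGE_GROUPS.length;
  const c = Math.floor(i / (BUDGET_STEPS.length * AGE_GROUPS.length)) % CREATIVES.length;
  return {
    creativeType: CREATIVES[c],
    targetAgeGroup: AGE_GROUPS[a],
    budgetAdjustment: BUDGET_STEPS[b],
  };
}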

🧪 Real Runner Quick Start (Shadow/Pilot)

Run the real-runner skeleton in shadow mode with a strict $30/day cap and peak hours (adapters to be implemented before going live):

npm run run:real -- \
  --mode=shadow \
  --daily-budget-target=30 \
  --peak-hours=18-22 \
  --delta-max=0.10 \
  --lambda-spend=0.25 \
  --lagrange-step=0.05 \
  --canary-list="tiktok:ADSET_ID,instagram:ADSET_ID"

Then review logs and the PoC checklist before enabling --mode=pilot.

📺 Training Progress Panel (ASCII)

Example of a tidy console panel you can print during training:

┌─────────────────────────────────────────────┐
│        Training Progress (847/1000)         │
├─────────────────────────────────────────────┤
│ Progress: ████████████████████░░░░░ 84.7%   │
│ Current Reward:        7.23                 │
│ Avg Reward (last 100): 6.85                 │
│ Best Reward:           9.42                 │
│ Epsilon:               0.03                 │
│ Learning Rate:         0.001                │
│ Platform:              TikTok 62% | IG 38%  │
│ Top Creative:          UGC (34%)            │
│ Top Age Group:         18-24 (41%)          │
└─────────────────────────────────────────────┘
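A small helper along these lines can print such a panel; this is an illustrative sketch, not code from the repo:

// Renders a boxed panel from key/value rows, padded to a fixed inner width.
function renderPanel(title: string, rows: Array<[string, string]>, width = 45): string {
  const rule = (l: string, r: string) => l + "─".repeat(width) + r;
  const pad = (s: string) => "│" + s.padEnd(width).slice(0, width) + "│";
  const centered = " ".repeat(Math.max(0, Math.floor((width - title.length) / 2))) + title;
  const body = rows.map(([k, v]) => pad(` ${(k + ":").padEnd(23)} ${v}`));
  return [rule("┌", "┐"), pad(centered), rule("├", "┤"), ...body, rule("└", "┘")].join("\n");
}

console.log(renderPanel("Training Progress (847/1000)", [
  ["Current Reward", "7.23"],
  ["Avg Reward (last 100)", "6.85"],
  ["Epsilon", "0.03"],
]));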

🚀 Why Reinforcement Learning for Ad Optimization?

Traditional rule-based ad optimization fails to capture complex, non-linear relationships between:

  • Temporal patterns (day/hour performance variations)
  • Platform dynamics (TikTok vs Instagram audiences)
  • Creative fatigue (performance decay over time)
  • Competitive landscapes (bidding wars, market saturation)

RL agents discover optimal strategies through exploration and exploitation, continuously adapting to market changes.

✨ Key Features

🤖 Intelligent Automation

  • Self-learning optimization without manual rules
  • Multi-platform orchestration across TikTok, Instagram, and Shopify
  • 24/7 autonomous operation with safety guardrails

📊 Advanced Capabilities

  • Deep Q-Network (DQN) with experience replay
  • Real-time adaptation to market conditions
  • A/B testing integration for policy validation
  • Multi-objective optimization (profit, ROAS, CPA)

πŸ—οΈ Enterprise Architecture

  • SOLID principles and Gang of Four patterns
  • Modular design for easy platform additions
  • Observable training with metrics collection
  • Production-ready logging and monitoring

🔧 Developer Experience

  • TypeScript for type safety
  • Mock APIs for development/testing
  • Comprehensive testing suite
  • Detailed documentation and examples

πŸ›οΈ Architecture

System Overview

graph TB
    subgraph "RL Agent Core"
        A[DQN Agent] --> B[Q-Network]
        A --> C[Experience Replay]
        A --> D[Action Selection]
    end

    subgraph "Environment"
        E[Ad Environment Simulator]
        E --> F[State Manager]
        E --> G[Reward Calculator]
    end

    subgraph "Platform APIs"
        H[TikTok API]
        I[Instagram API]
        J[Shopify API]
    end

    subgraph "Training Pipeline"
        K[Training Controller]
        K --> L[Episode Manager]
        K --> M[Metrics Collector]
    end

    A <--> E
    E <--> H
    E <--> I
    E <--> J
    K --> A
    K --> E
    M --> N[Monitoring Dashboard]

Component Architecture

src/
├── core/
│   ├── interfaces/           # TypeScript interfaces
│   │   ├── IAdEnvironment.ts
│   │   ├── IAgent.ts
│   │   └── IPlatformAPI.ts
│   ├── agents/               # RL Agent implementations
│   │   ├── DQNAgent.ts       # Deep Q-Learning
│   │   ├── PPOAgent.ts       # Proximal Policy Optimization
│   │   └── A2CAgent.ts       # Advantage Actor-Critic
│   └── environment/          # Environment logic
│       ├── AdEnvironment.ts
│       ├── StateManager.ts
│       └── RewardCalculator.ts
├── platforms/                # Platform integrations
│   ├── tiktok/
│   │   ├── TikTokAPI.ts
│   │   └── TikTokSimulator.ts
│   ├── instagram/
│   │   ├── InstagramAPI.ts
│   │   └── InstagramSimulator.ts
│   └── factory/
│       └── PlatformFactory.ts
├── training/                 # Training pipeline
│   ├── TrainingPipeline.ts
│   ├── observers/
│   │   ├── ConsoleLogger.ts
│   │   ├── MetricsCollector.ts
│   │   └── TensorBoard.ts
│   └── replay/
│       └── ExperienceReplay.ts
├── utils/                    # Utilities
│   ├── config/
│   │   └── Configuration.ts
│   ├── logging/
│   │   └── Logger.ts
│   └── metrics/
│       └── MetricsCalculator.ts
└── main.ts                   # Entry point

Design Patterns

Pattern          Implementation         Purpose
Strategy         RLAgent base class     Swap RL algorithms (DQN → PPO)
Factory          PlatformFactory        Create platform-specific APIs
Observer         TrainingObserver       Monitor training progress
Adapter          EnvironmentSimulator   Unified platform interface
Command          TrainingPipeline       Encapsulate training operations
Singleton        Configuration          Global settings management
Template Method  BaseAgent.train()      Standardize training loop
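As one concrete illustration, the Strategy row means the training loop depends only on the agent abstraction, so implementations swap freely; the shapes below are simplified, not the repo's exact signatures.

// Simplified Strategy sketch: the loop never names a concrete algorithm.
interface Agent {
  selectAction(state: number[]): number;
  update(state: number[], action: number, reward: number, next: number[]): void;
}

class TableAgent implements Agent {
  private q = new Map<string, number[]>();
  selectAction(state: number[]): number {
    const row = this.q.get(state.join(",")) ?? [];
    return row.length ? row.indexOf(Math.max(...row)) : 0; // argmax, default 0
  }
  update(): void {
    /* tabular TD update elided */
  }
}

// Swapping in a neural DQN or PPO agent changes nothing in this loop:
function runEpisode(agent: Agent, states: number[][]): void {
  for (const s of states) agent.selectAction(s);
}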

🚀 Quick Start

# Clone the repository
git clone https://github.com/yourusername/rl-tshirt-ads.git
cd rl-tshirt-ads

# Install dependencies
npm install

# Run with default configuration
npm start

# Watch training progress
tail -f logs/training.log

📦 Installation

Prerequisites

  • Node.js 18.0+
  • TypeScript 4.9+ (npm install -g typescript)
  • Git 2.0+

Step-by-Step Installation

  1. Clone the repository

    git clone https://github.com/yourusername/rl-tshirt-ads.git
    cd rl-tshirt-ads
  2. Install dependencies

    npm install
  3. Configure environment

    cp .env.example .env
    # Edit .env with your settings
  4. Build the project

    npm run build
  5. Run tests

    npm test
  6. Start training

    npm run train

Docker Installation

# Dockerfile
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
# Install all dependencies: the TypeScript build needs devDependencies
RUN npm ci
COPY . .
RUN npm run build
# Drop dev dependencies from the final image
RUN npm prune --omit=dev
CMD ["node", "dist/main.js"]
# Build and run with Docker
docker build -t rl-tshirt-ads .
docker run -it --rm rl-tshirt-ads

💻 Usage

Basic Training

import { DQNAgent, AdEnvironmentSimulator, TrainingPipeline } from "./src";

// Initialize components
const agent = new DQNAgent({
  learningRate: 0.001,
  discountFactor: 0.95,
  epsilon: 1.0,
  epsilonDecay: 0.995,
});

const environment = new AdEnvironmentSimulator({
  platforms: ["tiktok", "instagram"],
  initialBudget: 500,
  episodeLength: 24, // hours
});

const pipeline = new TrainingPipeline(agent, environment);

// Train the agent
await pipeline.train({
  episodes: 1000,
  saveInterval: 100,
  logInterval: 10,
});

Advanced Configuration

// Custom reward shaping
const customRewardCalculator = new RewardCalculator({
  profitWeight: 0.7,
  roasWeight: 0.2,
  conversionWeight: 0.1,
  penalizeBudgetOverspend: true,
});

// Multi-objective optimization
const agent = new DQNAgent({
  rewardCalculator: customRewardCalculator,
  actionSpace: {
    budgetRange: [0.5, 2.0],
    platforms: ["tiktok", "instagram", "facebook"],
    creativeTypes: ["lifestyle", "product", "ugc", "discount"],
    ageGroups: ["18-24", "25-34", "35-44", "45+"],
  },
});

// Add custom observers
pipeline.addObserver(new TensorBoardLogger());
pipeline.addObserver(
  new SlackNotifier({
    webhookUrl: process.env.SLACK_WEBHOOK,
    notifyOn: ["episode_complete", "milestone_reached"],
  })
);

Production Deployment

// Load pre-trained model
const agent = new DQNAgent();
await agent.load("./models/production_model.json");

// Set to exploitation mode (no exploration)
agent.setEpsilon(0);

// Run in production with safety constraints
const productionEnv = new AdEnvironmentSimulator({
  mode: "production",
  constraints: {
    maxDailyBudget: 10000,
    minROAS: 1.5,
    maxBudgetChangePercent: 30,
  },
});

// Execute optimizations
const controller = new ProductionController(agent, productionEnv);
await controller.run({
  interval: "1h", // Run every hour
  dryRun: false, // Apply changes to real campaigns
  monitoring: true, // Enable performance monitoring
});

βš™οΈ Configuration

Environment Variables (.env)

# Training Configuration
EPISODES=1000
BATCH_SIZE=32
LEARNING_RATE=0.001
DISCOUNT_FACTOR=0.95
EPSILON_START=1.0
EPSILON_DECAY=0.995
EPSILON_MIN=0.01

# Platform Configuration
TIKTOK_API_KEY=mock_key_123
INSTAGRAM_API_KEY=mock_key_456
SHOPIFY_API_KEY=mock_key_789

# Monitoring
ENABLE_TENSORBOARD=true
TENSORBOARD_PORT=6006
LOG_LEVEL=info
METRICS_EXPORT_PATH=./metrics

# Safety Constraints
MAX_DAILY_BUDGET=10000
MIN_ROAS_THRESHOLD=1.0
MAX_BUDGET_CHANGE_PERCENT=50

# Real-world Constraints
# Pricing and costs
TSHIRT_PRICE=29.99           # or PRODUCT_PRICE
PRINTFUL_COGS=15.00          # or COGS_PER_UNIT

# Platform availability
ALLOWED_PLATFORMS=tiktok     # comma-separated: e.g., "tiktok,instagram"
DISABLE_INSTAGRAM=true       # optional convenience flag

# Creative constraints
LOCKED_CREATIVE_TYPE=ugc     # lock to a single creative type

# Budgeting
DAILY_BUDGET_TARGET=30       # shapes hourly spend penalty in reward

These environment variables adjust the simulator to better reflect real operating constraints:

  • TSHIRT_PRICE/PRODUCT_PRICE: Revenue per unit sold.
  • PRINTFUL_COGS/COGS_PER_UNIT: Cost of goods per unit (used to compute net profit).
  • ALLOWED_PLATFORMS or DISABLE_INSTAGRAM: Restrict simulator to platforms you can actually run.
  • LOCKED_CREATIVE_TYPE: Force a single creative type when you only have one asset.
  • DAILY_BUDGET_TARGET: Sets the hourly cap used for overspend penalties in reward shaping.
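A hypothetical loader showing one way to pull these variables into a typed config (the repo's actual loader may differ):

// Illustrative only: reads the constraint variables above with sane fallbacks.
interface RealWorldConfig {
  price: number;
  cogs: number;
  allowedPlatforms: string[];
  lockedCreativeType?: string;
  dailyBudgetTarget: number;
}

function numEnv(name: string, fallback: number): number {
  const v = Number(process.env[name]);
  return Number.isFinite(v) ? v : fallback;
}

export function loadRealWorldConfig(): RealWorldConfig {
  let platforms = (process.env.ALLOWED_PLATFORMS ?? "tiktok,instagram")
    .split(",").map((p) => p.trim()).filter(Boolean);
  if (process.env.DISABLE_INSTAGRAM === "true") {
    platforms = platforms.filter((p) => p !== "instagram"); // convenience flag
  }
  return {
    price: numEnv("TSHIRT_PRICE", numEnv("PRODUCT_PRICE", 29.99)),
    cogs: numEnv("PRINTFUL_COGS", numEnv("COGS_PER_UNIT", 15.0)),
    allowedPlatforms: platforms,
    lockedCreativeType: process.env.LOCKED_CREATIVE_TYPE,
    dailyBudgetTarget: numEnv("DAILY_BUDGET_TARGET", 30),
  };
}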

πŸ•ΆοΈ Shadow-Mode Training (Real Data Scaffolding)

This repo now includes scaffolding for a shadow-mode loop that combines real TikTok ad spend with Shopify revenue and never writes changes back to the platforms.

  • Real Shopify data source: src/datasources/shopify.ts (stubbed)
  • Real TikTok API adapter: src/platforms/realTikTok.ts (stubbed)
  • Real shadow environment: src/environment/realShadow.ts
  • Runner: src/run/shadowTraining.ts

Usage:

  • Set env vars for constraints and credentials (if wiring real APIs):
    • PRINTFUL_COGS=15, TSHIRT_PRICE=29.99, DAILY_BUDGET_TARGET=30
    • SHOPIFY_API_KEY=..., SHOPIFY_STORE_DOMAIN=...
    • TIKTOK_API_KEY=...
  • Run: npm run build && node dist/run/shadowTraining.js --episodes=50

Notes:

  • The stubs return zero metrics by default; replace TODOs with real HTTP calls.
  • Reward shaping uses margin-based ROAS: bonus thresholds key off (revenue - COGS) / adSpend, not gross revenue / adSpend (see the sketch below).
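In code, that shaping idea reduces to roughly the following; thresholds and scaling are illustrative, not the repo's tuned values:

// Margin-based shaping sketch: bonuses key off (revenue - COGS) / adSpend.
function shapedReward(revenue: number, unitsSold: number, adSpend: number, cogsPerUnit: number): number {
  const netMargin = revenue - unitsSold * cogsPerUnit;
  const profit = netMargin - adSpend;
  if (adSpend <= 0) return 0; // no spend this step, nothing to attribute
  const marginRoas = netMargin / adSpend;
  let reward = profit / 100; // squash profit into a small reward range
  if (marginRoas > 2.0) reward += 0.5;      // healthy-margin bonus
  else if (marginRoas < 1.0) reward -= 0.5; // cash-burn penalty
  return reward;
}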

Configuration File (config.json)

{
  "agent": {
    "type": "DQN",
    "network": {
      "hidden_layers": [128, 64, 32],
      "activation": "relu",
      "optimizer": "adam"
    },
    "memory": {
      "capacity": 10000,
      "batch_size": 32
    }
  },
  "environment": {
    "state_space": {
      "dimensions": 15,
      "normalization": true
    },
    "action_space": {
      "type": "discrete",
      "size": 120
    },
    "reward": {
      "type": "profit",
      "normalization_factor": 1000
    }
  },
  "training": {
    "episodes": 1000,
    "max_steps_per_episode": 24,
    "save_interval": 100,
    "evaluation_interval": 50
  }
}

📚 API Reference

Core Classes

DQNAgent

class DQNAgent extends RLAgent {
  constructor(config?: AgentConfig);

  // Core methods
  selectAction(state: AdEnvironmentState): AdAction;
  update(
    state: AdEnvironmentState,
    action: AdAction,
    reward: number,
    nextState: AdEnvironmentState
  ): void;

  // Model persistence
  save(filepath: string): Promise<void>;
  load(filepath: string): Promise<void>;

  // Configuration
  setEpsilon(value: number): void;
  setLearningRate(value: number): void;
}

AdEnvironmentSimulator

class AdEnvironmentSimulator {
  constructor(config?: EnvironmentConfig);

  // Environment control
  reset(): AdEnvironmentState;
  step(action: AdAction): [AdEnvironmentState, number, boolean];

  // State management
  getCurrentState(): AdEnvironmentState;
  setState(state: AdEnvironmentState): void;

  // Platform management
  addPlatform(name: string, api: AdPlatformAPI): void;
  removePlatform(name: string): void;
}

TrainingPipeline

class TrainingPipeline {
  constructor(agent: RLAgent, environment: AdEnvironmentSimulator);

  // Training control
  train(config: TrainingConfig): Promise<TrainingResults>;
  pause(): void;
  resume(): void;
  stop(): void;

  // Observation
  addObserver(observer: TrainingObserver): void;
  removeObserver(observer: TrainingObserver): void;

  // Metrics
  getMetrics(): TrainingMetrics;
  exportMetrics(filepath: string): Promise<void>;
}

Interfaces

AdEnvironmentState

interface AdEnvironmentState {
  // Temporal features
  dayOfWeek: number; // 0-6
  hourOfDay: number; // 0-23

  // Campaign parameters
  currentBudget: number;
  targetAgeGroup: string;
  targetInterests: string[];
  creativeType: string;
  platform: string;

  // Performance metrics
  historicalCTR: number;
  historicalCVR: number;

  // Market conditions
  competitorActivity: number; // 0-1
  seasonality: number; // 0-1
}

AdAction

interface AdAction {
  budgetAdjustment: number; // Multiplier (0.5-2.0)
  targetAgeGroup: string;
  targetInterests: string[];
  creativeType: string;
  bidStrategy: "CPC" | "CPM" | "CPA";
  platform: "tiktok" | "instagram" | "shopify";
}

🧠 How It Works

1. State Observation

The agent observes the current state of all ad campaigns:

  • Temporal context: Day of week, hour of day
  • Campaign settings: Budget, targeting, creative
  • Performance history: CTR, CVR, ROAS
  • Market dynamics: Competition, seasonality

2. Action Selection

Using an ε-greedy strategy:

  • Exploration (ε): Try random actions to discover new strategies
  • Exploitation (1-ε): Choose the best-known action for the current state
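In code, the selection rule is a single branch (sketch):

// ε-greedy: explore with probability ε, otherwise exploit the best-known action.
function epsilonGreedy(qValues: number[], epsilon: number): number {
  if (Math.random() < epsilon) {
    return Math.floor(Math.random() * qValues.length); // explore: uniform random
  }
  return qValues.indexOf(Math.max(...qValues)); // exploit: argmax Q
}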

3. Environment Interaction

The environment simulates campaign performance:

Action β†’ API Call β†’ Performance Metrics β†’ Reward Signal

4. Learning Update

Q-learning formula:

Q(s,a) ← Q(s,a) + α[r + γ max Q(s',a') - Q(s,a)]

Where:

  • Q(s,a): Expected value of action a in state s
  • α: Learning rate
  • r: Immediate reward
  • γ: Discount factor
  • s': Next state
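The same update written out for the tabular case (a sketch; the α and γ defaults are illustrative):

// Nudge Q(s,a) toward the TD target r + γ·max_a' Q(s',a').
function qUpdate(
  q: Map<string, number[]>, // state key -> Q-value per action
  s: string, a: number, r: number, s2: string,
  alpha = 0.1, gamma = 0.95,
): void {
  const row = q.get(s) ?? [];
  const next = q.get(s2) ?? [];
  const tdTarget = r + gamma * (next.length ? Math.max(...next) : 0); // 0 for unseen next state
  row[a] = (row[a] ?? 0) + alpha * (tdTarget - (row[a] ?? 0));
  q.set(s, row);
}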

5. Experience Replay

Store experiences and learn from random batches:

  • Breaks correlation between sequential experiences
  • Improves sample efficiency
  • Stabilizes learning
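A minimal buffer with those properties (sketch; the capacity is illustrative):

// Fixed-capacity ring buffer with uniform random sampling.
class ReplayBuffer<T> {
  private buf: T[] = [];
  private next = 0;
  constructor(private capacity = 10000) {}

  push(item: T): void {
    if (this.buf.length < this.capacity) this.buf.push(item);
    else this.buf[this.next] = item; // overwrite the oldest slot once full
    this.next = (this.next + 1) % this.capacity;
  }

  sample(batchSize: number): T[] {
    // Uniform sampling breaks correlation between consecutive experiences.
    return Array.from(
      { length: Math.min(batchSize, this.buf.length) },
      () => this.buf[Math.floor(Math.random() * this.buf.length)],
    );
  }
}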

📊 Performance Metrics

Training Metrics

Metric                   Description                          Target
Average Episode Reward   Mean reward over last 100 episodes   > 5.0
Convergence Rate         Episodes to stable performance       < 500
Exploration Efficiency   Unique state-actions discovered      > 80%
Learning Stability       Reward variance over time            < 0.5

Business Metrics

Metric   Formula                  Target
Profit   Revenue - Ad Spend       Maximize
ROAS     Revenue / Ad Spend       > 3.0
CPA      Ad Spend / Conversions   < $15
CTR      Clicks / Impressions     > 2%
CVR      Conversions / Clicks     > 3%

Monitoring Dashboard

┌─────────────────────────────────────────┐
│          Training Progress              │
├─────────────────────────────────────────┤
│ Episode: 847/1000                       │
│ ████████████████████░░░░ 84.7%          │
│                                         │
│ Current Reward: 7.23                    │
│ Avg Reward (100 ep): 6.85               │
│ Best Reward: 9.42                       │
│                                         │
│ Epsilon: 0.03                           │
│ Learning Rate: 0.001                    │
│                                         │
│ Platform Distribution:                  │
│   TikTok: 62%                           │
│   Instagram: 38%                        │
│                                         │
│ Top Creative: UGC (34%)                 │
│ Top Age Group: 18-24 (41%)              │
└─────────────────────────────────────────┘

πŸ“ Examples

Example 1: Basic Training Script

// train.ts
import { createDefaultPipeline } from "./src/factory";

async function main() {
  // Create pipeline with defaults
  const pipeline = createDefaultPipeline();

  // Train for 100 episodes
  const results = await pipeline.train({ episodes: 100 });

  // Print results
  console.log("Training Complete!");
  console.log(`Final Avg Reward: ${results.avgReward}`);
  console.log(`Best Episode: ${results.bestEpisode}`);
  console.log(`Total Profit: $${results.totalProfit}`);
}

main().catch(console.error);

Example 2: Custom Platform Integration

// custom-platform.ts
import { AdPlatformAPI, AdPlatformFactory } from "./src/platforms";
import { AdEnvironmentState, AdAction, RewardMetrics } from "./src/types";

class CustomPlatformAPI extends AdPlatformAPI {
  async updateCampaign(campaignId: string, params: any): Promise<any> {
    // Your custom API logic
    return { success: true };
  }

  simulatePerformance(
    state: AdEnvironmentState,
    action: AdAction
  ): RewardMetrics {
    // Custom performance simulation
    return {
      revenue: Math.random() * 1000,
      adSpend: action.budgetAdjustment * state.currentBudget,
      profit: 0,
      roas: 0,
      conversions: 0,
    };
  }
}

// Register the new platform
AdPlatformFactory.registerPlatform("custom", new CustomPlatformAPI());

Example 3: Real-time Optimization

// realtime.ts
import { ProductionController } from "./src/production";

const controller = new ProductionController({
  modelPath: "./models/trained_model.json",
  platforms: ["tiktok", "instagram"],
  updateInterval: "30m",
  dryRun: false,
});

// Start real-time optimization
controller.start();

// Monitor performance
controller.on("optimization", (result) => {
  console.log(`Optimization at ${new Date()}`);
  console.log(`Platform: ${result.platform}`);
  console.log(`Budget Change: ${result.budgetChange}%`);
  console.log(`Expected Profit: $${result.expectedProfit}`);
});

🧪 Testing

Run All Tests

npm test

Run Specific Test Suites

# Unit tests
npm run test:unit

# Integration tests
npm run test:integration

# E2E tests
npm run test:e2e

# Performance tests
npm run test:performance

Test Coverage

npm run test:coverage

Example Test

// tests/agent.test.ts
import { DQNAgent } from "../src/agents/DQNAgent";
// Helpers used below (createMockState, createMockEnvironment, trainAgent,
// evaluateAgent) are assumed to come from a local test-utils module.
import { createMockState, createMockEnvironment, trainAgent, evaluateAgent } from "./test-utils";

describe("DQNAgent", () => {
  let agent: DQNAgent;

  beforeEach(() => {
    agent = new DQNAgent({ epsilon: 0.5 });
  });

  test("should select random action during exploration", () => {
    const state = createMockState();
    const actions = new Set();

    // Collect 100 actions
    for (let i = 0; i < 100; i++) {
      const action = agent.selectAction(state);
      actions.add(JSON.stringify(action));
    }

    // Should have multiple different actions
    expect(actions.size).toBeGreaterThan(1);
  });

  test("should improve performance through learning", () => {
    const env = createMockEnvironment();
    const initialReward = evaluateAgent(agent, env);

    // Train for 100 episodes
    trainAgent(agent, env, 100);

    const finalReward = evaluateAgent(agent, env);
    expect(finalReward).toBeGreaterThan(initialReward);
  });
});

🚒 Deployment

Development

npm run dev

Staging

npm run deploy:staging

Production

Using PM2

# Install PM2
npm install -g pm2

# Start application
pm2 start ecosystem.config.js --env production

# Monitor
pm2 monit

# Logs
pm2 logs

Using Kubernetes

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rl-tshirt-ads
spec:
  replicas: 3
  selector:
    matchLabels:
      app: rl-tshirt-ads
  template:
    metadata:
      labels:
        app: rl-tshirt-ads
    spec:
      containers:
        - name: app
          image: your-registry/rl-tshirt-ads:latest
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: "production"
          resources:
            requests:
              memory: "512Mi"
              cpu: "500m"
            limits:
              memory: "1Gi"
              cpu: "1000m"

🔧 Troubleshooting

Common Issues

Issue: Training not converging

Solution:

  • Decrease learning rate: agent.setLearningRate(0.0001)
  • Increase exploration: agent.setEpsilon(1.0)
  • Check reward normalization

Issue: Out of memory during training

Solution:

  • Reduce replay buffer size
  • Decrease batch size
  • Raise the Node.js heap limit: NODE_OPTIONS="--max-old-space-size=4096"

Issue: Poor performance on specific platform

Solution:

  • Platform-specific reward shaping
  • Increase training episodes for that platform
  • Check platform API simulator accuracy

Debug Mode

# Enable verbose logging
DEBUG=* npm start

# Profile memory usage
npm run profile:memory

# Analyze performance
npm run profile:cpu

πŸ—ΊοΈ Roadmap

Phase 1: Foundation (Current)

  • ✅ Basic DQN implementation
  • ✅ Mock platform APIs
  • ✅ Training pipeline
  • ✅ Metrics collection

Phase 2: Advanced RL (Q1 2025)

  • ⬜ Proximal Policy Optimization (PPO)
  • ⬜ Multi-agent competition
  • ⬜ Continuous action spaces
  • ⬜ Hierarchical RL for campaign strategies

Phase 3: Production Features (Q2 2025)

  • ⬜ Real API integrations
  • ⬜ A/B testing framework
  • ⬜ AutoML for hyperparameter tuning
  • ⬜ Real-time streaming data pipeline

Phase 4: Scale & Intelligence (Q3 2025)

  • ⬜ Distributed training (Ray/RLlib)
  • ⬜ Transfer learning between businesses
  • ⬜ Natural language strategy descriptions
  • ⬜ Automated creative generation

Phase 5: Platform Expansion (Q4 2025)

  • ⬜ Google Ads integration
  • ⬜ Amazon Advertising
  • ⬜ LinkedIn Ads
  • ⬜ Cross-platform budget optimization

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Code Style

  • Follow TypeScript best practices
  • Use ESLint and Prettier
  • Write tests for new features
  • Update documentation

Commit Convention

type(scope): description

[optional body]

[optional footer]

Types: feat, fix, docs, style, refactor, test, chore

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

MIT License

Copyright (c) 2024 Your Company

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

[Full license text...]

πŸ™ Acknowledgments

  • OpenAI Gym - Inspiration for environment design
  • Stable Baselines3 - Reference implementations
  • TensorFlow.js - Neural network capabilities
  • The RL Community - Continuous learning and support

📞 Support



Built with ❤️ by the AI Optimization Team
Maximizing profits through intelligent automation

Project Updates (Modularization + Real Runner)

  • Modularized codebase with separate modules for agent, environment, platforms, observers, training, and a barrel export in src/index.ts.
  • Added simulator improvements (realistic spend/revenue logic, reward shaping, peak-hour boosts).
  • Added real runner skeleton: npm run run:real with flags for --mode, --daily-budget-target, --peak-hours, --delta-max, --lambda-spend, --lagrange-step, --canary-list.
  • Cost-sensitive objective (λ-spend) to minimize spend while maximizing profit.
  • Safety guardrails starter (src/execution/guardrails.ts) to enforce the daily cap, delta clamp, peak hours, and freeze conditions.
  • Documentation added:
    • docs/api_spec.md
    • docs/real_integration.md
    • docs/low_spend_rollout.md
    • docs/poc_checklist.md
