This repository provides implementations of classic inventory management environments, adapted from the original OR-Gym library, updated for compatibility with the Gymnasium API (the successor to OpenAI Gym). It also includes comprehensive benchmarking scripts to compare various heuristic, optimization-inspired, and Reinforcement Learning (RL) policies on these environments.
The original environments are based on the work by Hubbs et al. (2020):
- Paper: https://arxiv.org/abs/2008.04001
- Original OR-Gym GitHub (using OpenAI Gym): https://github.com/hubbs5/or-gym
Environments Adapted:
- Newsvendor (`newsvendor.py`): Multi-period newsvendor problem with lead times and stochastic Poisson demand (based on Balaji et al. 2019, https://arxiv.org/abs/1911.10641).
- Inventory Management (`inventory_management.py`): Multi-period, multi-echelon inventory system for a single product. Includes `InvManagementBacklogEnv` and `InvManagementLostSalesEnv`.
- Network Inventory Management (`network_management.py`): Multi-period, multi-node inventory system with a network structure (factories, distributors, retailers, markets). Includes `NetInvMgmtBacklogEnv` and `NetInvMgmtLostSalesEnv`.
- Gymnasium Compatible: Environments adhere to the modern Gymnasium API standard (`reset` returns `obs, info`; `step` returns `obs, reward, terminated, truncated, info`).
- Three Core Environments: Covers single-item, multi-echelon, and network inventory problems.
- Backlog & Lost Sales Variants: Specific environment classes (`*BacklogEnv`, `*LostSalesEnv`) implement these dynamics.
- Comprehensive Benchmarking: Includes dedicated scripts (`benchmark_*.py`) for each environment variant, comparing various agents:
  - Baselines: Random agent.
  - Heuristics: Relevant heuristics adapted for each environment type (e.g., Order-Up-To, Classic Newsvendor, and (s,S) for Newsvendor; Base Stock and Constant Order for the multi-echelon/network environments).
  - Stable Baselines3 Agents: PPO, SAC, TD3, A2C, DDPG (plus example variations like LSTM). Scripts suffixed with `_sb3_rllib.py` explicitly include these; simpler benchmark scripts may contain a subset.
  - Ray RLlib Agents: PPO, SAC (plus a framework to add more). Scripts suffixed with `_sb3_rllib.py` explicitly include these.
- Detailed Reporting: Benchmarks generate:
  - Summary tables comparing agents on average reward, consistency (std dev, min/max), operational metrics (service level, stockouts, inventory), and time (training/evaluation), saved to CSV.
  - Raw results per evaluation episode (CSV).
  - Detailed step-by-step data (optional, JSON Lines).
  - Comparison plots (reward-distribution boxplots, reward vs. operational-metric scatter plots, timing bar charts, RL learning curves), saved to PNG.
- Clone the repository:

  ```bash
  git clone https://github.com/r2barati/or-gym-inventory.git
  cd or-gym-inventory
  ```

- Create a virtual environment (Recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```

- Install Dependencies:
  - Ensure you have Python 3.8+ installed.
  - Upgrade pip:

    ```bash
    python -m pip install --upgrade pip
    ```

  - Install core requirements and RL libraries:

    ```bash
    # Install base requirements (Gymnasium, Numpy, Scipy, Pandas, NetworkX, Matplotlib, Seaborn)
    pip install -r requirements.txt

    # --- Choose ONE framework for SB3 (torch recommended) ---
    pip install stable-baselines3[extra] torch torchvision torchaudio
    # OR
    # pip install stable-baselines3[extra] tensorflow

    # --- Install Ray RLlib (if running *_sb3_rllib.py scripts) ---
    # Choose ONE framework (can match SB3 or be different; installing both is possible)
    pip install "ray[rllib]" torch torchvision torchaudio
    # OR
    # pip install "ray[rllib]" tensorflow
    ```

    (Note: `requirements.txt` should contain `gymnasium`, `numpy`, `scipy`, `pandas`, `networkx`, `matplotlib`, `seaborn`.)
You can import and use the specific environment classes like any standard Gymnasium environment:

```python
import gymnasium as gym

# Make sure the relevant python file (e.g., inventory_management.py) is in your path
from inventory_management import InvManagementLostSalesEnv  # Example

# Configuration dictionary (optional, overrides defaults)
env_config = {
    'periods': 50,
    'I0': [50, 50],
    'L': [2, 4],
    'c': [30, 30],
    # ... other parameters specific to the environment ...
}

# Create the environment
# env = InvManagementLostSalesEnv(env_config=env_config)
env = InvManagementLostSalesEnv()  # Use defaults

# Standard Gymnasium loop
observation, info = env.reset(seed=42)
terminated, truncated = False, False
total_reward = 0.0
while not (terminated or truncated):
    action = env.action_space.sample()  # Replace with your agent's action
    observation, reward, terminated, truncated, info = env.step(action)
    total_reward += reward

env.close()
```
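For RL training outside the benchmark scripts, a minimal Stable Baselines3 sketch is shown below. It assumes `stable-baselines3` (a Gymnasium-compatible 2.x release) is installed and that `inventory_management.py` is importable; it is an illustration, not the repository's benchmark code.

```python
from stable_baselines3 import PPO
from inventory_management import InvManagementLostSalesEnv

env = InvManagementLostSalesEnv()
model = PPO("MlpPolicy", env, verbose=0)
model.learn(total_timesteps=10_000)  # small budget, just to verify everything runs

# Use the trained policy for one step
obs, info = env.reset(seed=0)
action, _ = model.predict(obs, deterministic=True)
obs, reward, terminated, truncated, info = env.step(action)
env.close()
```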
The repository includes dedicated benchmark scripts for each environment variant. The scripts ending in `_sb3_rllib.py` contain the most comprehensive set of agents, including both Stable Baselines3 and Ray RLlib implementations. Simpler scripts may focus only on SB3 or heuristics.
Choose the script corresponding to the environment you want to benchmark:
- Newsvendor (Comprehensive: SB3 + RLlib): `python benchmark_newsvendor_sb3_rllib.py`
- Inventory Management (Backlog, Comprehensive: SB3 + RLlib): `python benchmark_InvManagementBacklogEnv_sb3_rllib.py`
- Inventory Management (Lost Sales, Comprehensive: SB3 + RLlib): `python benchmark_InvManagementLostSalesEnv_sb3_rllib.py`
- Network Inventory Management (Backlog, Comprehensive: SB3 + RLlib): `python benchmark_NetInvMgmtBacklogEnv_sb3_rllib.py`
- Network Inventory Management (Lost Sales, Comprehensive: SB3 + RLlib): `python benchmark_NetInvMgmtLostSalesEnv_sb3_rllib.py`
Note: Running the benchmarks, especially those involving RL agent training (`RL_TRAINING_TIMESTEPS > 0`), can take a significant amount of time, potentially hours, depending on the number of timesteps, agents, and your hardware. Start with a lower `RL_TRAINING_TIMESTEPS` (e.g., 10,000-50,000) in the script config to test functionality.
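The two variable names below appear in the note above and in the output description further down; the exact layout of the configuration block differs per script, so treat this only as a sketch of the kind of edit to make for a quick smoke test:

```python
# Near the top of a benchmark_*.py script (illustrative values):
RL_TRAINING_TIMESTEPS = 20_000   # lower this (e.g., 10k-50k) for a quick functional test
COLLECT_STEP_DETAILS = False     # set True to also write *_step_details.jsonl (can be large)
```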
`NewsvendorEnv` (`newsvendor.py`):
- Single product, single location.
- Agent decides order quantity each period.
- Stochastic demand (Poisson).
- Fixed lead time for orders.
- Costs: Purchase, Holding, Stockout (Lost Sales Penalty).
- Observation includes costs, demand mean, and pipeline inventory.
- Action is a single continuous value (order quantity).
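A quick way to see the spaces described above is to instantiate the environment and print them; a minimal sketch, assuming `newsvendor.py` is on your path:

```python
from newsvendor import NewsvendorEnv

env = NewsvendorEnv()
print(env.observation_space)  # costs, demand mean, pipeline inventory
print(env.action_space)       # a single continuous order quantity

obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
env.close()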
`InvManagement...Env` (`inventory_management.py`):
- Single product, multi-echelon linear supply chain (e.g., Retailer -> Distributor -> Manufacturer).
- Agent decides order quantity for each stage (except the raw material source) each period.
- Stochastic demand (configurable distribution) only at the retailer (stage 0).
- Lead times between stages.
- Production capacities at manufacturing stages.
- Costs: Purchase/Replenishment, Holding (on-hand), Backlog/Lost Sales.
- Observation includes on-hand inventory and recent order history (pipeline).
- Action is a vector of continuous/integer values (order quantities per stage).
- Variants: `InvManagementBacklogEnv` and `InvManagementLostSalesEnv`.
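To see the per-stage action vector in practice, the sketch below samples one action and steps the backlog variant (an illustration only, assuming `inventory_management.py` is importable):

```python
from inventory_management import InvManagementBacklogEnv

env = InvManagementBacklogEnv()
obs, info = env.reset(seed=0)

action = env.action_space.sample()      # one order quantity per controlled stage
print(action.shape)                     # vector length = number of stages you control
obs, reward, terminated, truncated, info = env.step(action)
env.close()
```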
`NetInvMgmt...Env` (`network_management.py`):
- Single product, arbitrary network structure (defined via a `networkx` graph).
- Nodes can be Raw Material, Factory, Distributor, Retailer, or Market.
- Agent decides order quantity for each valid link between supplying/receiving nodes.
- Stochastic demand (configurable) occurs at links between Retailers and Markets.
- Lead times associated with links.
- Production capacities and yields at factory nodes.
- Costs: Purchase/Replenishment (link), Operating (factory), Holding (on-hand at node, pipeline on link), Backlog/Lost Sales (market link).
- Observation includes market backlog/demand, node inventories, and pipeline history per link.
- Action is a vector of continuous values (order quantity per link).
- Variants: `NetInvMgmtBacklogEnv` and `NetInvMgmtLostSalesEnv`.
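The network is specified with `networkx`. The sketch below only illustrates the kind of directed graph involved (node roles and per-link lead times); the attribute names are placeholders, and the environment's actual constructor arguments should be taken from the docstrings in `network_management.py`:

```python
import networkx as nx

# Illustrative supply chain: raw material -> factory -> distributor -> retailer -> market.
G = nx.DiGraph()
G.add_edge("raw_material", "factory", lead_time=2)
G.add_edge("factory", "distributor", lead_time=3)
G.add_edge("distributor", "retailer", lead_time=1)
G.add_edge("retailer", "market")  # stochastic demand occurs on retailer -> market links
```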
(Refer to the docstrings within each environment file (`.py`) for details on observation/action spaces, reward calculation, and specific parameters.)
The `benchmark_*.py` scripts provide a framework to compare agents. The `_sb3_rllib.py` versions are the most comprehensive.
Agents Included:
- Heuristics:
  - `RandomAgent`: Random actions.
  - `ConstantOrderAgent`: Orders a fixed fraction (Network env).
  - `OrderUpToHeuristicAgent`: Targets expected demand over L+1 periods (Newsvendor).
  - `ClassicNewsvendorAgent`: Uses the critical ratio and a demand quantile (Newsvendor); see the sketch after this list.
  - `sSPolicyAgent`: Orders up to S when inventory is below s (Newsvendor).
  - `BaseStockAgent`: Simple independent base stock per stage (InvManagement).
- Stable Baselines3 (SB3):
  - PPO, SAC, TD3, A2C, DDPG
  - Example variations (LSTM policy, different buffer/learning-rate/network sizes).
- Ray RLlib:
  - PPO, SAC (examples)
  - The framework allows easy addition of others (TD3, DDPG, APEX, IMPALA, etc.).
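For reference, the classic newsvendor rule underlying `ClassicNewsvendorAgent` orders up to the critical-ratio quantile of lead-time demand. The function below is a minimal sketch with Poisson demand; the cost parameter names are illustrative, not the script's internals:

```python
from scipy.stats import poisson

def newsvendor_quantity(cost_underage, cost_overage, demand_mean, lead_time):
    """Order up to the critical-ratio quantile of demand over lead_time + 1 periods."""
    critical_ratio = cost_underage / (cost_underage + cost_overage)
    return poisson.ppf(critical_ratio, mu=demand_mean * (lead_time + 1))

print(newsvendor_quantity(cost_underage=5.0, cost_overage=1.0, demand_mean=20, lead_time=2))
```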
Metrics Collected (per Agent):
- Average Total Reward (and Median, Std Dev, Min, Max)
- Average Service Level (Fill Rate, usually at retailer/market)
- Average/Total Stockout Quantity
- Average Ending Inventory
- Average Evaluation Time per Episode
- Total Training Time (for RL agents)
- Evaluation Success Rate
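Here, service level means the fill rate. A minimal illustration of how such a metric is typically computed from episode totals (variable names are illustrative, not the scripts' internals):

```python
def fill_rate(units_sold: float, units_demanded: float) -> float:
    """Fraction of demanded units actually delivered to customers in an episode."""
    return units_sold / units_demanded if units_demanded > 0 else 1.0
```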
The benchmark scripts save results into subdirectories named like `benchmark_<ENV_NAME>_combined/results/`:
- `*_summary.csv`: A table summarizing the average performance and time metrics for each agent, sorted by average reward. This is the main comparison table.
- `*_raw_summary.csv`: Contains the results (total reward, metrics) for each individual evaluation episode for every agent. Useful for statistical analysis or plotting distributions.
- `*_step_details.jsonl`: (Optional, if `COLLECT_STEP_DETAILS=True`) Contains detailed data for every step within every evaluation episode (reward, action, demand, sales, etc.). Can be very large but useful for deep dives.
- `*_rewards_boxplot.png`: Visualizes the distribution of total rewards achieved by each agent across the evaluation episodes. Helps assess consistency.
- `*_reward_vs_service.png` / `*_reward_vs_inventory.png`: Scatter plots showing trade-offs between average reward and key operational metrics.
- `*_eval_time_log.png` / `*_train_time.png`: Bar charts comparing evaluation and training times.
- `*_learning_curves.png`: Shows training progress (reward vs. timesteps) for the RL agents, plotted from SB3 Monitor files and/or custom RLlib logs.
- Python 3.8+
- gymnasium
- numpy
- scipy
- pandas
- networkx (for NetInvMgmt environment)
- matplotlib
- seaborn
- stable-baselines3[extra] (`pip install stable-baselines3[extra] torch` or `pip install stable-baselines3[extra] tensorflow`)
- ray[rllib] (`pip install "ray[rllib]" torch` or `pip install "ray[rllib]" tensorflow`)
See `requirements.txt` for base dependencies.
- Hubbs, C., Perez, H. D., Sarwar, O., Li, C., & Papageorgiou, D. (2020). OR-Gym: A Reinforcement Learning Library for Operations Research Problems. arXiv preprint arXiv:2008.04001. https://arxiv.org/abs/2008.04001
MIT License