A Markov Decision Process (MDP) approach to the "Cabbage Merchant Problem," where we manage daily orders and uncertain shipments of cabbages with a 15% cancellation probability.
.
├── results
│ ├── policy.npy
│ └── value_function.npy
├── src
│ ├── simulation.py
│ └── value_iteration.py
├── readme.md
└── requirements.txt
-
results/ - Contains the converged optimal policy and value function files
policy.npy
- Stores the optimal policyvalue_function.npy
- Stores the value function
-
src/ - Contains the source code
value_iteration.py
- Implements the MDP model and runs value iterationsimulation.py
- Simulates business operations using the optimal policy
-
Clone the Repository
git clone https://github.com/ChengHua926/Sparc-2025-Write-Up.git cabbage-merchant-mdp cd cabbage-merchant-mdp
-
Create and Activate a Virtual Environment (Optional)
python -m venv venv # On Windows: venv\Scripts\activate # On Unix or MacOS: source venv/bin/activate
-
Install Dependencies
pip install -r requirements.txt
Navigate to the src/
directory and run:
python value_iteration.py
This runs a value iteration algorithm that may take up to an hour (depending on chosen parameters) to produce two files in results/
:
policy.npy
value_function.npy
After generating the policy, you can simulate day-by-day performance:
python simulation.py
The simulation will:
- Load
policy.npy
from theresults/
folder - Step through the MDP for a specified number of days
- Print total and average daily profit
The state is represented as (O₁,O₂,O₃, A₁,A₂, c₁,c₂) where:
- O_i: Number of orders to fulfill in i days
- A_j: Number of cabbages arriving in j days
- c_j: Cancellation flags (0 or 1) for the corresponding shipments
Each day, you can:
- Accept (1) or reject (0) the incoming daily order
- Buy 0-5 cabbages for delivery in 2 days
- If c₁=0, the day's shipment arrives; otherwise no arrivals
- Deliver cabbages to meet O₁
- Shift O₁←O₂, O₂←O₃, etc.
- Draw a new cancellation flag for day-after-tomorrow's shipment
Reward = 4 × (delivered) - (missed) - (purchased)
Value iteration computes the long-horizon strategy for maximizing expected profit.
For issues or questions, please:
- Open a GitHub issue
- Email: 25chenghua@gmail.com