Performance experiments
We used pymdptoolbox as a benchmark for the performance of MDPSolver. Our tests covered three optimization algorithms: Value Iteration (VI), Policy Iteration (PI), and Modified Policy Iteration (MPI), all with standard value-updates. We did not test the other value-update methods because, in pymdptoolbox, successive over-relaxation is not available and Gauss-Seidel is only available for VI. Moreover, in pymdptoolbox, the Gauss-Seidel VI employs the span norm stopping criterion, whereas MDPSolver reserves the span norm for standard value-updates. Thus, we only compared algorithms employing identical update methods and stopping criteria.
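For reference, the span (semi)norm stopping criterion mentioned above can be sketched as follows. This is our own illustration of the standard discounted-reward stopping rule, not MDPSolver's or pymdptoolbox's internal code; `converged` is a hypothetical helper name:

```python
import numpy as np

def span(v):
    """Span seminorm of a value vector: max(v) - min(v)."""
    return np.max(v) - np.min(v)

def converged(v_new, v_old, eps, lam):
    """Standard span-based stopping test for discounted value iteration:
    stop when span(v_new - v_old) < eps * (1 - lam) / lam, which bounds
    the loss of the greedy policy by eps (textbook result, e.g. Puterman)."""
    return span(v_new - v_old) < eps * (1.0 - lam) / lam
```

Because the span ignores constant shifts of the value vector, this criterion typically triggers earlier than a sup-norm test with the same tolerance.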
We tested the three algorithms on randomly generated MDP problems with exponentially distributed transition probabilities and normally distributed rewards. The column indices (next states) were sampled from a uniform distribution. Each randomly sampled MDP problem employed a constant action space which was the only option for pymdptoolbox. The size of the (constant) action space was uniformly sampled between 2 and 10. The tests included three levels of state space sizes (2500, 5000, and 10000) and three levels of transition matrix sparsities (80%, 90%, and 99%). We sampled 20 MDP problems for each combination of a state space size and sparsity.
Test Configuration Overview
- State space sizes: 2500, 5000, 10000.
- Sparsities: 80%, 90%, 99%.
- Action space sizes: 2-10 (uniformly sampled).
- Sampled MDP problems for each state space size and sparsity: 20.
- Optimality criterion: Discounted reward ($\lambda = 0.99$).
- Tolerance ($\epsilon$): $1 \cdot 10^{-3}$ ($1 \cdot 10^{-4}$ for PI).
- Partial evaluation limit: 100 (only used by MPI).
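The random problem generation described above can be sketched as follows. This is our reconstruction of the sampling scheme (exponentially distributed transition probabilities, normally distributed rewards, uniformly sampled next-state columns), not the authors' exact benchmark script, and it builds dense arrays for clarity where the benchmark presumably used sparse storage; `random_mdp` is a hypothetical helper name:

```python
import numpy as np

def random_mdp(n_states, n_actions, sparsity, rng):
    """Sample one random MDP: for each (action, state) row, draw uniformly
    chosen next-state columns, fill them with normalized exponential draws,
    and draw a normally distributed reward per (state, action) pair."""
    n_nonzero = max(1, round(n_states * (1.0 - sparsity)))  # nonzeros per row
    P = np.zeros((n_actions, n_states, n_states))  # transition probabilities
    R = rng.normal(size=(n_states, n_actions))     # rewards
    for a in range(n_actions):
        for s in range(n_states):
            cols = rng.choice(n_states, size=n_nonzero, replace=False)
            probs = rng.exponential(size=n_nonzero)
            P[a, s, cols] = probs / probs.sum()    # each row sums to one
    return P, R

rng = np.random.default_rng(0)
n_actions = int(rng.integers(2, 11))  # constant action space, size in 2-10
P, R = random_mdp(100, n_actions, 0.90, rng)
```

Each row of `P` is a valid probability distribution by construction, so the sampled problems can be passed directly to either solver.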
Brief Overview:
Our Policy Iteration algorithm was up to 335x faster than the same algorithm in pymdptoolbox. For the Modified Policy Iteration algorithm, the speedup was up to 291x, and for the Value Iteration algorithm, the speedup was up to 4x.
Details:
We measured the performance of MDPSolver by dividing the runtime of pymdptoolbox by the runtime of MDPSolver for the same MDP and algorithm. That is, we defined the performance (speedup) of MDPSolver as

$$\text{speedup} = \frac{t_{\text{pymdptoolbox}}}{t_{\text{MDPSolver}}}.$$

In general:
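A minimal timing harness matching this ratio definition might look like the sketch below. The two solver arguments stand in for calls to pymdptoolbox and MDPSolver on the same problem; the stand-in workloads at the bottom are purely illustrative:

```python
import time

def speedup(solver_fast, solver_baseline, *args):
    """Return the ratio of baseline runtime to fast-solver runtime
    on the same inputs (values > 1 mean solver_fast is faster)."""
    t0 = time.perf_counter()
    solver_baseline(*args)
    t_baseline = time.perf_counter() - t0

    t0 = time.perf_counter()
    solver_fast(*args)
    t_fast = time.perf_counter() - t0

    return t_baseline / t_fast

# Stand-in workloads; a real run would call the two solvers instead.
slow = lambda n: sum(i * i for i in range(n))
fast = lambda n: n  # trivially cheap placeholder
ratio = speedup(fast, slow, 500_000)
```

In the experiments each such ratio was computed per sampled MDP, and the figures below summarize the resulting distributions as boxplots.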
Policy Iteration: MDPSolver outperformed pymdptoolbox by an average factor between 21.6 (State space size: 5000, Sparsity: 80%) and 335.4 (State space size: 10000, Sparsity: 99%). Figure 1 illustrates boxplots of the performance tests for the PI algorithm.
Modified Policy Iteration: MDPSolver outperformed pymdptoolbox by an average factor between 9.7 (State space size: 2500, Sparsity: 80%) and 291.4 (State space size: 10000, Sparsity: 99%). Figure 2 illustrates boxplots of the performance tests for the MPI algorithm.
Value Iteration: MDPSolver outperformed pymdptoolbox by an average factor between 1.4 (State space size: 2500, Sparsity: 80%) and 3.8 (State space size: 2500, Sparsity: 99%). Figure 3 illustrates boxplots of the performance tests for the VI algorithm.
Figure 1: Relative runtime of pymdptoolbox compared to MDPSolver for the Policy Iteration algorithm.
Figure 2: Relative runtime of pymdptoolbox compared to MDPSolver for the Modified Policy Iteration algorithm.
Figure 3: Relative runtime of pymdptoolbox compared to MDPSolver for the Value Iteration algorithm.
- Operating System: Windows 11.
- Processor: AMD Ryzen 9 5900X 12-Core Processor 3.70 GHz.
- Memory (RAM): 32 GB.
- Python version: 3.12.3.
- MDPSolver version: 0.9.7.
- pymdptoolbox version: 4.0b3.