Performance experiments
We used pymdptoolbox as a benchmark for the performance of MDPSolver. Our tests covered three optimization algorithms: Value Iteration (VI), Policy Iteration (PI), and Modified Policy Iteration (MPI), all with standard value-updates. We did not test the other value-update methods because, in pymdptoolbox, successive over-relaxation is not available and Gauss-Seidel is only available for VI. Moreover, in pymdptoolbox, the Gauss-Seidel VI employs the span norm stopping criterion, whereas MDPSolver reserves the span norm for standard value-updates. Thus, we only compared algorithms employing identical update methods and stopping criteria.
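For reference, the span (semi)norm stopping criterion mentioned above can be sketched as follows. This is our own illustration of the standard discounted-reward stopping rule, not MDPSolver's or pymdptoolbox's internal code; `converged` is a hypothetical helper name:

```python
import numpy as np

def span(v):
    """Span seminorm of a value vector: max(v) - min(v)."""
    return np.max(v) - np.min(v)

def converged(v_new, v_old, eps, lam):
    """Standard span-based stopping test for discounted value iteration:
    stop when span(v_new - v_old) < eps * (1 - lam) / lam, which bounds
    the loss of the greedy policy by eps (textbook result, e.g. Puterman)."""
    return span(v_new - v_old) < eps * (1.0 - lam) / lam
```

Because the span ignores constant shifts of the value vector, this criterion typically triggers earlier than a sup-norm test with the same tolerance.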
We tested the three algorithms on randomly generated MDP problems with exponentially distributed transition probabilities and normally distributed rewards. The column indices (next states) were sampled from a uniform distribution. Each randomly sampled MDP problem employed a constant action space which was the only option for pymdptoolbox. The size of the (constant) action space was uniformly sampled between 2 and 10. The tests included three levels of state space sizes (2500, 5000, and 10000) and three levels of transition matrix sparsities (80%, 90%, and 99%). We sampled 20 MDP problems for each combination of a state space size and sparsity.
Test Configuration Overview
- State space sizes: 2500, 5000, 10000.
- Sparsities: 80%, 90%, 99%.
- Action space sizes: 2-10 (uniformly sampled).
- Sampled MDP problems for each state space size and sparsity: 20.
- Optimality criterion: Discounted reward ($\lambda = 0.99$).
- Tolerance ($\epsilon$): $1 \cdot 10^{-3}$ ($1 \cdot 10^{-4}$ for PI).
- Partial evaluation limit: 100 (only used by MPI).
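The random problem generation described above can be sketched as follows. This is our reconstruction of the sampling scheme (exponentially distributed transition probabilities, normally distributed rewards, uniformly sampled next-state columns), not the authors' exact benchmark script, and it builds dense arrays for clarity where the benchmark presumably used sparse storage; `random_mdp` is a hypothetical helper name:

```python
import numpy as np

def random_mdp(n_states, n_actions, sparsity, rng):
    """Sample one random MDP: for each (action, state) row, draw uniformly
    chosen next-state columns, fill them with normalized exponential draws,
    and draw a normally distributed reward per (state, action) pair."""
    n_nonzero = max(1, round(n_states * (1.0 - sparsity)))  # nonzeros per row
    P = np.zeros((n_actions, n_states, n_states))  # transition probabilities
    R = rng.normal(size=(n_states, n_actions))     # rewards
    for a in range(n_actions):
        for s in range(n_states):
            cols = rng.choice(n_states, size=n_nonzero, replace=False)
            probs = rng.exponential(size=n_nonzero)
            P[a, s, cols] = probs / probs.sum()    # each row sums to one
    return P, R

rng = np.random.default_rng(0)
n_actions = int(rng.integers(2, 11))  # constant action space, size in 2-10
P, R = random_mdp(100, n_actions, 0.90, rng)
```

Each row of `P` is a valid probability distribution by construction, so the sampled problems can be passed directly to either solver.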
Brief Overview:
Our Policy Iteration algorithm was up to 335x faster than the same algorithm in pymdptoolbox. For the Modified Policy Iteration algorithm, the speedup was up to 291x, and for the Value Iteration algorithm, the speedup was up to 4x.
Details:
We measured the performance of MDPSolver by dividing the runtime of pymdptoolbox by the runtime of MDPSolver for the same MDP and algorithm. That is, we defined the performance (speedup) of MDPSolver as

$$\text{speedup} = \frac{t_{\text{pymdptoolbox}}}{t_{\text{MDPSolver}}}.$$

In general:
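A minimal timing harness matching this ratio definition might look like the sketch below. The two solver arguments stand in for calls to pymdptoolbox and MDPSolver on the same problem; the stand-in workloads at the bottom are purely illustrative:

```python
import time

def speedup(solver_fast, solver_baseline, *args):
    """Return the ratio of baseline runtime to fast-solver runtime
    on the same inputs (values > 1 mean solver_fast is faster)."""
    t0 = time.perf_counter()
    solver_baseline(*args)
    t_baseline = time.perf_counter() - t0

    t0 = time.perf_counter()
    solver_fast(*args)
    t_fast = time.perf_counter() - t0

    return t_baseline / t_fast

# Stand-in workloads; a real run would call the two solvers instead.
slow = lambda n: sum(i * i for i in range(n))
fast = lambda n: n  # trivially cheap placeholder
ratio = speedup(fast, slow, 500_000)
```

In the experiments each such ratio was computed per sampled MDP, and the figures below summarize the resulting distributions as boxplots.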
Policy Iteration: MDPSolver outperformed pymdptoolbox by an average factor between 21.6 (State space size: 5000, Sparsity: 80%) and 335.4 (State space size: 10000, Sparsity: 99%). Figure 1 illustrates boxplots of the performance tests for the PI algorithm.
Modified Policy Iteration: MDPSolver outperformed pymdptoolbox by an average factor between 9.7 (State space size: 2500, Sparsity: 80%) and 291.4 (State space size: 10000, Sparsity: 99%). Figure 2 illustrates boxplots of the performance tests for the MPI algorithm.
Value Iteration: MDPSolver outperformed pymdptoolbox by an average factor between 1.4 (State space size: 2500, Sparsity: 80%) and 3.8 (State space size: 2500, Sparsity: 99%). Figure 3 illustrates boxplots of the performance tests for the VI algorithm.
Figure 1: Relative runtime of pymdptoolbox compared to MDPSolver for the Policy Iteration algorithm.
Figure 2: Relative runtime of pymdptoolbox compared to MDPSolver for the Modified Policy Iteration algorithm.
Figure 3: Relative runtime of pymdptoolbox compared to MDPSolver for the Value Iteration algorithm.
- Operating System: Windows 11.
- Processor: AMD Ryzen 9 5900X 12-Core Processor 3.70 GHz.
- Memory (RAM): 32 GB.
- Python version: 3.12.3.
- MDPSolver version: 0.9.7.
- pymdptoolbox version: 4.0b3.