Hi, is there a one-liner for getting the optimal Q* and V* of an environment after specifying the reward and transition functions (env.R and env.P)?
Answered by omardrwch, Feb 17, 2022
Hello,

Yes, you can either use the function `rlberry.agents.dynprog.utils.value_iteration` or the agent `rlberry.agents.dynprog.ValueIterationAgent`, as below.

One-liner:

```python
from rlberry.agents.dynprog import utils

env = ...
qfunc, vfunc, n_iterations = utils.value_iteration(env.R, env.P, gamma=0.99, epsilon=1e-6)
```

Full example:

```python
import matplotlib.pyplot as plt
from rlberry.envs.finite import GridWorld

env = GridWorld.from_layout(
    """
IOO # OOO
OOO # OOO
OOO O OOO
OOO # OOO
OOO # OOR
"""
)

#
# Alternative 1: using dynprog.utils
#
from rlberry.agents.dynprog import utils

qfunc, vfunc, n_iterations = utils.value_iteration(
    env.R, env.P, gamma=0.99, epsilon=1e-6
)

# Visualize the value function
img = env.get_layout_img(vfunc)
plt.figure("dynprog.utils")
plt.imshow(img)

#
# Alternative 2: using ValueIterationAgent
#
from rlberry.agents.dynprog import ValueIterationAgent

agent = ValueIterationAgent(env, gamma=0.99, epsilon=1e-6)
agent.fit()
qfunc_agent, vfunc_agent = agent.Q, agent.V

# Visualize the value function
img = env.get_layout_img(vfunc_agent)
plt.figure("ValueIterationAgent")
plt.imshow(img)

# Show images
plt.show()
```
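For intuition about what such a value-iteration call computes, here is a minimal NumPy sketch, assuming `R` has shape `(S, A)` (expected rewards) and `P` has shape `(S, A, S)` (transition probabilities), as is standard for tabular MDPs. This is an illustrative re-implementation, not rlberry's actual code:

```python
import numpy as np

def value_iteration(R, P, gamma=0.99, epsilon=1e-6):
    """Approximate Q* and V* for a finite MDP by repeated Bellman backups.

    R: array of shape (S, A), expected reward for each state-action pair.
    P: array of shape (S, A, S), transition probabilities P[s, a, s'].
    """
    S, A = R.shape
    V = np.zeros(S)
    n_iterations = 0
    while True:
        n_iterations += 1
        # Bellman optimality backup:
        # Q(s, a) = R(s, a) + gamma * sum_{s'} P(s, a, s') * V(s')
        Q = R + gamma * (P @ V)   # (S, A, S) @ (S,) -> (S, A)
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < epsilon:
            V = V_new
            break
        V = V_new
    return Q, V, n_iterations

# Tiny 2-state, 2-action MDP: state 1 is absorbing with reward 1,
# and action 1 moves state 0 into state 1.
R = np.array([[0.0, 0.0],
              [1.0, 1.0]])
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [0.0, 1.0]]])

Q, V, n_it = value_iteration(R, P)
policy = np.argmax(Q, axis=1)  # greedy policy extracted from Q*
```

With gamma = 0.99, the absorbing state's value converges near 1 / (1 - gamma) = 100, and the greedy policy in state 0 picks action 1. The same `np.argmax(qfunc, axis=1)` trick extracts the optimal policy from the `qfunc` returned by rlberry's routine.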
Answer selected by ngmq