Hi, is there a one-liner for getting the optimal Q* and V* of an environment after specifying the reward and transition functions (env.R and env.P)?
Answered by omardrwch, Feb 17, 2022
Hello,

Yes, you can either use the function `rlberry.agents.dynprog.utils.value_iteration` or the agent `rlberry.agents.dynprog.ValueIterationAgent`, as below.

One-liner:

```python
from rlberry.agents.dynprog import utils

env = ...
qfunc, vfunc, n_iterations = utils.value_iteration(env.R, env.P, gamma=0.99, epsilon=1e-6)
```

Full example:

```python
import matplotlib.pyplot as plt
from rlberry.envs.finite import GridWorld

env = GridWorld.from_layout(
    """
IOO # OOO
OOO # OOO
OOO O OOO
OOO # OOO
OOO # OOR
"""
)

#
# Alternative 1: using dynprog.utils
#
from rlberry.agents.dynprog import utils

qfunc, vfunc, n_iterations = utils.value_iteration(
    env.R, env.P, gamma=0.99, epsilon=1e-6
)

# Visualize the value function
img = env.get_layout_img(vfunc)
plt.figure("dynprog.utils")
plt.imshow(img)

#
# Alternative 2: using ValueIterationAgent
#
from rlberry.agents.dynprog import ValueIterationAgent

agent = ValueIterationAgent(env, gamma=0.99, epsilon=1e-6)
agent.fit()
qfunc_agent, vfunc_agent = agent.Q, agent.V

# Visualize the value function
img = env.get_layout_img(vfunc_agent)
plt.figure("ValueIterationAgent")
plt.imshow(img)

# Show images
plt.show()
```
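For intuition about what such a value-iteration call computes, here is a minimal NumPy sketch, assuming `R` has shape `(S, A)` (expected rewards) and `P` has shape `(S, A, S)` (transition probabilities), as is standard for tabular MDPs. This is an illustrative re-implementation, not rlberry's actual code:

```python
import numpy as np

def value_iteration(R, P, gamma=0.99, epsilon=1e-6):
    """Approximate Q* and V* for a finite MDP by repeated Bellman backups.

    R: array of shape (S, A), expected reward for each state-action pair.
    P: array of shape (S, A, S), transition probabilities P[s, a, s'].
    """
    S, A = R.shape
    V = np.zeros(S)
    n_iterations = 0
    while True:
        n_iterations += 1
        # Bellman optimality backup:
        # Q(s, a) = R(s, a) + gamma * sum_{s'} P(s, a, s') * V(s')
        Q = R + gamma * (P @ V)   # (S, A, S) @ (S,) -> (S, A)
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < epsilon:
            V = V_new
            break
        V = V_new
    return Q, V, n_iterations

# Tiny 2-state, 2-action MDP: state 1 is absorbing with reward 1,
# and action 1 moves state 0 into state 1.
R = np.array([[0.0, 0.0],
              [1.0, 1.0]])
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [0.0, 1.0]]])

Q, V, n_it = value_iteration(R, P)
policy = np.argmax(Q, axis=1)  # greedy policy extracted from Q*
```

With gamma = 0.99, the absorbing state's value converges near 1 / (1 - gamma) = 100, and the greedy policy in state 0 picks action 1. The same `np.argmax(qfunc, axis=1)` trick extracts the optimal policy from the `qfunc` returned by rlberry's routine.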
Answer selected by ngmq