Hi @carlulxy The
In the PPO algorithm, the code provides a shared model that computes the action or the value from input["state"]. I have some questions about the shape of input["state"]. I noticed that
input["state"].shape[0] == env.num_envs
is not always true; sometimes input["state"].shape[0] == 2 * env.num_envs.
What does this mean? Is it a continuous multi-frame state? And if input["state"].shape[0] == 2 * env.num_envs, then the shape[0] of the computed action also becomes 2 * env.num_envs. How is that action then applied to step the environment? I noticed this when I wanted to cache a multi-frame state as an input to the model.
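For context, here is a minimal sketch of how one might log the batch dimension inside a shared model's forward pass to tell the two kinds of calls apart. This is not skrl's actual internals; the function compute, the environment count NUM_ENVS, and the phase labels are hypothetical, and NumPy stands in for the framework tensors:

```python
import numpy as np

NUM_ENVS = 4  # hypothetical number of parallel environments

def compute(inputs, num_envs=NUM_ENVS):
    """Hypothetical shared-model forward pass that reports whether the
    incoming batch matches one state per environment (a rollout-style
    call) or a larger batch (e.g. 2*num_envs, as in the question)."""
    batch = inputs["state"].shape[0]
    if batch == num_envs:
        phase = "per-env"   # one state per parallel environment
    else:
        phase = "batched"   # e.g. stacked frames or a sampled mini-batch
    return phase, batch

# rollout-style call: one observation (of size 8, chosen arbitrarily) per env
states = np.zeros((NUM_ENVS, 8), dtype=np.float32)
print(compute({"state": states}))    # → ('per-env', 4)

# doubled batch dimension, as described above
stacked = np.zeros((2 * NUM_ENVS, 8), dtype=np.float32)
print(compute({"state": stacked}))   # → ('batched', 8)
```

Instrumenting the model this way (e.g. printing the shape and the caller's role) is one way to see exactly which phase of the algorithm produces the doubled batch.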