Hi @carlulxy The
In the PPO algorithm, the code provides a shared model that computes the action or the value from input["state"]. I have some questions about the shape of input["state"]. I noticed that
input["state"].shape[0] == env.num_envs
is not always true; sometimes input["state"].shape[0] == 2 * env.num_envs.
What does this mean? Is it a continuous multi-frame state? And if input["state"].shape[0] == 2 * env.num_envs, then the shape[0] of the computed action also becomes 2 * env.num_envs. How is that action then applied to step the environment? I noticed this when I wanted to cache a multi-frame state as an input to the model.
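For context, here is a minimal sketch of how one might log the batch dimension inside a shared model's forward pass to tell the two kinds of calls apart. This is not skrl's actual internals; the function compute, the environment count NUM_ENVS, and the phase labels are hypothetical, and NumPy stands in for the framework tensors:

```python
import numpy as np

NUM_ENVS = 4  # hypothetical number of parallel environments

def compute(inputs, num_envs=NUM_ENVS):
    """Hypothetical shared-model forward pass that reports whether the
    incoming batch matches one state per environment (a rollout-style
    call) or a larger batch (e.g. 2*num_envs, as in the question)."""
    batch = inputs["state"].shape[0]
    if batch == num_envs:
        phase = "per-env"   # one state per parallel environment
    else:
        phase = "batched"   # e.g. stacked frames or a sampled mini-batch
    return phase, batch

# rollout-style call: one observation (of size 8, chosen arbitrarily) per env
states = np.zeros((NUM_ENVS, 8), dtype=np.float32)
print(compute({"state": states}))    # → ('per-env', 4)

# doubled batch dimension, as described above
stacked = np.zeros((2 * NUM_ENVS, 8), dtype=np.float32)
print(compute({"state": stacked}))   # → ('batched', 8)
```

Instrumenting the model this way (e.g. printing the shape and the caller's role) is one way to see exactly which phase of the algorithm produces the doubled batch.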