Continuous Action Space outputs are unbounded #329
-
Hello everyone, I'm currently using Isaac Lab and SKRL for a continuous action space task where the agent moves my tooltip by a delta based on sensor collision feedback. I use SKRL with a GaussianMixin model and observed that my action output is always unbounded.
Action Space Definition:
SKRL GaussianMixin:
Main Script in Isaac Lab:
Output:
Here, when I print the picked actions, the output is always unbounded. Since I already apply a tanh squash, I map it again to my action space high and low. I would expect my picked action to be bounded in [-1,1], but that's not the case. I'm not an RL expert and am unsure whether this would damage my training. Is there any other approach to do this cleanly?
Replies: 2 comments 1 reply
-
Thank you @Toni-SM for your detailed answer. I already tried reducing log_std and was able to get decent values in my desired range. I also wanted to ask another question, which is not specific to SKRL but about RL in general, as I couldn't find the information elsewhere. Is it generally advisable in RL to pause the action updates with a flag while the observations are still running? That is, I have a task where the agent's action is a delta offset to my initial position, and once this is known, I insert a needle, gather sensor data, and calculate rewards. During this cycle, I don't want my actions to be updated, since that would not let me finish the insertion completely and would move my tool away from its new position.
To do this, I set a flag and stop the apply_action update (which takes in the action outputs from pre_physics_step). But my post-physics calls are still used in training. From what I understood, especially for actor-critic networks, where actions are taken every sim step, my task doesn't suit this setup. Any advice or suggestion would be of immense help.
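For readers trying to picture the flag mechanism described above, here is a minimal, framework-free sketch. The class and method names are hypothetical: they only loosely mirror the `pre_physics_step` / `apply_action` split mentioned in the post and are not Isaac Lab's API. The sketch assumes an insertion cycle lasts a fixed number of sim steps, which the post does not state.

```python
class HeldActionTask:
    """Toy task that latches one action and ignores new ones until a cycle ends."""

    def __init__(self, cycle_steps):
        self.cycle_steps = cycle_steps  # sim steps per insertion cycle (assumed fixed)
        self.steps_left = 0             # 0 means a new action can be accepted
        self.held_action = None
        self.position = 0.0

    def pre_physics_step(self, action):
        """Accept a new action only when no insertion cycle is running."""
        if self.steps_left == 0:
            self.held_action = action
            self.position += action     # apply the delta offset exactly once
            self.steps_left = self.cycle_steps
            return True                 # action accepted
        return False                    # action ignored while the flag is set

    def physics_step(self):
        """Advance the running cycle by one sim step; True when it completes."""
        if self.steps_left > 0:
            self.steps_left -= 1
            return self.steps_left == 0
        return False


task = HeldActionTask(cycle_steps=3)
accepted = []
for action in [0.5, 0.125, 0.25, 0.25, -0.25]:
    accepted.append(task.pre_physics_step(action))
    task.physics_step()

print(accepted, task.position)
```

In this toy loop only the first and fourth actions are applied. The other three are produced by the policy but never reach the tool, which is the mismatch between sampled and executed actions that the post describes.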
Hi @Sanjay1911

The skrl's Gaussian model concept represents the data flow when using such a mixin:

The `mean` (in the interval [-1,1] due to the `tanh` output) feeds a Gaussian distribution. Such a distribution, whenever its standard deviation is non-zero (exp(`log_std`) > 0 for any finite `log_std`), samples values that are not bounded to the limits [-1,1], as discussed in #63 (comment) (I recommend you take a look at this discussion).

If you want to limit the final output of the Gaussian model, you need to enable `clip_actions` while defining a bounded Gymnasium space (`action_space = spaces.Box(low=-1, high=1, shape=(2,))`), leaving the application of the scale (0.0005) to the task implementation.
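The data flow described above can be illustrated with a small standard-library sketch. It is not skrl's implementation: `sample_action`, `clip_action`, and `to_task_delta` are hypothetical helpers, and `clip_action` only stands in for the effect that enabling `clip_actions` is described to have. The bounds and the 0.0005 scale are taken from the reply.

```python
import math
import random

ACTION_LOW, ACTION_HIGH = -1.0, 1.0  # bounds of the Box space from the reply
TASK_SCALE = 0.0005                  # scale from the thread, applied by the task


def sample_action(pre_activation, log_std, rng):
    """Sample one action component from a Gaussian with a tanh-squashed mean."""
    mean = math.tanh(pre_activation)  # the mean always lies inside [-1, 1]
    std = math.exp(log_std)           # strictly positive for any finite log_std
    return rng.gauss(mean, std)       # the sample itself is unbounded


def clip_action(action, low=ACTION_LOW, high=ACTION_HIGH):
    """Clamp a sampled action to the bounds of the action space."""
    return max(low, min(high, action))


def to_task_delta(action, scale=TASK_SCALE):
    """Scale a clipped action in [-1, 1] to a tooltip delta inside the task."""
    return scale * action


rng = random.Random(0)
raw = [sample_action(2.0, 0.0, rng) for _ in range(1000)]  # log_std = 0 gives std = 1
clipped = [clip_action(a) for a in raw]
deltas = [to_task_delta(a) for a in clipped]

print("raw range:    ", min(raw), max(raw))
print("clipped range:", min(clipped), max(clipped))
print("delta range:  ", min(deltas), max(deltas))
```

Keeping the policy's space at [-1,1] and applying the scale in the task means the model never sees the very small physical magnitudes. Only the task code knows that a full-range action corresponds to a delta of 0.0005.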