When training on hardware, at some random point in training (not the same point every time), episodes suddenly start terminating after one step. The batch of 2048 steps then becomes 2048 individual one-step episodes, which naturally always yield 0 reward (for the QubeSwingupEnv).
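One way to catch this failure mode as soon as it appears is to check each rollout batch for it. The following is a minimal sketch. It assumes you have access to the per-step `done` flags of a batch. The helper names `episode_lengths` and `is_degenerate_batch`, and the 0.9 threshold, are hypothetical and do not come from any library.

```python
from typing import List, Sequence


def episode_lengths(dones: Sequence[bool]) -> List[int]:
    """Split a rollout batch into episode lengths using per-step done flags."""
    lengths: List[int] = []
    current = 0
    for done in dones:
        current += 1
        if done:
            lengths.append(current)
            current = 0
    if current:  # trailing, unfinished episode at the end of the batch
        lengths.append(current)
    return lengths


def is_degenerate_batch(dones: Sequence[bool], threshold: float = 0.9) -> bool:
    """Flag a batch in which most episodes last only a single step."""
    lengths = episode_lengths(dones)
    if not lengths:
        return False
    single_step = sum(1 for n in lengths if n == 1)
    return single_step / len(lengths) >= threshold


if __name__ == "__main__":
    healthy = [False] * 2047 + [True]  # one 2048-step episode
    broken = [True] * 2048             # 2048 one-step episodes
    print(len(episode_lengths(broken)), is_degenerate_batch(broken))
    print(episode_lengths(healthy), is_degenerate_batch(healthy))
```

Logging the step index at which `is_degenerate_batch` first returns `True` may make it easier to correlate the onset with hardware-side events, since the failure point differs between runs.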