Each experiment uses 3 seeds and is trained for 2 million environment steps. The DDPG hyperparameters are the same as those described in the original DDPG paper. To reproduce a result, run the command for the corresponding MuJoCo environment:
```bash
coach -p Mujoco_DDPG -lvl inverted_pendulum
coach -p Mujoco_DDPG -lvl inverted_double_pendulum
coach -p Mujoco_DDPG -lvl reacher
coach -p Mujoco_DDPG -lvl hopper
coach -p Mujoco_DDPG -lvl half_cheetah
coach -p Mujoco_DDPG -lvl walker2d
coach -p Mujoco_DDPG -lvl ant
coach -p Mujoco_DDPG -lvl swimmer
coach -p Mujoco_DDPG -lvl humanoid
```
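Because every environment is run with 3 seeds, a small driver script can enumerate all 27 runs instead of typing them by hand. The following is a minimal Python sketch under stated assumptions: the seed values `0, 1, 2` are hypothetical (the text only says 3 seeds are used), and the `--seed` flag is assumed to be accepted by the `coach` CLI, so check `coach -h` for your installed version. By default it only prints the commands.

```python
import shlex
import subprocess

PRESET = "Mujoco_DDPG"
# Levels taken from the commands listed above.
LEVELS = [
    "inverted_pendulum", "inverted_double_pendulum", "reacher", "hopper",
    "half_cheetah", "walker2d", "ant", "swimmer", "humanoid",
]
# HYPOTHETICAL seed values; the benchmark only states that 3 seeds are used.
SEEDS = [0, 1, 2]


def build_commands(levels=LEVELS, seeds=SEEDS, preset=PRESET):
    """Return one coach argument list per (level, seed) pair."""
    commands = []
    for level in levels:
        for seed in seeds:
            # ASSUMPTION: coach accepts a --seed flag; verify with `coach -h`.
            commands.append(
                ["coach", "-p", preset, "-lvl", level, "--seed", str(seed)]
            )
    return commands


def run_all(dry_run=True):
    """Print every command; launch them sequentially only if dry_run is False."""
    for cmd in build_commands():
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `run_all()` prints the commands for review; `run_all(dry_run=False)` runs them one after another, which can take a long time at 2 million steps per run, so launching them in parallel on separate machines or processes may be preferable.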