This project implements a Proximal Policy Optimization (PPO) reinforcement learning agent that trains the Minitaur robot to walk in the MinitaurBulletEnv-v0 environment using PyBullet. The policy and value networks are multilayer perceptrons (MLPs), and the agent learns to control the robot in a continuous action space.
Updated Jul 16, 2025 - Python
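
As a rough illustration of the PPO update described above, the sketch below computes the log-probability of a continuous action under a diagonal Gaussian policy and the clipped surrogate objective over a small batch. It is not taken from this repository: the function names `gaussian_log_prob` and `ppo_clipped_objective` and the default `clip_eps=0.2` are assumptions chosen for illustration, and the project's actual networks and hyperparameters may differ.

```python
import math


def gaussian_log_prob(action, mean, log_std):
    """Log-probability of a continuous action under a diagonal Gaussian.

    Hypothetical helper: in a full agent, `mean` would come from the
    policy MLP and `log_std` from a learned parameter or network output.
    """
    log_prob = 0.0
    for a, m, ls in zip(action, mean, log_std):
        std = math.exp(ls)
        log_prob += -0.5 * ((a - m) / std) ** 2 - ls - 0.5 * math.log(2.0 * math.pi)
    return log_prob


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective over a batch (to be maximized).

    clip_eps=0.2 is an assumed default, not a value confirmed by the project.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio between the updated policy and the data-collecting policy.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

In a training loop, the negative of this objective would typically be combined with a value-function loss and minimized by gradient descent; that step needs an automatic differentiation library and is omitted here.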