This project involves training a car agent using Unity ML-Agents to navigate a straight road while avoiding walls. The agent uses raycasts to detect obstacles and receives rewards for staying centered and moving forward. Below is a detailed explanation of the movement logic, agent logic, functions, observations, and overall setup.
- Movement Logic
- Agent Logic
- Functions
- Observations
- Reward System
- Training Setup
- Testing Setup
- How to Use
- Future Improvements
## Movement Logic

The car agent moves based on three actions:

- Acceleration: Controls forward and backward movement.
  - Positive input: Accelerate forward.
  - Negative input: Reverse.
- Steering: Controls left and right movement.
  - Positive input: Steer right.
  - Negative input: Steer left.
- Handbrake: Stops the car abruptly.
The car’s movement is physics-based, using Unity’s `Rigidbody` component. The steering angle is clamped between `-25` and `25` degrees to prevent unrealistic turns.
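As an illustration, the movement described above might look roughly like this inside ML-Agents’ `OnActionReceived`. This is a minimal sketch, assuming a three-value continuous action layout and placeholder tuning values such as `motorForce`; it is not the project’s actual code:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

public class CarAgentMovementSketch : Agent
{
    [SerializeField] float motorForce = 10f;    // assumed tuning value
    [SerializeField] float maxSteerAngle = 25f; // matches the clamp described above
    Rigidbody rb;

    public override void Initialize()
    {
        rb = GetComponent<Rigidbody>();
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        float accel = actions.ContinuousActions[0];           // forward / reverse
        float steer = actions.ContinuousActions[1];           // left / right
        bool handbrake = actions.ContinuousActions[2] > 0.5f; // assumed handbrake channel

        if (handbrake)
        {
            rb.velocity = Vector3.zero; // stop the car abruptly
            return;
        }

        // Clamp steering to [-25, 25] degrees to prevent unrealistic turns.
        float steerAngle = Mathf.Clamp(steer * maxSteerAngle, -maxSteerAngle, maxSteerAngle);
        transform.Rotate(0f, steerAngle * Time.fixedDeltaTime, 0f);

        // Physics-based forward/backward movement through the Rigidbody.
        rb.AddForce(transform.forward * accel * motorForce, ForceMode.Acceleration);
    }
}
```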
## Agent Logic

The agent uses raycasts to detect walls and obstacles. It receives observations about its environment and takes actions based on the trained model or heuristic input. A sketch of the raycast logic follows the list below.
- Raycasts:
  - Distances: `{ 6f, 6f, 4.6f, 4.6f }`
  - Angles: `{ -30f, 30f, -90f, 90f }`
  - Detects walls and calculates rewards based on proximity.
- Observations:
  - Speed, steering angle, velocity, and raycast distances.
- Rewards:
  - Positive for moving forward.
  - Negative for being near walls or hitting obstacles.
  - Small penalty for staying stuck.
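A sketch of how the four raycasts might be cast and normalized, using the distances and angles listed above; the class and method names are assumptions:

```csharp
using UnityEngine;

public class RaycastSketch : MonoBehaviour
{
    // Distances and angles from the agent description above.
    static readonly float[] RayDistances = { 6f, 6f, 4.6f, 4.6f };
    static readonly float[] RayAngles = { -30f, 30f, -90f, 90f };

    // Returns one normalized distance per ray: 1 = nothing hit, smaller = wall is close.
    public float[] CastRays()
    {
        var normalized = new float[RayAngles.Length];
        for (int i = 0; i < RayAngles.Length; i++)
        {
            // Rotate the forward vector by the ray's angle around the Y axis.
            Vector3 dir = Quaternion.Euler(0f, RayAngles[i], 0f) * transform.forward;
            if (Physics.Raycast(transform.position, dir, out RaycastHit hit, RayDistances[i]))
                normalized[i] = hit.distance / RayDistances[i];
            else
                normalized[i] = 1f;
        }
        return normalized;
    }
}
```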
## Functions

Here’s a breakdown of the key functions in the `CarAgent` script (a skeleton of these callbacks follows the list):
- Initializes the agent’s starting position, rotation, and raycast rewards array.
- Resets the agent’s position, velocity, and rewards at the start of each episode.
- Collects observations for the agent:
  - Normalized speed and steering angle.
  - Velocity (x and z components).
  - Distance from the center of the road.
  - Raycast distances to walls.
- Processes actions (acceleration, steering, handbrake):
  - Moves the car using `Rigidbody`.
  - Applies rewards based on movement and raycast collisions.
- Maps keyboard input to actions for manual control:
  - W/S or Up/Down Arrow: Acceleration.
  - A/D or Left/Right Arrow: Steering.
  - Spacebar: Handbrake.
- Resets the episode if the car goes off the road.
- Resets the episode if the car collides with a wall.
- Draws raycasts in the Scene view for debugging.
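The descriptions above correspond closely to the standard `Agent` overrides in Unity ML-Agents and to common Unity callbacks. The mapping below is an assumption based on those conventions, not taken from the project’s source:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class CarAgentSkeleton : Agent
{
    // Initializes the starting position, rotation, and raycast rewards array.
    public override void Initialize() { /* ... */ }

    // Resets position, velocity, and rewards at the start of each episode.
    public override void OnEpisodeBegin() { /* ... */ }

    // Collects speed, steering angle, velocity, center distance, and ray distances.
    public override void CollectObservations(VectorSensor sensor) { /* ... */ }

    // Processes acceleration/steering/handbrake, moves the Rigidbody, applies rewards.
    public override void OnActionReceived(ActionBuffers actions) { /* ... */ }

    // Maps keyboard input (W/S, A/D, Space) to actions for manual control.
    public override void Heuristic(in ActionBuffers actionsOut) { /* ... */ }

    // Ends the episode when the car collides with a wall.
    void OnCollisionEnter(Collision collision) { /* ... */ }

    // Draws the raycasts in the Scene view for debugging.
    void OnDrawGizmos() { /* ... */ }
}
```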
## Observations

The agent observes the following (a sketch of the corresponding `CollectObservations` call follows the list):
- Speed: Normalized current speed.
- Steering Angle: Normalized current steering angle.
- Velocity: X and Z components of the car’s velocity.
- Distance from Center: Distance from the center of the road.
- Raycast Distances: Normalized distances to walls detected by raycasts.
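A minimal sketch of how these observations might be collected, assuming `maxSpeed` as a normalization constant and a road running along the Z axis (both assumptions):

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using UnityEngine;

public class ObservationSketch : Agent
{
    [SerializeField] float maxSpeed = 20f;      // assumed normalization constant
    [SerializeField] float maxSteerAngle = 25f; // matches the steering clamp
    float currentSteerAngle;                    // updated by the steering logic
    float[] normalizedRayDistances = new float[4]; // filled by the raycast logic
    Rigidbody rb;

    public override void Initialize()
    {
        rb = GetComponent<Rigidbody>();
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        sensor.AddObservation(rb.velocity.magnitude / maxSpeed);  // normalized speed
        sensor.AddObservation(currentSteerAngle / maxSteerAngle); // normalized steering angle
        sensor.AddObservation(rb.velocity.x);                     // velocity X
        sensor.AddObservation(rb.velocity.z);                     // velocity Z
        sensor.AddObservation(transform.localPosition.x);        // distance from road center
        foreach (float d in normalizedRayDistances)               // four normalized ray distances
            sensor.AddObservation(d);
    }
}
```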
## Reward System

The agent receives rewards based on its actions and environment (see the sketch after this list):

- Positive Rewards:
  - `+0.01` for moving forward.
- Negative Rewards:
  - `-0.1 * (1 - normalizedDistance)` for being near a wall.
  - `-1.0` for hitting a wall or going off the road.
  - `-0.01` for staying stuck (no movement for 3 seconds).
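A sketch of how these rewards might be applied; the `FixedUpdate` bookkeeping, the `"Wall"` tag, and the stuck-timer details are assumptions:

```csharp
using Unity.MLAgents;
using UnityEngine;

public class RewardSketch : Agent
{
    Rigidbody rb;
    float stuckTimer;

    public override void Initialize()
    {
        rb = GetComponent<Rigidbody>();
    }

    void FixedUpdate()
    {
        // +0.01 for moving forward (along the agent's facing direction).
        if (Vector3.Dot(rb.velocity, transform.forward) > 0.1f)
        {
            AddReward(0.01f);
            stuckTimer = 0f;
        }
        else
        {
            // -0.01 when stuck: no movement for 3 seconds.
            stuckTimer += Time.fixedDeltaTime;
            if (stuckTimer >= 3f) AddReward(-0.01f);
        }
    }

    // Called with a ray's normalized distance (1 = far, 0 = touching the wall).
    void PenalizeWallProximity(float normalizedDistance)
    {
        AddReward(-0.1f * (1f - normalizedDistance));
    }

    void OnCollisionEnter(Collision collision)
    {
        if (collision.gameObject.CompareTag("Wall")) // assumed tag
        {
            AddReward(-1.0f); // -1.0 for hitting a wall, then restart the episode
            EndEpisode();
        }
    }
}
```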
## Training Setup

- Environment:
  - Straight road with walls on both sides.
  - Agent starts at the beginning of the road.
- YAML Configuration:

  ```yaml
  behaviors:
    CarBehavior:
      trainer_type: ppo
      hyperparameters:
        batch_size: 64
        buffer_size: 1024
        learning_rate: 3.0e-4
        beta: 5.0e-4
        epsilon: 0.2
        lambd: 0.95
        num_epoch: 3
        learning_rate_schedule: linear
      network_settings:
        normalize: true
        hidden_units: 128
        num_layers: 2
      reward_signals:
        extrinsic:
          gamma: 0.99
          strength: 1.0
      max_steps: 500000
      time_horizon: 64
      summary_freq: 10000
  ```
- Training Command:

  ```bash
  mlagents-learn car_config.yaml --run-id=CarTraining --force
  ```
## Testing Setup

- Environment:
  - Longer road (20 units) with narrower width (4 units).
  - Obstacles placed randomly on the road.
- Behavior Parameters:
  - Assign the trained `.onnx` file.
  - Set Behavior Type to `Inference Only`.
- Evaluation Metrics:
  - Success rate, collision rate, and average reward (one way to log these is sketched below).
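One way to log such metrics to TensorBoard is the ML-Agents `StatsRecorder`; the metric keys below are illustrative, not from the project:

```csharp
using Unity.MLAgents;

public static class EvaluationStatsSketch
{
    // Record a 1/0 success flag, a 1/0 collision flag, and the episode's
    // cumulative reward; TensorBoard averages these into success rate,
    // collision rate, and mean reward.
    public static void RecordEpisode(bool success, bool collided, float episodeReward)
    {
        var stats = Academy.Instance.StatsRecorder;
        stats.Add("Eval/Success", success ? 1f : 0f);
        stats.Add("Eval/Collision", collided ? 1f : 0f);
        stats.Add("Eval/EpisodeReward", episodeReward);
    }
}
```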
## How to Use

- Training:
  - Set up the training environment.
  - Run the training command and press Play in Unity.
- Testing:
  - Set up the testing environment.
  - Assign the trained model and observe the agent’s performance.
- Manual Control:
  - Set Behavior Type to `Heuristic Only`.
  - Use the keyboard to control the car (see the sketch below).
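In `Heuristic Only` mode, ML-Agents calls the agent’s `Heuristic` override every step. A minimal sketch of the keyboard mapping, assuming Unity’s default input axes:

```csharp
using Unity.MLAgents;
using Unity.MLAgents.Actuators;
using UnityEngine;

public class HeuristicSketch : Agent
{
    public override void Heuristic(in ActionBuffers actionsOut)
    {
        var continuous = actionsOut.ContinuousActions;
        continuous[0] = Input.GetAxis("Vertical");             // W/S or Up/Down: acceleration
        continuous[1] = Input.GetAxis("Horizontal");           // A/D or Left/Right: steering
        continuous[2] = Input.GetKey(KeyCode.Space) ? 1f : 0f; // Spacebar: handbrake
    }
}
```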
## Future Improvements

- Add curved roads and dynamic obstacles.
- Train with multiple environments for better generalization.
- Use Cinemachine for better visualization during testing.