
Commit 017d3ef

Merge branch 'main' of https://github.com/DojoSim/Dojo.jl into main
2 parents: d6c35bd + 97e752c

3 files changed: +32 −2 lines changed

docs/make.jl — 1 addition, 1 deletion

@@ -1,6 +1,6 @@
 push!(LOAD_PATH, "../src/")
 
-using Documenter#, Dojo
+using Documenter, Dojo
 
 makedocs(
     modules = [Dojo],

docs/src/reinforcement_learning.md — 21 additions, 1 deletion

@@ -1,5 +1,25 @@
 # Reinforcement Learning
 
+Policy optimization is performed with the reinforcement-learning algorithm [augmented random search (ARS)](https://arxiv.org/abs/1803.07055), which optimizes static linear policies for locomotion. A number of [Gym-like environments](https://gym.openai.com/) are created with Dojo for this application.
+
+## [Half Cheetah](../../examples/reinforcement_learning/halfcheetah_ars.jl)
+
+```@raw html
+<img src="../../examples/animations/halfcheetah_ars.gif" width="600"/>
+```
+
+The cheetah-like robot is rewarded for forward velocity and penalized for control usage.
+
+## [Ant](../../examples/reinforcement_learning/ant_ars.jl)
+
+```@raw html
+<img src="../../examples/animations/ant_ars_no_grid.gif" width="300"/>
+```
+
+The insect-like robot is rewarded for forward velocity and survival, and penalized for control usage and contact forces.
+
 ## Gradient-Free
+The original ARS is a derivative-free algorithm: it uses only the relative costs of the sampled policies to assemble a new search direction for the policy parameters. While this approach is general, it can be very sample inefficient.
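The derivative-free update described above can be sketched in a few lines. This is a generic ARS-style update applied to a toy quadratic reward, not Dojo's Julia implementation; `evaluate` stands in for a simulator rollout, and all names and hyperparameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(3)                  # static linear policy parameters
target = np.array([1.0, -2.0, 0.5])  # optimum of the toy reward

def evaluate(params):
    # Stand-in for the total reward of one simulated rollout; higher is better.
    return -np.sum((params - target) ** 2)

step_size, noise, n_dirs = 0.05, 0.1, 8
for _ in range(500):
    # Sample random search directions and evaluate mirrored perturbations.
    deltas = rng.standard_normal((n_dirs, theta.size))
    r_plus = np.array([evaluate(theta + noise * d) for d in deltas])
    r_minus = np.array([evaluate(theta - noise * d) for d in deltas])
    # Derivative-free update: weight each direction by its relative reward,
    # scaled by the reward standard deviation as in ARS.
    sigma = np.concatenate([r_plus, r_minus]).std() + 1e-8
    theta += step_size / (n_dirs * sigma) * ((r_plus - r_minus) @ deltas)

print(np.round(theta, 2))
```

Note that each iteration costs `2 * n_dirs` rollouts, which is the sample inefficiency the text refers to.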
 
-## Gradient-Based
+## Gradient-Based
+We formulate a gradient-based version, augmented gradient search (AGS), that uses gradients from Dojo. With this information, policy optimization is often an order of magnitude more sample efficient in terms of queries to the simulator.
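On the same toy reward, the gradient-based idea can be sketched by replacing the sampled search directions with an exact gradient, as a differentiable simulator like Dojo would supply. This is a schematic stand-in, not the AGS implementation: `reward_grad` plays the role of the simulator-provided gradient.

```python
import numpy as np

theta = np.zeros(3)
target = np.array([1.0, -2.0, 0.5])

def reward(params):
    # Stand-in for the total reward of one rollout.
    return -np.sum((params - target) ** 2)

def reward_grad(params):
    # In Dojo this gradient would come from differentiating the simulator;
    # here it is simply the analytic gradient of the toy reward.
    return -2.0 * (params - target)

for _ in range(100):
    theta += 0.1 * reward_grad(theta)  # one simulator query per update

print(np.round(theta, 3))
```

Each update needs a single rollout instead of `2 * n_dirs` mirrored rollouts, which illustrates where the order-of-magnitude sample-efficiency gain can come from.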

docs/src/trajectory_optimization.md — 10 additions, 0 deletions

@@ -8,26 +8,36 @@ Dojo provides dynamics constraints and Jacobians in order to perform trajectory
 <img src="../../examples/animations/quadruped_min.gif" width="200"/>
 ```
 
+A [Unitree A1](https://www.unitree.com/products/a1/) takes a number of forward steps. There are costs on a kinematic gait and control usage, as well as an augmented Lagrangian (i.e., soft) constraint on the robot's final pose. The maximal representation is converted to a minimal one for optimization. Additionally, slack controls are used early on to aid the optimizer before a constraint drives them to zero, yielding a dynamically feasible trajectory.
+
 ## [Atlas](../../examples/trajectory_optimization/atlas_min.jl)
 
 ```@raw html
 <img src="../../examples/animations/atlas_ilqr.gif" width="200"/>
 ```
 
+The Atlas v5 humanoid (sans arms) takes a number of forward steps. As in the quadruped example, there are costs on control effort and on deviations from a kinematic plan, a minimal representation is used, and the optimizer is aided by slack controls.
+
 ## [Block](../../examples/trajectory_optimization/box.jl)
 
 ```@raw html
 <img src="../../examples/animations/box_right.gif" width="200"/>
 ```
 
+A block is moved to a goal location by applying forces to its center of mass. The optimizer is initialized with zero controls and uses smooth gradients from Dojo to find a motion that overcomes friction and slides the block toward the goal.
+
 ## [Raibert Hopper](../../examples/trajectory_optimization/hopper_max.jl)
 
 ```@raw html
 <img src="../../examples/animations/hopper_max.gif" width="100"/>
 ```
 
+A hopping robot, inspired by the [Raibert Hopper](https://dspace.mit.edu/handle/1721.1/6820), is tasked with moving to a goal location. The optimizer finds a single-hop trajectory that reaches the goal pose.
+
 ## [Cart-pole](../../examples/trajectory_optimization/cartpole_max.jl)
 
 ```@raw html
 <img src="../../examples/animations/cartpole_max.gif" width="200"/>
 ```
+
+This classic system is tasked with performing a swing-up. Examples are provided for optimization with both maximal and minimal representations.
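The augmented Lagrangian (soft) terminal constraint mentioned in the quadruped and Atlas examples can be illustrated on a toy problem: minimize control effort for a scalar system x_{t+1} = x_t + u_t while softly enforcing the final state x_T = 1. This is an illustrative sketch under those assumptions, not Dojo's trajectory optimizer.

```python
import numpy as np

T, goal = 5, 1.0
u = np.zeros(T)        # controls (decision variables), initialized to zero
lam, rho = 0.0, 10.0   # Lagrange multiplier and penalty weight

def constraint(u):
    # Terminal-pose violation: x_T - goal for x_{t+1} = x_t + u_t, x_0 = 0.
    return u.sum() - goal

for _ in range(20):                       # outer multiplier updates
    for _ in range(200):                  # inner descent on the AL objective
        c = constraint(u)
        # Gradient of sum(u^2) + lam*c + (rho/2)*c^2 with respect to u.
        grad = 2 * u + (lam + rho * c)
        u -= 0.01 * grad
    lam += rho * constraint(u)            # tighten the soft constraint

print(np.round(u, 3), round(constraint(u), 6))
```

The multiplier update progressively turns the soft penalty into a (near-)hard constraint, so the optimizer ends with minimum-effort controls (u_t = 0.2 for all t) whose final state matches the goal.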
