
Commit 017d3ef

Merge branch 'main' of https://github.com/DojoSim/Dojo.jl into main
2 parents: d6c35bd + 97e752c

3 files changed: +32 −2 lines changed

docs/make.jl — 1 addition, 1 deletion

@@ -1,6 +1,6 @@
 push!(LOAD_PATH, "../src/")
 
-using Documenter#, Dojo
+using Documenter, Dojo
 
 makedocs(
     modules = [Dojo],

docs/src/reinforcement_learning.md — 21 additions, 1 deletion

@@ -1,5 +1,25 @@
 # Reinforcement Learning
 
+Policy optimization is performed with the reinforcement-learning algorithm [augmented random search (ARS)](https://arxiv.org/abs/1803.07055), which optimizes static linear policies for locomotion. A number of [Gym-like environments](https://gym.openai.com/) are created with Dojo for this application.
+
+## [Half Cheetah](../../examples/reinforcement_learning/halfcheetah_ars.jl)
+
+```@raw html
+<img src="../../examples/animations/halfcheetah_ars.gif" width="600"/>
+```
+
+The cheetah-like robot is rewarded for forward velocity and penalized for control usage.
+
+## [Ant](../../examples/reinforcement_learning/ant_ars.jl)
+
+```@raw html
+<img src="../../examples/animations/ant_ars_no_grid.gif" width="300"/>
+```
+
+The insect-like robot is rewarded for forward velocity and survival, and penalized for control usage and contact forces.
+
 ## Gradient-Free
+The original ARS is a derivative-free algorithm: it uses only the relative costs of the sampled policies to assemble a new search direction for the policy parameters. While this approach is general, it can be very sample inefficient.
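The derivative-free update described above can be sketched in a few lines. This is a generic ARS-style update applied to a toy quadratic reward, not Dojo's Julia implementation; `evaluate` stands in for a simulator rollout, and all names and hyperparameters here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(3)                  # static linear policy parameters
target = np.array([1.0, -2.0, 0.5])  # optimum of the toy reward

def evaluate(params):
    # Stand-in for the total reward of one simulated rollout; higher is better.
    return -np.sum((params - target) ** 2)

step_size, noise, n_dirs = 0.05, 0.1, 8
for _ in range(500):
    # Sample random search directions and evaluate mirrored perturbations.
    deltas = rng.standard_normal((n_dirs, theta.size))
    r_plus = np.array([evaluate(theta + noise * d) for d in deltas])
    r_minus = np.array([evaluate(theta - noise * d) for d in deltas])
    # Derivative-free update: weight each direction by its relative reward,
    # scaled by the reward standard deviation as in ARS.
    sigma = np.concatenate([r_plus, r_minus]).std() + 1e-8
    theta += step_size / (n_dirs * sigma) * ((r_plus - r_minus) @ deltas)

print(np.round(theta, 2))
```

Note that each iteration costs `2 * n_dirs` rollouts, which is the sample inefficiency the text refers to.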
 
-## Gradient-Based
+## Gradient-Based
+We formulate a gradient-based version, augmented gradient search (AGS), that uses gradients from Dojo. With this information, policy optimization is often an order of magnitude more sample efficient in terms of queries to the simulator.
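On the same toy reward, the gradient-based idea can be sketched by replacing the sampled search directions with an exact gradient, as a differentiable simulator like Dojo would supply. This is a schematic stand-in, not the AGS implementation: `reward_grad` plays the role of the simulator-provided gradient.

```python
import numpy as np

theta = np.zeros(3)
target = np.array([1.0, -2.0, 0.5])

def reward(params):
    # Stand-in for the total reward of one rollout.
    return -np.sum((params - target) ** 2)

def reward_grad(params):
    # In Dojo this gradient would come from differentiating the simulator;
    # here it is simply the analytic gradient of the toy reward.
    return -2.0 * (params - target)

for _ in range(100):
    theta += 0.1 * reward_grad(theta)  # one simulator query per update

print(np.round(theta, 3))
```

Each update needs a single rollout instead of `2 * n_dirs` mirrored rollouts, which illustrates where the order-of-magnitude sample-efficiency gain can come from.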

docs/src/trajectory_optimization.md — 10 additions, 0 deletions

@@ -8,26 +8,36 @@ Dojo provides dynamics constraints and Jacobians in order to perform trajectory
 <img src="../../examples/animations/quadruped_min.gif" width="200"/>
 ```
 
+A [Unitree A1](https://www.unitree.com/products/a1/) takes a number of forward steps. There are costs on a kinematic gait and control usage, as well as an augmented Lagrangian (i.e., soft) constraint on the robot's final pose. The maximal representation is converted to a minimal one for optimization. Additionally, slack controls are used early on to aid the optimizer before a constraint drives them to zero, yielding a dynamically feasible trajectory.
+
 ## [Atlas](../../examples/trajectory_optimization/atlas_min.jl)
 
 ```@raw html
 <img src="../../examples/animations/atlas_ilqr.gif" width="200"/>
 ```
 
+The Atlas v5 humanoid (sans arms) takes a number of forward steps. As in the quadruped example, there are costs on control effort and on deviations from a kinematic plan, a minimal representation is used, and the optimizer is aided by slack controls.
+
 ## [Block](../../examples/trajectory_optimization/box.jl)
 
 ```@raw html
 <img src="../../examples/animations/box_right.gif" width="200"/>
 ```
 
+A block is moved to a goal location by applying forces to its center of mass. The optimizer is initialized with zero controls and uses smooth gradients from Dojo to find a motion that overcomes friction and slides the block toward the goal.
+
 ## [Raibert Hopper](../../examples/trajectory_optimization/hopper_max.jl)
 
 ```@raw html
 <img src="../../examples/animations/hopper_max.gif" width="100"/>
 ```
 
+A hopping robot, inspired by the [Raibert Hopper](https://dspace.mit.edu/handle/1721.1/6820), is tasked with moving to a goal location. The optimizer finds a single-hop trajectory that reaches the goal pose.
+
 ## [Cart-pole](../../examples/trajectory_optimization/cartpole_max.jl)
 
 ```@raw html
 <img src="../../examples/animations/cartpole_max.gif" width="200"/>
 ```
+
+This classic system is tasked with performing a swing-up. Examples are provided for optimization with both maximal and minimal representations.
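The augmented Lagrangian (soft) terminal constraint mentioned in the quadruped and Atlas examples can be illustrated on a toy problem: minimize control effort for a scalar system x_{t+1} = x_t + u_t while softly enforcing the final state x_T = 1. This is an illustrative sketch under those assumptions, not Dojo's trajectory optimizer.

```python
import numpy as np

T, goal = 5, 1.0
u = np.zeros(T)        # controls (decision variables), initialized to zero
lam, rho = 0.0, 10.0   # Lagrange multiplier and penalty weight

def constraint(u):
    # Terminal-pose violation: x_T - goal for x_{t+1} = x_t + u_t, x_0 = 0.
    return u.sum() - goal

for _ in range(20):                       # outer multiplier updates
    for _ in range(200):                  # inner descent on the AL objective
        c = constraint(u)
        # Gradient of sum(u^2) + lam*c + (rho/2)*c^2 with respect to u.
        grad = 2 * u + (lam + rho * c)
        u -= 0.01 * grad
    lam += rho * constraint(u)            # tighten the soft constraint

print(np.round(u, 3), round(constraint(u), 6))
```

The multiplier update progressively turns the soft penalty into a (near-)hard constraint, so the optimizer ends with minimum-effort controls (u_t = 0.2 for all t) whose final state matches the goal.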
