rl documentation

thowell · thowell · commit 97e752cb6c9f · 2022-02-28T13:48:08.000-08:00
diff --git a/docs/make.jl b/docs/make.jl
@@ -1,6 +1,6 @@
 push!(LOAD_PATH, "../src/")
 
-using Documenter#, Dojo
+using Documenter, Dojo
 
 makedocs(
     modules = [Dojo],
diff --git a/docs/src/reinforcement_learning.md b/docs/src/reinforcement_learning.md
@@ -1,5 +1,25 @@
 # Reinforcement Learning
 
+Policies are optimized with the reinforcement-learning algorithm [augmented random search (ARS)](https://arxiv.org/abs/1803.07055), which trains static linear policies for locomotion. Several [Gym-like environments](https://gym.openai.com/) are built with Dojo for this application.
+
+## [Half Cheetah](../../examples/reinforcement_learning/halfcheetah_ars.jl)
+
+```@raw html
+<img src="../../examples/animations/halfcheetah_ars.gif" width="600"/>
+```
+
+The cheetah-like robot is rewarded for forward velocity and penalized for control usage.
+
+## [Ant](../../examples/reinforcement_learning/ant_ars.jl) 
+
+```@raw html
+<img src="../../examples/animations/ant_ars_no_grid.gif" width="300"/>
+```
+
+The insect-like robot is rewarded for forward velocity and survival, and penalized for control usage and contact forces.
+
 ## Gradient-Free
+The original ARS is derivative-free: it uses only the relative costs of the sampled policies to assemble a new search direction for the policy parameters. While this approach is general, it can be very sample-inefficient.
 
-## Gradient-Based
+## Gradient-Based
+We formulate a gradient-based version, augmented gradient search (AGS), that uses gradients from Dojo. We find that, with this information, policy optimization is often an order of magnitude more sample-efficient in terms of queries to the simulator.
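The derivative-free ARS update described under Gradient-Free can be sketched in Julia as follows. This is a minimal sketch, assuming ARS without state normalization (the top-`b` "V1-t" variant from the ARS paper linked above). It does not use Dojo. The quadratic `reward` is a hypothetical stand-in for a simulator rollout of a linear policy, and the names `reward`, `ars_step`, `train`, and all hyperparameter values are illustrative assumptions, not the code in `examples/reinforcement_learning`.

```julia
using Random, Statistics

# Hypothetical stand-in for a simulator rollout (NOT a Dojo environment):
# the "return" of parameters θ is higher the closer θ is to a fixed target.
reward(θ; target = [1.0, -2.0, 0.5]) = -sum(abs2, θ .- target)

# One ARS update (assumed V1-t variant, no state normalization):
# sample N directions δ, evaluate the reward at θ ± νδ, keep the b best
# directions, and step along Σ (r₊ - r₋) δ scaled by the std of those rewards.
# Only reward differences are used; `reward` is never differentiated.
function ars_step(θ, reward; N = 8, b = 4, ν = 0.1, α = 0.05,
                  rng = Random.default_rng())
    δs = [randn(rng, length(θ)) for _ in 1:N]     # random search directions
    r₊ = [reward(θ .+ ν .* δ) for δ in δs]         # rollouts at θ + νδ
    r₋ = [reward(θ .- ν .* δ) for δ in δs]         # rollouts at θ - νδ
    top = sortperm(max.(r₊, r₋); rev = true)[1:b]  # indices of the b best directions
    σR = max(std(vcat(r₊[top], r₋[top])), eps())   # reward scale, guarded against 0
    step = sum((r₊[k] - r₋[k]) .* δs[k] for k in top)
    return θ .+ (α / (b * σR)) .* step
end

# Run a fixed number of ARS updates (each costs 2N reward queries).
function train(θ, reward; iters = 200, rng = MersenneTwister(0))
    for _ in 1:iters
        θ = ars_step(θ, reward; rng = rng)
    end
    return θ
end

θ = train(zeros(3), reward)
println(θ)
```

Every update spends `2N` reward queries without ever using a derivative of the reward. A gradient-based variant such as AGS would instead use gradient information from the simulator to form the search direction.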