Commit 97e752c

rl documentation
1 parent 7e108cd commit 97e752c

2 files changed: 22 additions, 2 deletions
docs/make.jl

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 push!(LOAD_PATH, "../src/")

-using Documenter#, Dojo
+using Documenter, Dojo

 makedocs(
 modules = [Dojo],

docs/src/reinforcement_learning.md

Lines changed: 21 additions & 1 deletion
@@ -1,5 +1,25 @@
 # Reinforcement Learning

+Static linear policies for locomotion are optimized with the reinforcement-learning algorithm [augmented random search (ARS)](https://arxiv.org/abs/1803.07055). A number of [Gym-like environments](https://gym.openai.com/) are created with Dojo for this application.
+
+## [Half Cheetah](../../examples/reinforcement_learning/halfcheetah_ars.jl)
+
+```@raw html
+<img src="../../examples/animations/halfcheetah_ars.gif" width="600"/>
+```
+
+The cheetah-like robot is rewarded for forward velocity and penalized for control usage.
+
+## [Ant](../../examples/reinforcement_learning/ant_ars.jl)
+
+```@raw html
+<img src="../../examples/animations/ant_ars_no_grid.gif" width="300"/>
+```
+
+The insect-like robot is rewarded for forward velocity and survival and penalized for control usage and contact forces.
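The reward descriptions for the two robots can be read as weighted sums of a few terms. The Julia sketch below only illustrates that structure: the `RewardWeights` struct, the `ant_like_reward` function, and every weight value are hypothetical and are not taken from the Dojo environments.

```julia
using LinearAlgebra

# Hypothetical weights; the values used by the Dojo environments are not shown in this commit.
Base.@kwdef struct RewardWeights
    forward::Float64 = 1.0      # reward per unit of forward velocity
    survival::Float64 = 0.05    # constant bonus for each step the robot survives
    control::Float64 = 0.5      # penalty on squared control effort
    contact::Float64 = 1.0e-3   # penalty on squared contact force
end

# Reward for one simulation step (illustrative form only).
# x_prev, x_next: forward position before and after the step
# u: control input, f: contact forces, dt: time step
function ant_like_reward(x_prev, x_next, u, f, dt; w = RewardWeights())
    forward_velocity = (x_next - x_prev) / dt
    return w.forward * forward_velocity + w.survival -
           w.control * dot(u, u) - w.contact * dot(f, f)
end

# Zeroing the survival and contact weights leaves the two terms described
# for the half cheetah: forward velocity and control usage.
cheetah_weights = RewardWeights(survival = 0.0, contact = 0.0)
r_cheetah = ant_like_reward(0.0, 0.1, [0.2, -0.1], zeros(3), 0.05; w = cheetah_weights)
r_ant = ant_like_reward(0.0, 0.1, [0.2, -0.1], zeros(3), 0.05)
```

Keeping the weights in one struct makes it easy to see which terms an environment uses; the actual environments may structure their rewards differently.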
+
 ## Gradient-Free
+The original ARS is a derivative-free algorithm: it uses only the relative costs of the sampled policies to assemble a new search direction for the policy parameters. While this approach is general, it can be very sample-inefficient.
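As a rough illustration of the derivative-free update described here, the following is a minimal sketch of one basic ARS step for a static linear policy `u = θ * x`, following the update rule in the linked ARS paper. The names `ars_step`, `toy_rollout`, and `run_ars` and all hyperparameter values are assumptions made for this example; `toy_rollout` is a quadratic stand-in for a simulator episode, not a Dojo environment.

```julia
using Random, Statistics

# One basic ARS update on the policy parameters θ.
# `rollout(θ)` stands in for running one episode and returning its total reward.
function ars_step(rollout, θ; α = 0.02, ν = 0.05, N = 8, b = 4, rng = Random.default_rng())
    δ = [randn(rng, size(θ)...) for _ in 1:N]        # random search directions
    r_plus = [rollout(θ .+ ν .* d) for d in δ]        # reward at θ + νδ
    r_minus = [rollout(θ .- ν .* d) for d in δ]       # reward at θ - νδ
    # Keep the b directions whose better perturbation scored highest.
    top = partialsortperm(max.(r_plus, r_minus), 1:b; rev = true)
    # Standard deviation of the rewards that are used, to normalize the step.
    σ = std(vcat(r_plus[top], r_minus[top])) + 1e-8
    direction = sum((r_plus[k] - r_minus[k]) .* δ[k] for k in top)
    return θ .+ (α / (b * σ)) .* direction
end

# Toy problem (hypothetical): the reward is highest when θ equals θ_target.
θ_target = [1.0 -0.5 0.25; 0.0 2.0 -1.0]
toy_rollout(θ) = -sum(abs2, θ .- θ_target)

function run_ars(θ0; iterations = 300, rng = MersenneTwister(1))
    θ = θ0
    for _ in 1:iterations
        θ = ars_step(toy_rollout, θ; rng = rng)
    end
    return θ
end

θ_ars = run_ars(zeros(2, 3))
```

Only reward values enter the update, so each step costs `2N` rollouts. That per-step cost is the source of the sample inefficiency noted in the text.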

-## Gradient-Based
+## Gradient-Based
+We formulate a gradient-based version, augmented gradient search (AGS), that utilizes gradients from Dojo. We find that, by using this information, policy optimization can often be an order of magnitude more sample-efficient in terms of queries to the simulator.
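The commit does not show how AGS is implemented, so the following is not that algorithm. It is a generic sketch, assuming a simulator that returns the reward gradient with respect to the policy parameters, of why gradient information can reduce simulator queries: one rollout yields a full search direction, whereas a finite-difference method such as ARS needs `2N` perturbed rollouts per update. `rollout_with_gradient` and `gradient_search` are hypothetical names, and the quadratic reward is a toy stand-in for Dojo.

```julia
using LinearAlgebra

# Toy stand-in for a differentiable simulator (hypothetical, not Dojo's API):
# returns the total reward and its gradient with respect to θ.
const θ_goal = [1.0 -0.5 0.25; 0.0 2.0 -1.0]

function rollout_with_gradient(θ)
    e = θ .- θ_goal
    return -sum(abs2, e), -2 .* e
end

# Normalized gradient ascent on the reward.
# Each update uses exactly one simulator query.
function gradient_search(θ; α = 0.05, iterations = 100)
    queries = 0
    for _ in 1:iterations
        _, g = rollout_with_gradient(θ)
        queries += 1
        θ = θ .+ α .* g ./ (norm(g) + 1e-8)
    end
    return θ, queries
end

θ_opt, n_queries = gradient_search(zeros(2, 3))
```

The query counter makes the cost model explicit. Any real comparison would depend on the cost and quality of the gradients the simulator provides.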
