Commit 2ec8bb2: document experiments and results (parent a646aec)
1 file changed: README.md (+81, -1)
# ActiveRagdollAssaultCourse

Research into an Assault Course for training Active Ragdolls (using MujocoUnity + ml_agents)

#### Contributors

* Joe Booth ([SohoJoe](https://github.com/Sohojoe))
* Jackson Booth ([JacksonJabba](https://github.com/jacksonJabba))

## AssaultCourse004-Walker

![AssaultCourse004-Walker](images/AssaultCourse004Walker.202-10m.gif)

* **Mujoco Model:** DeepMindWalker
* **Hypothesis:** The adversarial trainer will scale to more complex models
* **Outcome:** Worked well.
* **Raw Notes:**
  * July 27 2018: Compared recurrent vs non-recurrent over 1m training steps, with two training runs each. **Recurrent was a 25% improvement over non-recurrent** (recurrent scored 440 & 424, mean 432; non-recurrent scored 357 & 331, mean 344). Note: recurrent is slower to train.
    * Non-recurrent vs recurrent TODO ADD IMAGE
  * July 26 2018: Walker004.204 / 205 - train with recurrent as a comparison (205 is a sanity check)
    * Walker004.203 / 206 - train with no recurrent on terrainBrain for 1m steps
    * Walker004.202 - trained 8m, 10m
      * TODO - add image
  * July 25 2018: Walker004.202 - trained 5m
    * Walker004.202 - reverted double velocity
    * Walker004.201 - double velocity did not work out well
  * July 24 2018: Walker004 - made the reward function double the velocity reward after falling over

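As a sanity check on the July 27 note, the quoted 25% figure follows from comparing the mean scores of the two runs per condition:

```python
# Sanity-check of the July 27 recurrent vs non-recurrent comparison,
# using the scores from the raw notes above.
recurrent = [440, 424]
non_recurrent = [357, 331]

mean_rec = sum(recurrent) / len(recurrent)          # 432.0
mean_non = sum(non_recurrent) / len(non_recurrent)  # 344.0

# (432 - 344) / 344 = 0.2558..., i.e. roughly the "25%" quoted in the notes
improvement = (mean_rec - mean_non) / mean_non
print(f"{improvement:.1%}")  # 25.6%
```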
## AssaultCourse004-Hopper

![AssaultCourse004-Hopper](images/AssaultCourse004.203.gif)

* **Mujoco Model:** DeepMindHopper
* **Hypothesis:** Dynamically adapting terrain will improve robustness and performance. Compare hand-coded logic vs random vs an adversarial network
* **Outcome:** The adversarial network (using the inverse of the hopper brain's reward) is less complex and significantly more effective.
* **Raw Notes:**
  * July 24 2018: 004.203 - use an adversarial network for the 2nd trainer - **worked really well!!**
    * TODO ADD image comparing to 202
  * July 23 2018: 004.202 - training for 5m steps (no difference in reward for up and down)
    * 004.16 - fixed some small issues with .15 logic
    * 004.15 - 0.25 pain if on knee or bum, 1 pain if head or nose. If pain, no upright bonus
    * 004.14 - fix: 'Remove knee collision' was not working properly. Outcome: gets stuck on knee - try pain
  * July 22 2018: 004.13 - trying the default learning rate - note: this worked (i.e. trained 1m steps without needing 100k first)
    * 004.12 - added 'Spawn terrain further ahead so that it has more chance to prepare'
    * 004.11 - trying 100k then 1m steps worked. Not sure it's much better than .09 (could try training for 5m steps, but will try an extra gap)
    * 004.10 - tried training to 1m steps and the guy just stands there.
      * Added 'Remove knee collision'
      * Added 'Higher reward for going upward' - using 2x
    * 004.09 - is training pretty well, focuses on jumping down (as the actor finds it harder to jump up) - tried 1m steps and 5m steps. Try:
      * Remove knee collision
      * Spawn terrain further ahead so that it has more chance to prepare
      * Higher reward for going upward
  * July 21: 004.01 -

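The 004.203 adversarial setup (a second trainer whose reward is the inverse of the hopper's) can be sketched roughly as below. All names and reward terms here are illustrative, not the repo's actual API:

```python
# Illustrative sketch of an adversarial terrain trainer: the terrain
# agent is rewarded with the negative of the hopper's reward, so it
# learns to generate terrain the hopper finds difficult.
# All names and values are hypothetical, not the repo's actual code.

def hopper_step_reward(velocity: float, upright_bonus: float, pain: float) -> float:
    """Toy hopper reward: forward velocity plus upright bonus minus pain."""
    return velocity + upright_bonus - pain

def terrain_step_reward(hopper_reward: float) -> float:
    """Adversarial terrain reward: the inverse (negative) of the hopper's."""
    return -hopper_reward

# A hopper moving forward while upright earns a positive reward, which
# the terrain trainer sees as a negative signal (and vice versa).
r_hopper = hopper_step_reward(velocity=1.5, upright_bonus=0.25, pain=0.0)
r_terrain = terrain_step_reward(r_hopper)
print(r_hopper, r_terrain)  # 1.75 -1.75
```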
## AssaultCourse003

![AssaultCourse003](images/AssaultCourse003.gif)

* **Mujoco Model:** DeepMindHopper
* **Hypothesis:** Integrating perception of height will improve training
* **Outcome:** The hopper is aware of and adapts to the terrain.
* **References:**
  * Inspiration for the observations is from the DeepMind paper: [Emergence of Locomotion Behaviours in Rich Environments. arXiv:1707.02286 [cs.AI]](https://arxiv.org/abs/1707.02286), see: B Additional experimental details
* **Raw Notes:**
  * July 20: 003.33 - make height perception relative to the foot (hypothesis: this will help it learn faster as heights will be relative) - **worked**
  * July 18: 003.32 - uprightBonus *= .3f; - **failed**
    * 003.31 - uprightBonus *= .3f; - **failed**
    * 003.30 - normal reward for 1m steps
    * 003.29 - no uprightBonus
    * 003.27 - uprightBonus *= Mathf.Clamp(velocity, 0, 1);

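The 003.33 change (terrain heights observed relative to the foot rather than as absolute world heights) can be sketched as follows; the function name and sample values are hypothetical, not the repo's actual code:

```python
# Illustrative sketch of the 003.33 change: terrain heights sampled
# ahead of the agent are reported relative to the foot height, so the
# observation is invariant to the absolute elevation of the terrain.
# Names and sample values are hypothetical, not the repo's actual code.

def height_observations(terrain_heights, foot_height):
    """Return terrain heights relative to the agent's foot."""
    return [h - foot_height for h in terrain_heights]

# The same local terrain shape yields the same observation whether the
# agent is at elevation 0.0 or 5.0:
low = height_observations([0.1, 0.3, 0.2], foot_height=0.0)
high = height_observations([5.1, 5.3, 5.2], foot_height=5.0)
print(low)   # [0.1, 0.3, 0.2]
print(high)  # matches `low` up to floating-point rounding
```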
## AssaultCourse002

* **Mujoco Model:** DeepMindHopper
* **Hypothesis:** The agent should be able to learn to traverse box objects **without** adding perception of height.
* **Outcome:** Did not work - the objects may have been too high.
* **Raw Notes:**
  * July 4: Hop 8 - tried blockier obstacles to see the reaction. The hoppers couldn't get over or jump the obstacles; will try varying heights to see if that can teach them to jump over.

## AssaultCourse001

![AssaultCourse001](images/AssaultCourse001.gif)

* **Mujoco Model:** DeepMindHopper
* **Hypothesis:** The agent should be able to learn to traverse simple slopes **without** adding perception of height.
* **Outcome:** It worked - it seems that it helps to have different slopes across the 16 agents. We needed a couple of iterations to get it right.
* **Raw Notes:**
  * July 4: Hop 6 - tried disabling curiosity to see if it was necessary for training hoppers; they were almost as effective as Hop 5, but not quite, although they trained significantly faster.
    * Hop 7 - with curiosity disabled, tested the hoppers where some have obstacles and some don't; this may make them faster while still retaining the ability to maneuver obstacles. It just took longer to train and didn't affect the actual speed of the hopper.
    * Hop1.6 - tried using the brain from Hop 1 to see how it would react and adapt to the new obstacle environment, which might lead to faster speeds. It wasn't even slightly effective - the hoppers fell when they reached the obstacles.
  * July 3: Hop4 - first training with obstacles. Didn't react well to the environment; scooted over bumpy terrain and hills.
    * Hop5 - added a longer flat stretch before the obstacles, which may lead to better results when facing difficult terrain. Was quite effective at difficult terrain, although not as fast as Hop 1 or 2.