
Commit 82a22c6

Merge branch 'tcmxx/docs'

2 parents ddc9a9f + e0393ff

File tree: 2 files changed, +17 −13 lines

Documents/Training-PPO.md

Lines changed: 6 additions & 5 deletions
@@ -23,14 +23,14 @@ The example [Getting Started with the 3D Balance Ball Environment](Getting-Start
## Explanation of fields in the inspector
We use similar parameters as in Unity ML-Agents. If something is confusing, see their [documentation](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-PPO.md) for more details.

-### TrainerPPO.cs
+#### TrainerPPO.cs
* `isTraining`: Toggle this to switch between training and inference mode. Note that if `isTraining` is false when the game starts, the training part of the PPO model will not be initialized and you won't be able to train it in this run.
* `parameters`: You need to assign this field with a TrainerParamsPPO scriptable object.
* `continueFromCheckpoint`: If true, when the game starts, the trainer will try to load the saved checkpoint file to resume the previous training.
* `checkpointPath`: The path of the checkpoint, including the file name.
* `steps`: Shows the current step of the training.

-### TrainerParamsPPO
+#### TrainerParamsPPO
* `learningRate`: Learning rate used to train the neural network.
* `maxTotalSteps`: Max number of steps the trainer will train for.
* `saveModelInterval`: The trained model will be saved every this many steps.
@@ -45,12 +45,12 @@ We use similar parameters as in Unity ML-Agents. If something is confusing, see
* `numEpochPerTrain`: For each training, the data in the buffer will be used repeatedly this many times.
* `useHeuristicChance`: See [Training with Heuristics](#training-with-heuristics).

-### RLModelPPO.cs
+#### RLModelPPO.cs
* `checkpointToLoad`: If you assign a model's saved checkpoint file to it, this will be loaded when the model is initialized, regardless of the trainer's loading. Might be used when you are not using a trainer.
* `Network`: You need to assign this field with a scriptable object that implements RLNetworkPPO.cs.
* `optimizer`: The type of optimizer to use for this model when training. You can also set its parameters here.

-### RLNetworkSimpleAC
+#### RLNetworkSimpleAC
This is a simple implementation of RLNetworkAC that you can create and plug in as a neural network definition for any RLModelPPO. PPO uses an actor/critic structure (see the PPO algorithm).
- `actorHiddenLayers`/`criticHiddenLayers`: Hidden layers of the network. The array size is the number of hidden layers. Each element holds four parameters that define that layer. They have no default values, so you have to fill them in.
  - size: Size of this hidden layer.
@@ -69,6 +69,7 @@ If you already know some policy that is better than random policy, you might giv

Note that your AgentDependentDecision is only used in training mode. The chance of it being used at each step, for an agent with the script attached, depends on `useHeuristicChance`.
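
For orientation, here is a minimal sketch of what such a scripted heuristic might look like. This is an assumption-laden illustration: the `Decide` method name, its signature, and the observation layout are all placeholders, so check the actual abstract members of `AgentDependentDecision.cs` in the repository before writing one.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical sketch only. The overridden member (`Decide`) and its
// signature are assumptions for illustration; the real interface is
// defined in AgentDependentDecision.cs.
public class PongPaddleDecision : AgentDependentDecision
{
    // Assumed observation layout (illustrative): vectorObs[0] = ball y,
    // vectorObs[1] = paddle y.
    public override float[] Decide(List<float> vectorObs)
    {
        float ballY = vectorObs[0];
        float paddleY = vectorObs[1];

        // Simple hand-written policy: chase the ball vertically.
        float move = Mathf.Abs(ballY - paddleY) < 0.05f
            ? 0f
            : Mathf.Sign(ballY - paddleY);

        return new float[] { move };
    }
}
```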

+## Create your own neural network architecture
+If you want to use your own neural network architecture instead of the one provided by [`RLNetworkSimpleAC`](#rlnetworksimpleac), you can inherit from the `RLNetworkAC` class to build your own. See the [source code](https://github.com/tcmxx/UnityTensorflowKeras/blob/tcmxx/docs/Assets/UnityTensorflow/Learning/PPO/TrainerPPO.cs) of `RLNetworkAC.cs` for documentation.
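
To make the shape of such a subclass concrete, here is a minimal sketch. `RLNetworkAC` is the real base class, but everything else (the `BuildNetwork` method name and signature, the `Tensor` type, and the `FullyConnected` helper) is assumed for illustration; the actual abstract interface is documented in `RLNetworkAC.cs`.

```csharp
using UnityEngine;

// Hypothetical sketch of a custom actor/critic network definition.
// Assumptions to verify against RLNetworkAC.cs: the base class is a
// ScriptableObject that asks subclasses to build actor and critic outputs
// from the observation input; `Tensor` and `FullyConnected` stand in for
// the framework's actual tensor type and layer helper.
[CreateAssetMenu(menuName = "UnityTensorflow/MyRLNetwork")]
public class MyRLNetwork : RLNetworkAC
{
    public int hiddenSize = 128;

    public override void BuildNetwork(Tensor inVectorObs, int outActionSize,
        out Tensor outAction, out Tensor outValue)
    {
        // One shared hidden layer feeding two heads.
        Tensor hidden = FullyConnected(inVectorObs, hiddenSize, "relu");

        // Actor head: one output per action dimension.
        outAction = FullyConnected(hidden, outActionSize, null);

        // Critic head: a single scalar state-value estimate.
        outValue = FullyConnected(hidden, 1, null);
    }
}
```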

Documents/Training-SL.md

Lines changed: 11 additions & 8 deletions
@@ -13,7 +13,7 @@ The example scene `UnityTensorflow/Examples/Pong/PongSL` shows how to use superv
   3. Assign the Trainer to the `Trainer` field of your Brain.
3. Create a Model
   1. Attach a `SupervisedLearningModel.cs` to any GameObject.
-  2. Create a `SupervisedLearningNetwork` scriptable object in your project and assign it to the Network field in `SupervisedLearningModel.cs`.
+  2. Create a `SupervisedLearningNetworkSimple` scriptable object in your project and assign it to the Network field in `SupervisedLearningModel.cs`.
   3. Assign the created Model to the `modelRef` field of `TrainerMimic.cs`.

4. Create a Decision
@@ -26,7 +26,7 @@ The example scene `UnityTensorflow/Examples/Pong/PongSL` shows how to use superv
* The `isCollectingData` field in the trainer needs to be true to collect training data.

## Explanation of fields in the inspector
-### TrainerMimic.cs
+#### TrainerMimic.cs
* `isTraining`: Toggle this to switch between training and inference mode. Note that if `isTraining` is false when the game starts, the training part of the model will not be initialized and you won't be able to train it in this run.
* `parameters`: You need to assign this field with a TrainerParamsMimic scriptable object.
* `continueFromCheckpoint`: If true, when the game starts, the trainer will try to load the saved checkpoint file to resume the previous training.
@@ -35,7 +35,7 @@ The example scene `UnityTensorflow/Examples/Pong/PongSL` shows how to use superv
* `isCollectingData`: Whether the trainer is collecting training data from Agents with a Decision.
* `dataBufferCount`: Current collected data count.

-### TrainerParamsMimic
+#### TrainerParamsMimic
* `learningRate`: Learning rate used to train the neural network.
* `maxTotalSteps`: Max number of steps the trainer will train for.
* `saveModelInterval`: The trained model will be saved every this many steps.
@@ -44,12 +44,12 @@ The example scene `UnityTensorflow/Examples/Pong/PongSL` shows how to use superv
* `requiredDataBeforeTraining`: How much collected data is needed before the trainer starts to train the neural network.
* `maxBufferSize`: Max buffer size of collected data. If the data buffer count exceeds this number, old data will be overwritten. Set this to 0 to remove the limit.

-### SupervisedLearningModel.cs
+#### SupervisedLearningModel.cs
* `checkpointToLoad`: If you assign a model's saved checkpoint file to it, this will be loaded when the model is initialized, regardless of the trainer's loading. Might be used when you are not using a trainer.
* `Network`: You need to assign this field with a scriptable object that implements SupervisedLearningNetwork.cs.
* `optimizer`: The optimizer to use for this model when training. You can also set its parameters here.

-### SupervisedLearningNetworkSimple
+#### SupervisedLearningNetworkSimple
This is a simple implementation of SupervisedLearningNetwork that you can create and plug in as a neural network definition for any SupervisedLearningModel.
- `hiddenLayers`: Hidden layers of the network. The array size is the number of hidden layers. Each element holds four parameters that define that layer. They have no default values, so you have to fill them in.
  - size: Size of this hidden layer.
@@ -66,15 +66,15 @@ You can also use a [conditional GAN](https://arxiv.org/abs/1411.1784) model inst

Note that currently the GAN network we made does not support visual observations.

-### Steps
+#### Steps
Mostly the same steps as regular [supervised learning](#overall-steps), but change step 3 to create a GAN model, and change the `TrainerParamsMimic` in step 2-2 to `TrainerParamsGAN` instead.

- Create a GAN model:
  1. Attach a `GANModel.cs` to any GameObject.
  2. Create a `GANNetworkDense` scriptable object in your project and assign it to the Network field in `GANModel.cs`.
  3. Assign the created Model to the `modelRef` field of `TrainerMimic.cs`.

-### GANModel.cs
+#### GANModel.cs
* `checkpointToLoad`: If you assign a model's saved checkpoint file to it, this will be loaded when the model is initialized, regardless of the trainer's loading. Might be used when you are not using a trainer.
* `Network`: You need to assign this field with a scriptable object that implements the GAN network, e.g. `GANNetworkDense`.
* `generatorL2LossWeight`: L2 loss weight of the generator. Usually 0 is fine.
@@ -85,8 +85,11 @@ Mostly the same steps as regular [supervised learning](#overall-steps)
* `discriminatorOptimizer`: The optimizer to use for this model to train the discriminator.
* `initializeOnAwake`: Whether to initialize the GAN model on awake based on the shapes defined above. For an ML-Agents environment, set this to false.

-### TrainerParamsGAN
+#### TrainerParamsGAN
See [TrainerParamsMimic](#trainerparamsmimic) for other parameters not listed below.
* `discriminatorTrainCount`: How many times the discriminator will be trained each training step.
* `generatorTrainCount`: How many times the generator will be trained each training step.
* `usePrediction`: Whether to use the [prediction method](https://www.semanticscholar.org/paper/Stabilizing-Adversarial-Nets-With-Prediction-Yadav-Shah/ec25504486d8751e00e613ca6fa64b256e3581c8) to stabilize the training.

+## Create your own neural network architecture
+If you want to use your own neural network architecture instead of the one provided by [`SupervisedLearningNetworkSimple`](#supervisedlearningnetworksimple), you can inherit from the `SupervisedLearningNetwork` class to build your own. See the [source code](https://github.com/tcmxx/UnityTensorflowKeras/blob/tcmxx/docs/Assets/UnityTensorflow/Learning/Mimic/SupervisedLearningNetwork.cs) of `SupervisedLearningNetwork.cs` for documentation.
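
As with the PPO case, here is a minimal sketch of such a subclass. `SupervisedLearningNetwork` is the real base class, but the `BuildNetwork` method name and signature, the `Tensor` type, and the `FullyConnected` helper are assumptions for illustration; the actual abstract interface is in `SupervisedLearningNetwork.cs`.

```csharp
using UnityEngine;

// Hypothetical sketch of a custom supervised-learning network definition.
// Assumptions to verify against SupervisedLearningNetwork.cs: subclasses
// build a network that maps observations to a predicted action; `Tensor`
// and `FullyConnected` are placeholders for the framework's actual tensor
// type and layer helper.
[CreateAssetMenu(menuName = "UnityTensorflow/MySLNetwork")]
public class MySLNetwork : SupervisedLearningNetwork
{
    public int hiddenSize = 64;

    public override Tensor BuildNetwork(Tensor inVectorObs, int outActionSize)
    {
        // Two small hidden layers, then one output per action dimension.
        Tensor h1 = FullyConnected(inVectorObs, hiddenSize, "relu");
        Tensor h2 = FullyConnected(h1, hiddenSize, "relu");
        return FullyConnected(h2, outActionSize, null);
    }
}
```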
