Deep Reinforcement Learning using Microsoft CNTK
The C++ API of CNTK is documented inside the CNTKLibrary.h header.
The basic idea of CNTK is that every operation is stored inside a FunctionPtr object. These functions can have multiple inputs (of type FunctionPtr as well) and can produce multiple outputs, too. Furthermore, CNTK introduces the class Variable, which inherits from the Function class. It is used to represent input and output data inside the graph of the CNTK model. The FunctionPtr pointers have a list of variables (Arguments()) containing all Variables which are required to evaluate the function. During the creation of a FunctionPtr, the input FunctionPtrs have to be submitted as arguments of the constructor.
The neural networks consist of different settings: the general structure/architecture, the hyperparameters of specific layers and the optimizer used to train the network. To set these things, the APIs (e.g. in Python) of specialized deep learning frameworks are often needed. To ease the creation of deep RL experiments in RLSimion, the NNCreator project has been integrated into Badger and the RLSimion-Lib.
The integration in Badger offers an easy-to-understand user interface allowing (inexperienced) users to create network architectures easily. Nevertheless, there are almost no restrictions regarding the network architecture, so it is just as well possible to generate more complex models. Furthermore, no knowledge of specific frameworks like cntk, keras or tensorflow is required. To achieve this, a novel data format to store network architectures and hyperparameters has been developed. This allows serializing the model from inside C# into an XML representation which can then be included into the experiment settings created by Badger.
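To give an idea of what such a serialized model could look like, here is a purely illustrative XML fragment. The element and attribute names are invented for this sketch and do not reproduce the actual NNCreator schema:

```xml
<!-- Hypothetical illustration only; not the real NNCreator format -->
<Problem>
  <NetworkArchitecture>
    <Chain>
      <InputLayer Name="state-input" />
      <DenseLayer Units="256" Activation="relu" />
      <DenseLayer Units="3" Activation="linear" />
    </Chain>
  </NetworkArchitecture>
  <OptimizerSetting Type="Adam" LearningRate="0.001" />
</Problem>
```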
Inside the RLSimion-Lib these settings are then read and parsed inside the NNCreator project, which serves as a wrapper between RLSimion and the CNTK API.
The description of a neural network and its design is summarized inside a Problem object:
It is also possible to create non-sequential models like residual networks inside of this Chain design by linking and merging different chains inside a new Chain.
At the moment you can use the Links shown below. As most of the names are self-explanatory, there is no further documentation available for them. All of them derive from the LinkBase class.
The LinkBase class provides a Collection<ParameterBase> called Parameters in which all parameters of each Link are stored. Each parameter has a Name (which should be unique inside each Link's collection) and a Value of the generic type T. At the moment the following parameter types are available and required by the links.
In total you arrive at this UML diagram.
This part of the code loads the settings from the previously described data format, creates a representation of the model/graph and translates it into a CNTK graph. The loading of the settings follows the same general idea according to which all settings are loaded inside RLSimion.
After the settings have been loaded and the graph consisting of custom objects has been created, the CNTK graph is created by traversing the model. The required functions for this are stored inside the CNTKWrapper.h/CNTKWrapper.cpp files. This makes it possible to later add support for different frameworks, too, just by abstracting the code inside these files for new frameworks.
To use CNTK's neural networks in algorithms inside RLSimion, one has to use the parameter type NEURAL_NETWORK_PROBLEM_DESCRIPTION. This acts as the interface between the experiment's settings file and the C++ code inside RLSimion. The actual creation of the network is not performed immediately after loading the settings, but is delayed until the network is used for the first time. You can access the network's CNetwork object by calling getNetwork(), which will create the necessary CNTK tensors if this has not been done beforehand.
At the moment several algorithms for discrete and for continuous problems are integrated. The algorithms for discrete actions also support different policies (epsilon-greedy, softmax, ...).
- Discrete:
  - DQN
  - Asynchronous one-step Q-Learning (with only one thread at the moment)
- Continuous:
  - DDPG