
Deep Reinforcement Learning using Microsoft CNTK


Introduction to CNTK

Internal Structure

The C++ API of CNTK is documented in the CNTKLibrary.h header.

The basic idea of CNTK is that every operation is stored inside a FunctionPtr object. These functions can have multiple inputs (also of type FunctionPtr) and can produce multiple outputs. Furthermore, CNTK introduces the class Variable, which inherits from the Function class. It is used to represent input and output data inside the graph of the CNTK model. Each FunctionPtr has a list of variables (Arguments()) containing all Variables that are required to evaluate the function.

When a FunctionPtr is created, the input FunctionPtrs have to be passed as arguments to its constructor.
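As a minimal sketch of this graph-building pattern, consider the following CNTK v2 C++ snippet; the shapes, names, and device choice are illustrative and not taken from RLSimion:

```cpp
// Minimal sketch (assuming the CNTK v2 C++ API): building a one-layer network.
// Shapes, names, and the device choice are illustrative only.
#include "CNTKLibrary.h"

using namespace CNTK;

FunctionPtr BuildSimpleModel()
{
    auto device = DeviceDescriptor::CPUDevice();

    // Input variable: a 2-dimensional feature vector entering the graph.
    Variable input = InputVariable({ 2 }, DataType::Float, L"state");

    // Learnable parameters: weight matrix and bias vector.
    Parameter W({ 3, 2 }, DataType::Float, GlorotUniformInitializer(), device, L"W");
    Parameter b({ 3 }, DataType::Float, 0.0f, device, L"b");

    // Each operation returns a FunctionPtr; its inputs are passed to the factory call.
    FunctionPtr affine = Plus(Times(W, input), b);
    FunctionPtr model  = Sigmoid(affine, L"output");

    // model->Arguments() now lists the Variables (here: "state") needed for evaluation.
    return model;
}
```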

Architecture design

A neural network is defined by several kinds of settings: the general structure/architecture, the hyperparameters of the individual layers, and the optimizer used to train the network. Setting these usually requires using the APIs (e.g., in Python) of specific deep learning frameworks. To ease the creation of deep RL experiments in RLSimion, the NNCreator project has been integrated into Badger and the RLSimion-Lib.

The integration in Badger offers an easy-to-understand user interface that allows (inexperienced) users to create network architectures easily. Nevertheless, there are almost no restrictions on the network architecture, so it is just as possible to build more complex models. Furthermore, no knowledge of specific frameworks like CNTK, Keras, or TensorFlow is required. To achieve this, a novel data format for storing network architectures and hyperparameters has been developed. It allows serializing the model from C# into an XML representation, which can then be included in the experiment settings created by Badger.

Inside the RLSimion-Lib, these settings are then read and parsed by the NNCreator project, which serves as a wrapper between RLSimion and the CNTK API.

Data format

The description of a neural network and its design is summarized inside a Problem object.

This class defines the optimizer (`OptimizerSetting`), the `Inputs`, the `Output`, and the architecture (`NetworkArchitecture`). The network architecture is organized as follows: a sequence of consecutive operations (also known as layers) is called a `Chain`, and the elements of a `Chain` are called `Links`. A `NetworkArchitecture` can consist of multiple `Chains`, but has to contain at least one.

It is also possible to create non-sequential models such as residual networks within this design by linking and merging different `Chains` inside a new `Chain`.
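To make the structure concrete, here is a purely hypothetical sketch of what such a serialized architecture could look like; all element and attribute names are invented for illustration, and the authoritative format is whatever the NNCreator serializer emits:

```xml
<!-- Hypothetical sketch only: element and attribute names are invented here;
     the real format is defined by the NNCreator serializer. -->
<Problem>
  <OptimizerSetting> ... </OptimizerSetting>
  <Inputs> ... </Inputs>
  <NetworkArchitecture>
    <Chain name="main">
      <Link type="Dense">
        <Parameters>
          <Parameter name="Units" value="64" />
        </Parameters>
      </Link>
      <Link type="Activation">
        <Parameters>
          <Parameter name="Function" value="ReLU" />
        </Parameters>
      </Link>
    </Chain>
  </NetworkArchitecture>
  <Output> ... </Output>
</Problem>
```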

At the moment you can use the `Links` shown below. As most of the names are self-explanatory, there is no further documentation available for them. All of them derive from the `LinkBase` class.

[UML diagram: the available Link classes]

The `LinkBase` class provides a `Collection<ParameterBase>` called `Parameters`, in which all parameters of a `Link` are stored. Each parameter has a `Name` (which should be unique within a Link's collection) and a `Value` of the generic type `T`. At the moment, the following parameter types are available and required by the links.

[UML diagram: the available parameter types]
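A rough illustration of this Name/Value pattern follows (in C++ for consistency with the rest of this page; the types here are hypothetical, since the real classes are C# and live in the NNCreator project):

```cpp
// Rough C++ illustration of the Name/Value parameter pattern described above.
// Hypothetical types: the real classes are C# and may be shaped differently.
#include <memory>
#include <string>
#include <vector>

struct ParameterBase
{
    std::string Name;                 // should be unique within a Link's collection
    virtual ~ParameterBase() = default;
};

template <typename T>
struct Parameter : ParameterBase
{
    T Value;                          // typed payload, e.g. int, double, std::string
};

struct LinkBase
{
    // Every Link stores its hyperparameters in this collection.
    std::vector<std::shared_ptr<ParameterBase>> Parameters;
};
```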


In total, you arrive at this UML diagram.

C++ library

This part of the code loads the settings from the previously described data format, creates a representation of the model/graph, and translates it into a CNTK graph. Loading the settings follows the same general pattern by which all other settings are loaded inside RLSimion.

After the settings have been loaded and the graph consisting of custom objects has been created, the CNTK graph is created by traversing the model. The required functions are stored in the CNTKWrapper.h/CNTKWrapper.cpp files. This makes it possible to add support for other frameworks later, simply by porting the code inside these files to the new frameworks.
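A hedged sketch of this translation step is shown below: walking a parsed chain and emitting a CNTK graph. The `Link` type, its fields, and the function name are hypothetical; only the general idea of traversing the model and mapping each link to CNTK operations is taken from the text.

```cpp
// Sketch of translating a parsed chain into a CNTK graph (hypothetical types;
// the real implementation lives in CNTKWrapper.h/CNTKWrapper.cpp).
#include "CNTKLibrary.h"
#include <vector>

enum class LinkType { Dense, ReLUActivation };

struct Link
{
    LinkType type;
    size_t units = 0;   // hyperparameter of a dense link
};

CNTK::FunctionPtr TranslateChain(const std::vector<Link>& chain,
                                 const CNTK::Variable& input,
                                 const CNTK::DeviceDescriptor& device)
{
    // Start from the input; Alias wraps the Variable in a FunctionPtr.
    CNTK::FunctionPtr model = CNTK::Alias(input);
    for (const Link& link : chain)
    {
        switch (link.type)
        {
        case LinkType::Dense:
        {
            size_t inDim = model->Output().Shape()[0];
            auto W = CNTK::Parameter({ link.units, inDim }, CNTK::DataType::Float,
                                     CNTK::GlorotUniformInitializer(), device);
            auto b = CNTK::Parameter({ link.units }, CNTK::DataType::Float, 0.0f, device);
            model = CNTK::Plus(CNTK::Times(W, model), b);
            break;
        }
        case LinkType::ReLUActivation:
            model = CNTK::ReLU(model);
            break;
        }
    }
    return model;
}
```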

Application within RLSimion

To use CNTK's neural networks in algorithms inside RLSimion, one has to use the parameter type NEURAL_NETWORK_PROBLEM_DESCRIPTION. This acts as the interface between the experiment's settings file and the C++ code inside RLSimion. The actual creation of the network is not performed immediately after loading the settings, but is delayed until the network is used for the first time. You can access the network's CNetwork object by calling getNetwork(), which will create the necessary CNTK tensors if this has not been done beforehand.
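A sketch of this lazy-initialization idea follows; only CNetwork and getNetwork() appear in the text, while the holder class and its helper are hypothetical stand-ins:

```cpp
// Sketch of lazy network creation: the CNTK graph is built on first use,
// not at settings-load time. Class and helper names are hypothetical.
class CNetwork;   // real class from the NNCreator project

class NetworkParameter   // hypothetical stand-in for NEURAL_NETWORK_PROBLEM_DESCRIPTION
{
public:
    CNetwork* getNetwork()
    {
        // Build the network (and its CNTK tensors) only on the first call.
        if (m_pNetwork == nullptr)
            m_pNetwork = buildNetwork();
        return m_pNetwork;
    }

private:
    CNetwork* buildNetwork();   // hypothetical: parses the stored problem description
    CNetwork* m_pNetwork = nullptr;
};
```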

Implemented Deep Reinforcement Algorithms

At the moment, several algorithms for discrete and for continuous problems are integrated. The algorithms for discrete actions also support different policies (epsilon-greedy, softmax, ...); a sketch of epsilon-greedy action selection follows the list below.

  • Discrete:

    • DQN
    • Asynchronous one-step Q-Learning (with only one thread at the moment)
  • Continuous:

    • DDPG
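As a generic illustration of the epsilon-greedy policy mentioned above (not RLSimion's actual implementation):

```cpp
// Minimal sketch of epsilon-greedy action selection over discrete Q-values.
#include <algorithm>
#include <random>
#include <vector>

size_t EpsilonGreedy(const std::vector<double>& qValues, double epsilon, std::mt19937& rng)
{
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    if (uniform(rng) < epsilon)
    {
        // Explore: pick a random action with probability epsilon.
        std::uniform_int_distribution<size_t> actions(0, qValues.size() - 1);
        return actions(rng);
    }
    // Exploit: pick the action with the highest Q-value.
    return static_cast<size_t>(std::distance(qValues.begin(),
                               std::max_element(qValues.begin(), qValues.end())));
}
```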