Deep Reinforcement Learning using Microsoft CNTK

Roland Zimmermann edited this page Sep 21, 2017 · 8 revisions

Introduction to CNTK

Internal Structure

Architecture design

A neural network is defined by several settings: its general structure (architecture), the hyperparameters of its individual layers, and the optimizer used to train it. Specifying these settings usually requires the API (e.g. in Python) of a dedicated deep learning framework. To ease the creation of deep RL experiments in RLSimion, the NNCreator project has been integrated into Badger and the RLSimion-Lib.

The integration in Badger offers an easy-to-understand user interface that allows (inexperienced) users to create network architectures easily. Nevertheless, there are almost no restrictions on the network architecture, so it is just as possible to build more complex models. Furthermore, no knowledge of specific frameworks like CNTK, Keras or TensorFlow is required. To achieve this, a novel data format for storing network architectures and hyperparameters has been developed. This allows the model to be serialized from C# into an XML representation, which can then be included in the experiment settings created by Badger.

Inside the RLSimion-Lib, these settings are then read and parsed by the NNCreator project, which serves as a wrapper between RLSimion and the CNTK API.
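The round trip described above (settings written as XML on one side, parsed on the other) can be sketched roughly as follows. This is only an illustration in Python, not the NNCreator implementation: the element names (`Problem`, `Chain`, `Link`, `Parameter`), the attribute names, and the example values `Adam`, `Dense` and `Activation` are assumptions, and the real XML schema may look different.

```python
import xml.etree.ElementTree as ET

# NOTE: all element/attribute names below are hypothetical; the real
# NNCreator XML schema is not shown in this page and may differ.


def serialize_problem(optimizer, links):
    """Build an XML string from an optimizer name and a list of (type, params) links."""
    root = ET.Element("Problem")
    ET.SubElement(root, "OptimizerSetting", {"type": optimizer})
    architecture = ET.SubElement(root, "NetworkArchitecture")
    chain = ET.SubElement(architecture, "Chain")
    for link_type, params in links:
        link = ET.SubElement(chain, "Link", {"type": link_type})
        for name, value in params.items():
            ET.SubElement(link, "Parameter", {"name": name, "value": str(value)})
    return ET.tostring(root, encoding="unicode")


def parse_problem(xml_text):
    """Read the XML back into (optimizer, links); parameter values come back as strings."""
    root = ET.fromstring(xml_text)
    optimizer = root.find("OptimizerSetting").get("type")
    links = []
    for link in root.iter("Link"):
        params = {p.get("name"): p.get("value") for p in link.findall("Parameter")}
        links.append((link.get("type"), params))
    return optimizer, links


# Writer side (the role Badger plays) ...
xml_text = serialize_problem(
    "Adam", [("Dense", {"units": 64}), ("Activation", {"function": "relu"})]
)
# ... and reader side (the role NNCreator plays).
optimizer, links = parse_problem(xml_text)
print(optimizer, links)
```

Because everything is stored as text, the reading side has to convert parameter values back to their numeric types itself.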

Data format

The description of a neural network and its design is summarized inside a `Problem` object.

This class defines the optimizer (`OptimizerSetting`), the `Inputs`, the `Output` and the architecture (`NetworkArchitecture`). The network architecture is organized in the following way: a sequence of consecutive operations is called a `Chain`. Each element of a `Chain` (i.e. a single operation, also known as a layer) is called a `Link`. A `NetworkArchitecture` can consist of multiple `Chain`s but must contain at least one.
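The nesting described above can be pictured with a small Python sketch. It is not the actual C# class hierarchy: the field types (plain strings for the optimizer, inputs and output) and the example names such as `dense_1` are assumptions chosen only to show how `Problem`, `NetworkArchitecture`, `Chain` and `Link` relate to each other.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    """One operation (layer) inside a chain."""
    name: str


@dataclass
class Chain:
    """An ordered sequence of links."""
    name: str
    links: List[Link] = field(default_factory=list)


@dataclass
class NetworkArchitecture:
    chains: List[Chain]

    def __post_init__(self):
        # The text requires at least one chain per architecture.
        if not self.chains:
            raise ValueError("a NetworkArchitecture needs at least one Chain")


@dataclass
class Problem:
    optimizer_setting: str  # simplified stand-in for OptimizerSetting
    inputs: List[str]       # simplified stand-in for Inputs
    output: str             # simplified stand-in for Output
    network_architecture: NetworkArchitecture


problem = Problem(
    optimizer_setting="sgd",
    inputs=["state"],
    output="q_values",
    network_architecture=NetworkArchitecture(
        chains=[Chain("main", [Link("dense_1"), Link("relu_1"), Link("dense_2")])]
    ),
)
print(len(problem.network_architecture.chains[0].links))
```

Checking the "at least one chain" rule at construction time is a design choice of this sketch; the original classes may enforce it elsewhere or not at all.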

It is also possible to create non-sequential models, such as residual networks, within this `Chain` design by linking and merging different chains inside a new `Chain`.
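As a rough illustration of this idea, the following Python sketch treats a chain as a list of functions on a single number and builds a residual-style chain from two others. The helper names `run_chain` and `merge_chains` are hypothetical, and merging by addition is an assumption borrowed from residual networks; the merge operations actually offered by NNCreator are not specified here.

```python
from typing import Callable, Dict, List

# A chain is modelled as a list of functions applied in order
# (a simplified, hypothetical stand-in for a list of Links).
Chain = List[Callable[[float], float]]


def run_chain(chain: Chain, x: float) -> float:
    """Apply every link of the chain to x, one after another."""
    for link in chain:
        x = link(x)
    return x


def merge_chains(chains: Dict[str, Chain], names: List[str]) -> Callable[[float], float]:
    """Return a link that feeds its input to every named chain and adds the results."""
    def merged(x: float) -> float:
        return sum(run_chain(chains[name], x) for name in names)
    return merged


chains: Dict[str, Chain] = {
    "transform": [lambda x: 2.0 * x, lambda x: max(x, 0.0)],  # scale, then ReLU
    "shortcut": [],                                            # identity path
}
# The new chain merges both branches, then applies one more operation.
chains["residual"] = [merge_chains(chains, ["transform", "shortcut"]), lambda x: x + 1.0]

result = run_chain(chains["residual"], 3.0)
print(result)
```

The empty `shortcut` chain plays the role of the skip connection: its output is simply the unchanged input, which is added to the output of the `transform` branch.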

At the moment you can use the `Link`s shown below. As most of the names are self-explanatory, there is no further documentation available for them. All of them derive from the `LinkBase` class. This class provides a `Collection<ParameterBase>` called `Parameters`, in which all parameters of each `Link` are stored.
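The relationship between `LinkBase` and its `Parameters` collection could look roughly like the Python sketch below. Only `LinkBase`, `ParameterBase` and `Parameters` come from the text; the concrete classes `DenseLink` and `DropoutLink`, their parameters, and the `get` helper are hypothetical examples and do not claim to match the actual list of links.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class ParameterBase:
    """A single named setting of a link."""
    name: str
    value: Any


@dataclass
class LinkBase:
    # Mirrors the Parameters collection described in the text.
    parameters: List[ParameterBase] = field(default_factory=list)

    def get(self, name: str) -> Any:
        """Look up a parameter value by name (hypothetical convenience helper)."""
        for parameter in self.parameters:
            if parameter.name == name:
                return parameter.value
        raise KeyError(name)


class DenseLink(LinkBase):  # hypothetical concrete link
    def __init__(self, units: int):
        super().__init__([ParameterBase("units", units)])


class DropoutLink(LinkBase):  # hypothetical concrete link
    def __init__(self, rate: float):
        super().__init__([ParameterBase("rate", rate)])


links = [DenseLink(32), DropoutLink(0.5)]
# Every link exposes its settings the same way, regardless of its concrete type.
summary = [
    (type(link).__name__, [(p.name, p.value) for p in link.parameters])
    for link in links
]
print(summary)
```

Keeping all settings in one uniform collection means generic code (for example a user interface or a serializer) can list and edit the parameters of any link without knowing its concrete type.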

[Figure: UML diagram of the available Links]

These parameters

In total you arrive at this UML diagram.

C++ library

Application within RLSimion
