
Deep Reinforcement Learning using Microsoft CNTK

Roland Zimmermann edited this page Sep 21, 2017 · 8 revisions

Introduction to CNTK

Internal Structure

Architecture design

A neural network is defined by several settings: its general structure (architecture), the hyperparameters of its individual layers, and the optimizer used to train it. Specifying these usually requires the API (e.g. in Python) of a dedicated deep learning framework. To ease the creation of deep RL experiments in RLSimion, the NNCreator project has been integrated into Badger and the RLSimion-Lib.

The integration in Badger offers an easy-to-understand user interface that lets even inexperienced users create network architectures. At the same time, there are almost no restrictions on the network architecture, so more complex models can be built just as well. Furthermore, no knowledge of specific frameworks such as CNTK, Keras or TensorFlow is required. To achieve this, a novel data format for storing network architectures and hyperparameters has been developed. It allows the model to be serialized from C# into an XML representation, which can then be included in the experiment settings created by Badger.

Inside the RLSimion-Lib, these settings are then read and parsed by the NNCreator project, which serves as a wrapper between RLSimion and the CNTK API.
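As a rough illustration of this round trip (serializing on the Badger side, parsing on the RLSimion-Lib side), the following Python sketch writes a tiny network description to XML and reads it back. The element and attribute names (`Problem`, `NetworkArchitecture`, `Chain`, `Link`, `name`, `type`, `units`, `function`) are assumptions made for this example and are not the actual NNCreator schema. The real implementation lives in C# and C++.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: all element and attribute names below are
# illustrative only and do not reproduce the real NNCreator XML format.


def serialize(chains):
    """Turn {chain_name: [(link_type, params), ...]} into an XML string."""
    root = ET.Element("Problem")
    arch = ET.SubElement(root, "NetworkArchitecture")
    for chain_name, links in chains.items():
        chain_el = ET.SubElement(arch, "Chain", {"name": chain_name})
        for link_type, params in links:
            # ElementTree requires attribute values to be strings.
            attrs = {key: str(value) for key, value in params.items()}
            ET.SubElement(chain_el, "Link", {"type": link_type, **attrs})
    return ET.tostring(root, encoding="unicode")


def parse(xml_text):
    """Read the XML back into {chain_name: [(link_type, params), ...]}."""
    root = ET.fromstring(xml_text)
    chains = {}
    for chain_el in root.iter("Chain"):
        links = []
        for link_el in chain_el.findall("Link"):
            attrs = dict(link_el.attrib)
            link_type = attrs.pop("type")
            links.append((link_type, attrs))
        chains[chain_el.get("name")] = links
    return chains


network = {
    "main": [
        ("Dense", {"units": 64}),
        ("Activation", {"function": "relu"}),
    ]
}
xml_text = serialize(network)
restored = parse(xml_text)
```

Note that the hyperparameters come back as strings (`"64"` rather than `64`), so a real parser would have to convert them to the types each layer expects.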

Data format

The description of a neural network and its design is summarized in a Problem object:

UML

This class defines the optimizer (OptimizerSetting), the Inputs, the Output and the architecture (NetworkArchitecture). The network architecture is organized as follows: a sequence of consecutive operations is called a Chain, and each element of a Chain (a single operation, also known as a layer) is called a Link. A NetworkArchitecture can consist of multiple Chains, but has to contain at least one.
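The containment hierarchy described here can be sketched with a few Python dataclasses. This is only a minimal model of the relationships named in the text. The field names and types (`kind`, `params`, plain strings for inputs and output) are assumptions, and the real classes are written in C#.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Link:
    """A single operation (layer) inside a Chain."""
    kind: str  # hypothetical field, e.g. "Dense"
    params: Dict[str, object] = field(default_factory=dict)


@dataclass
class Chain:
    """A sequence of consecutive Links."""
    name: str
    links: List[Link] = field(default_factory=list)


@dataclass
class NetworkArchitecture:
    chains: List[Chain]

    def __post_init__(self):
        # The text requires at least one Chain per architecture.
        if not self.chains:
            raise ValueError("a NetworkArchitecture needs at least one Chain")


@dataclass
class Problem:
    optimizer_setting: Dict[str, object]  # stands in for OptimizerSetting
    inputs: List[str]
    output: str
    network_architecture: NetworkArchitecture


problem = Problem(
    optimizer_setting={"type": "SGD", "learning_rate": 0.01},
    inputs=["state"],
    output="q_values",
    network_architecture=NetworkArchitecture(
        chains=[Chain("main", [Link("Dense", {"units": 64})])]
    ),
)
```

Constructing `NetworkArchitecture(chains=[])` raises a `ValueError`, which mirrors the "at least one Chain" rule.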

Non-sequential models such as residual networks can also be expressed in this Chain design by linking and merging different Chains inside a new Chain.
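To make the idea of merging Chains concrete, here is a toy Python sketch in which a Chain can contain a merge Link that feeds its input into several other Chains and sums their results, giving the residual pattern y = x + f(x). The link kinds (`scale`, `shift`, `merge_add`) and the scalar arithmetic are stand-ins invented for this example. They do not describe how NNCreator actually represents or evaluates merges.

```python
def run_chain(chains, name, x):
    """Evaluate the Chain called `name` on the scalar input x."""
    value = x
    for kind, arg in chains[name]:
        if kind == "scale":
            value = value * arg
        elif kind == "shift":
            value = value + arg
        elif kind == "merge_add":
            # Feed the current value into every referenced Chain
            # and sum the results (hypothetical merge operation).
            value = sum(run_chain(chains, ref, value) for ref in arg)
        else:
            raise ValueError(f"unknown link kind: {kind}")
    return value


chains = {
    "skip": [],  # identity branch: passes the input through unchanged
    "branch": [("scale", 2.0), ("shift", 1.0)],  # f(x) = 2x + 1
    "residual_block": [("merge_add", ["skip", "branch"])],  # x + f(x)
}

result = run_chain(chains, "residual_block", 3.0)
```

For the input 3.0, the skip branch yields 3.0 and the other branch yields 7.0, so the merged Chain returns 10.0.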

In total you arrive at this UML diagram:

UML

C++ library

Application within RLSimion
