
User Tutorial 1. PID controller on the rain car environment


Prerequisites

  • Download the latest release and install the Herd Agent on your own machine (or on any other machine on the same subnet).

Objectives

In this tutorial, we will configure and run a simple experiment. We will start by using a Proportional-Integral-Derivative (PID) controller (no Reinforcement Learning yet, please be patient! ;)) on a very simple problem we designed for testing purposes: the rain-car problem. Along the way, we will cover the most basic aspects of Badger:

  • Designing an experiment
  • Running an experiment locally
  • Giving a parameter multiple values
  • Remote execution
  • Analyzing the results

In this control problem, we have to move a car along the x axis and make it reach the tree as fast as possible to avoid getting wet by the rain. Below is a screenshot of the visualization of the problem:

Rain-car problem

Tutorial

A PID controller is one of the most commonly used controllers: it outputs an action (command) computed from an error variable. In this environment, the error variable is position-deviation, the relative position of the car with respect to the goal position under the tree. In other words, this variable represents the error.
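
As an illustration only (a minimal sketch, not Badger's internal implementation), a discrete-time PID update could look like the following, with error playing the role of position-deviation:

```python
class PID:
    """Minimal discrete-time PID sketch: u = KP*e + KI*integral(e) + KD*de/dt."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        # Accumulate the error for the integral term
        self.integral += error * dt
        # Finite-difference estimate of the error derivative
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        # The weighted sum of the three terms is the output action (command)
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```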

Design the experiment

The first tab we see when we start Badger (/bin/Badger.exe) is the Editor tab, which will show up empty.

Badger - Editor - empty

We will create a new experiment by clicking the New button (Badger new experiment button). This will create a new experiment configuration. All the parameters are shown in orange because they haven't been explicitly assigned a value yet and thus each holds its default value.

Badger - Editor - new

First, we will give the experiment a meaningful name in the header tab:

Experiment tab header

Configuring the parameters

Parameters are organized by object class. We will set the parameters of the main classes one by one.

Log

Here we configure which kinds of episodes are logged and how often. In this tutorial, we will only run one episode, so it will be enough to set:

  • Num-Functions-Logged = 0. PID controllers do not learn any function, so there is no need to save any function.
  • Log-Freq = 0.25. We will leave the default value, which means that the system variables will be logged every 0.25 s (see the note after this list).
  • Log-Eval-Episodes = true. The only episode in this experiment will be an evaluation of a controller, so we want evaluation episodes to be logged.
  • Log-Training-Episodes = false. No need to log training episodes because none will be run.
  • Log-Functions = false. There are no functions to be saved.
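
As a quick sanity check of these settings: with the 60-second episode configured below and one sample every 0.25 s, the log should contain roughly 60 / 0.25 = 240 samples of each logged variable for the single evaluation episode.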

World

These parameters determine the environment where the experiment will take place and the dynamic model's parameters:

  • Num-Integration-Steps = 4 and Delta-T = 0.01. The first parameter sets the number of integration steps between control steps; we will leave the default value. The second parameter determines the time (in seconds) between control steps. Together, they mean that the agent will update its action every 0.01 s and the dynamic model will perform 4 integration sub-steps between every two control steps (see the sketch after this list).
  • Dynamic-Model = Rain-car. Here we select the environment from a drop-down combo-box.
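
To make the interplay between these two parameters concrete, here is a rough sketch (hypothetical names, not Badger's actual code) of how a simulator typically interleaves control steps and integration sub-steps:

```python
# Hypothetical sketch of a simulation loop: Delta-T seconds between control
# steps, Num-Integration-Steps finer integration sub-steps per control step.
DELTA_T = 0.01
NUM_INTEGRATION_STEPS = 4

def run_episode(controller, model, episode_length=60.0):
    t = 0.0
    while t < episode_length:
        # One control step: the agent updates its action every DELTA_T seconds
        action = controller.update(model.position_deviation(), DELTA_T)
        # The dynamic model is integrated with a finer time step for accuracy
        sub_dt = DELTA_T / NUM_INTEGRATION_STEPS
        for _ in range(NUM_INTEGRATION_STEPS):
            model.integrate(action, sub_dt)
        t += DELTA_T
```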

Experiment

The Experiment parameters control general parameters of the experiment:

  • Random-Seed = 1. Random numbers will be generated using seed 1. This parameter can be used to generate different runs of the same experiment with different sequences of random numbers.
  • Number-Episodes = 0. This means no training episode will be done, only an evaluation episode.
  • Eval-Freq = 1. This parameter determines how many training episodes are performed between evaluations. Since there is no training to be done, we can set this parameter to either 0 or 1, and the experiment will consist of only one evaluation episode.
  • Progress-Update-Freq = 1.0. When an experiment is executed remotely, this parameter sets how many seconds will elapse between progress updates. We will leave the default value, so that the progress is updated once per second.
  • Episode-Length = 60.0. The length in seconds of every episode. We will set it to one minute.

SimGod

SimGod = the simulation god!

This object controls the whole learning process and thus holds the global learning parameters. We will do no learning in this tutorial, so we will disable or ignore most of them:

  • Freeze-Target-Function = false and Target-Function-Update-Freq = 100. These two parameters are used with algorithms that keep two copies of the function being learned for improved stability: the online function and the target function. For example, Double Q-Learning uses this technique. Since we aren't using any such algorithm, the values given to these parameters don't really matter.
  • Gamma = 0.9. This parameter defines how much future rewards are valued when learning the value of the current action. It is typically set around 0.9 in the literature.
  • Use-Importance-Weights = false. This is an experimental parameter used only with off-line algorithms. We will leave the default value since it is going to be ignored by any regular controller anyway.
  • State-Feature-Map = false and Action-Feature-Map = false. These two parameters determine how to map continuous states and actions to the weights of a function. No function is going to be learned, so we will just disable them both.
  • Experience-Replay = false. This parameter configures Experience Replay, which stores experience tuples <s,a,s_p,r> and reuses them to train the agent on past experience between control steps. This breaks the temporal correlation between consecutive samples and is expected to give faster and more stable learning. For now, we disable it (a minimal sketch of the idea follows this list).
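
For the curious, here is a minimal, hypothetical Python sketch of an experience-replay buffer (illustrative only, not Badger's implementation). Gamma would show up in the learning targets built from the sampled tuples, e.g. a Q-learning target of the form r + Gamma * max_a' Q(s_p, a').

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal sketch of an experience-replay buffer (illustrative only)."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest tuples are discarded when full

    def add(self, s, a, s_p, r):
        # Store one experience tuple <s, a, s_p, r>
        self.buffer.append((s, a, s_p, r))

    def sample(self, batch_size=32):
        # Random sampling breaks the temporal correlation between consecutive steps
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```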

Simions

Simion = simulation minion (an agent). Yeah, we like silly metaphors and names!

  • Simion = Controller. We select a controller in the last drop-down list and new parameters appear.
  • Controller = PID.
  • Input-Variable = position-deviation. PID controllers output an action that is a function of an error state variable. In this problem, the error state variable is position-deviation.
  • Output-Action = acceleration. We link the output of the controller to the only action in this problem, the acceleration of the car.

The next three parameters are the Proportional gain (KP), Integral gain (KI) and Derivative gain (KD):

  • KP: Schedule = Constant, Value = 1.0
  • KI: Schedule = Constant, Value = 0.0
  • KD: Schedule = Constant, Value = 0.0

Schedules can be used to make a parameter follow a time function (decays are very common in RL). We will use constant values and a simple proportional controller, so we can leave KI and KD at zero.
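
With these values, the controller reduces to a purely proportional law: acceleration = KP * position-deviation = 1.0 * position-deviation, i.e. the commanded acceleration is simply proportional to the position error (up to the sign conventions used inside the environment).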

Running the experiment locally

Simply press the Play button on the header tab:

Experiment tab header

We will see a real-time visualization of the simulation.

Mountain-car real-time visualization

On the right, all the state variables and stats are shown as meters; functions would be shown below if there were any to be learned. The visualization is real-time in the sense that what we see is the current state of the system. Beware: the simulation may run faster than real time. The environment is drawn after simulating a control step, so if simulating 0.01 s and drawing a frame takes less than 0.01 s, we will see the simulation go faster than real time. This is by design. To watch the simulation at real-time speed, we must wait until the experiment is finished and run the off-line visualization.

In this first experiment, we will see the car move toward the tree, overshoot it, and oscillate around it. Although this is beyond the scope of this first tutorial, the reason is that we are setting the acceleration as a function of the position error alone, with no damping (derivative) term.

The natural next step is to try giving KP different values and choose the most appropriate one.

Giving parameters multiple values

The way to do this is by forking a parameter (creating several instances of the same parameter). We do this by right-clicking on the name of the Value parameter:

Fork a parameter

The parameter will now be forked:

Forked parameter

On the header, we can give this fork an alias (e.g., if we fork two parameters named Value, we can give one the alias KP-Value and the other KD-Value). We can also add and remove values (add fork value and remove fork value). We will add two values by clicking twice on the Add button. Now we can traverse these values with the left and right arrows and change the three values:

Traverse and edit fork values

We will set three different values: 1.0, 0.5 and 0.1. Note that the number beside the name of the experiment on the header tab has been updated to (3). This means that this experiment will result in three experimental units, that is, three different combinations of parameters.
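
A side note: since each experimental unit is one combination of forked values, forking a second parameter multiplies the count; for example, 3 values of KP combined with 2 values of another forked parameter would produce 3 × 2 = 6 experimental units.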

Remote execution

To execute all three experimental units concurrently (if possible, depending on the available hardware resources), we must press the Launch button on the top:

Launch button

A dialog window will ask where we want to store the experimental units and the results. This is stored in a batch file (.simion.batch), which references all the experimental units, each created in a separate folder inside the folder where the batch file is saved.

Badger will switch the view to the Monitor tab:

Badger - monitor i

On the left we see the list of available agents (local agents are deselected by default). In this case, we have two Herd Agents on the same subnet (one running on a 4-core Linux-64 machine, the other on a 2-core Win-32 machine).

We next press the Play button and see the evolution of the experiment:

Badger - monitor ii

This is a very short experiment with only one evaluation episode. This means that:

  • The plot on the upper-right will not show anything because there is only one evaluation (it needs at least two evaluations).
  • The experiment finishes nearly instantly.

We can see the complete log files of each experimental unit if we press on one of the Flask icons (Flask icon).

Analysis of the results

We next press the Reports button (Reports button) and Badger will switch to the Reports tab:

Badger reports

On the top left we see a list of the experiments (only one: Tutorial-1-PID-mountain-car) with the hierarchy of forks (only one: Value since we didn't give it an alias) and the values given to each forked parameter (three: 1.0, 0.5 and 0.1).

Below the experiment list, we can select a subset of the variables. We will select position and reward. In both cases, we will leave Last evaluation selected, since we are only interested in the last (and only) evaluation episode.

Now we will press the Generate reports button (Generate reports button) and Badger will generate two reports, one for the variable position:

Report - position

We see that the oscillations are bigger the lower the KP gain is. Below, some statistics are calculated for each experimental unit. Similarly, the other report shows the rewards obtained by the controller during the simulation:

Report - reward

Most interestingly, the look of the plots can be tuned by pressing Settings (Plot settings button), and each plot can be saved as a publication-ready .eps vector image by pressing Save (Plot save button).

This is the end of this tutorial. Next, we will start using RL algorithms and covering more advanced aspects of the framework.

Now, you can go to the next tutorial.
