User Tutorial 1. PID controller on the rain car environment
- Prerequisite: download the latest release and install the Herd Agent on your own machine (or on any other machine in the same subnet).
In this tutorial, we will configure and run a simple experiment. We will start with a simple Proportional-Integral-Derivative (PID) controller (no Reinforcement Learning yet, please be patient!) on a very simple problem we designed for testing purposes: the rain-car problem. Along the way, we will cover the most basic aspects of Badger:
- Designing an experiment
- Running an experiment locally
- Giving a parameter multiple values
- Running experiments remotely
- Analyzing the results
In this control problem, we have to move a car along the x axis and make it reach the tree as fast as possible to avoid getting wet by the rain. Below is a screenshot of the visualization that represents the problem:
Basically, a PID controller is one of the most commonly used controllers: it outputs an action (command) that depends on some error variable. In this environment, the error variable is `position-deviation`, which is the relative position of the car with respect to the goal position under the tree. In other words, this variable represents the error.
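For reference, the textbook PID control law computes the command u(t) from the error e(t) (here, `position-deviation`). Whether Badger uses exactly this continuous-time form is not shown in this tutorial, but the idea is:

```math
u(t) = K_P\, e(t) + K_I \int_0^t e(\tau)\, d\tau + K_D\, \frac{de(t)}{dt}
```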
The first tab we see when we start Badger (`/bin/Badger.exe`) is the Editor tab, which will show up empty.
We will create a new experiment by clicking the New button. This will create a new experiment configuration. All the parameters are shown in orange because they haven't been assigned a value yet, and thus each has its default value.
First, we will give the experiment a meaningful name in the header tab:
Parameters are organized by object class. We will set the parameters of the main classes one by one.
Here we configure which episodes are logged and how often. In this tutorial, we will only run one episode, so it will be enough to set:
- `Num-Functions-Logged = 0`. PID controllers do not learn any function, so there is no need to save any.
- `Log-Freq = 0.25`. We will leave the default value, which means that the system variables will be saved every 0.25 s.
- `Log-Eval-Episodes = true`. The only episode in this experiment will be an evaluation of a controller, so we want evaluation episodes to be logged.
- `Log-Training-Episodes = false`. No need to log training episodes because none will be run.
- `Log-Functions = false`. No functions will be saved.
These parameters determine the environment where the experiment will take place and the dynamic model's parameters:
- `Num-Integration-Steps = 4` and `Delta-T = 0.01`. The first parameter sets the number of integration steps between control steps; we will leave the default value. The second determines the time (in seconds) between control steps. Together, they mean that the agent will update its action every `0.01 s` and the dynamic model will take `4` integration steps between every two control steps (see the sketch after this list).
- `Dynamic-Model = Rain-car`. Here we select the environment from a drop-down combo box.
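To make the relationship between these two parameters concrete, here is a minimal sketch of such a control loop. This is an illustration only, not Badger's actual code: `controller` and `integrate_dynamics` are hypothetical placeholders, and we assume the integration sub-step is simply `Delta-T / Num-Integration-Steps`.

```python
DELTA_T = 0.01              # Delta-T: time between control steps (s)
NUM_INTEGRATION_STEPS = 4   # Num-Integration-Steps: sub-steps per control step

def run_control_step(state, controller, integrate_dynamics):
    """Pick one action and hold it while the dynamics are integrated in sub-steps."""
    action = controller(state)                # action updated once every 0.01 s
    sub_dt = DELTA_T / NUM_INTEGRATION_STEPS  # assumed sub-step length: 0.0025 s
    for _ in range(NUM_INTEGRATION_STEPS):
        state = integrate_dynamics(state, action, sub_dt)
    return state
```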
The Experiment parameters control general aspects of the experiment:
- `Random-Seed = 1`. Random numbers will be generated using seed `1`. This parameter can be used to generate different runs of the same experiment with different sequences of random numbers.
- `Number-Episodes = 0`. This means no training episodes will be run, only an evaluation episode.
- `Eval-Freq = 1`. This parameter determines how many training episodes are performed between evaluations. Since there is no training to be done, we can set it to either `0` or `1`, and the experiment will consist of only one evaluation episode.
- `Progress-Update-Freq = 1.0`. When an experiment is executed remotely, this parameter sets how many seconds elapse between progress updates. We will leave the default value, so that progress is updated once per second.
- `Episode-Length = 60.0`. The length in seconds of every episode. We will set it to one minute.
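Putting these numbers together gives a feel for the size of the run; a quick back-of-the-envelope check (plain arithmetic derived from the values above, not Badger output):

```python
episode_length = 60.0   # Episode-Length (s)
delta_t = 0.01          # Delta-T (s between control steps)
log_freq = 0.25         # Log-Freq (s between logged samples)

control_steps = round(episode_length / delta_t)    # 6000 control steps in the evaluation episode
logged_samples = round(episode_length / log_freq)  # about 240 logged samples
print(control_steps, logged_samples)
```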
SimGod = the simulation god!
This object controls the whole learning process and thus holds global learning parameters. We will do no learning in this tutorial, so we will disable or ignore most of the parameters:
- `Freeze-Target-Function = false` and `Target-Function-Update-Freq = 100`. These two parameters are used with algorithms that keep two copies of the function being learned for improved stability: the online function and the target function. Double Q-Learning, for example, uses this technique. Since we aren't using any such algorithm, the values given to these parameters do not really matter.
- `Gamma = 0.9`. This parameter defines how much future rewards are valued when learning the value of the current action. It is typically around `0.9` in the literature.
- `Use-Importance-Weights = false`. This is an experimental parameter used only with off-line algorithms. We will leave the default value, since it is going to be ignored by any regular controller anyway.
- `State-Feature-Map = false` and `Action-Feature-Map = false`. These two parameters determine how continuous states and actions are mapped to the weights of a function. No function is going to be learned, so we will just disable them both.
- `Experience-Replay = false`. This parameter configures Experience Replay, which stores experience tuples `<s, a, s_p, r>` and uses them to train the agent on past experience between control steps. This breaks the temporal correlation and is expected to give faster and more stable learning. For now, we disable it.
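For reference, `Gamma` is the discount factor of the return that value-based methods learn. With the standard textbook definition, the value of an action accumulates future rewards weighted by decreasing powers of the discount factor (none of this is used in this tutorial, since no learning takes place):

```math
G_t = \sum_{k=0}^{\infty} \gamma^k \, r_{t+k+1}, \qquad \gamma = 0.9
```

With γ = 0.9, a reward received 10 steps ahead is weighted by 0.9^10 ≈ 0.35, so the agent still values it, but less than immediate rewards.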
Simion = simulation minion (an agent). Yeah, we like silly metaphors and names!
- `Simion = Controller`. We select a controller in the last drop-down list and new parameters appear.
- `Controller = PID`.
- `Input-Variable = position-deviation`. PID controllers output an action that is a function of an error state variable. In this problem, the error state variable is `position-deviation`.
- `Output-Action = acceleration`. We link the output of the controller to the only action in this problem, the acceleration of the car.
The next three parameters are the proportional gain (KP), the integral gain (KI) and the derivative gain (KD):
- `KP: Schedule = Constant, Value = 1.0`
- `KI: Schedule = Constant, Value = 0.0`
- `KD: Schedule = Constant, Value = 0.0`

Schedules can be used to apply time functions to a parameter (decays are very typical in RL). We will use constant values and a simple proportional controller, so we can leave KI and KD at zero.
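With these gains, the controller reduces to a pure proportional controller on `position-deviation`. Here is a minimal sketch of what it computes; this is an illustration of the control law, not Badger's actual implementation, and the integral and derivative arguments are shown only to make the roles of KI and KD explicit:

```python
KP, KI, KD = 1.0, 0.0, 0.0   # the Constant schedules configured above

def pid_acceleration(position_deviation, error_integral=0.0, error_rate=0.0):
    """Return the acceleration command for the rain-car."""
    # With KI = KD = 0 this is simply: acceleration = KP * position_deviation
    return KP * position_deviation + KI * error_integral + KD * error_rate
```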
Simply press the Play button on the header tab:
We will see a real-time visualization of the simulation.
On the right, all the state variables and stats are shown as meters, and functions would be shown below if there were any to be learned. The visualization is real-time in the sense that what we see is the current state of the system. Beware: the simulation may run faster than real time. The environment is drawn after simulating a control step. This means that if simulating `0.01 s` and drawing a frame takes less than `0.01 s`, we will see the simulation go faster than real time. This is by design. For true real-time visualizations, we must wait until the experiment has finished and run the off-line visualization.
In this first experiment, we will see the car go toward the tree, past it, and then oscillate around it. Although the details are beyond the scope of this first tutorial, the reason is that we are setting the acceleration as a purely proportional function of the position error, with no derivative term to damp the motion.
The natural next step is to try giving KP different values to choose the most appropriate one.
The way to do this is by forking a parameter (creating several instances of the same parameter). We do this by right-clicking on the name of the Value parameter:
The parameter will now be forked:
On the header, we can give this fork an alias (e.g., if we were to fork two parameters named Value, we could give one the alias KP-Value and the other KD-Value). We can also add and remove values. We will add two values by clicking twice on the Add button. Now we can traverse these values with the left and right arrows and change the three values:
We will set three different values: `1.0`, `0.5` and `0.1`. Note that the number beside the name of the experiment on the header tab has been updated to `(3)`. This means that this experiment will result in three experimental units, that is, three different combinations of parameters.
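As an aside, forked values multiply: assuming forks combine as a Cartesian product (as suggested by "combinations of parameters"), the number of experimental units is the product of the value counts of every forked parameter. A quick hypothetical illustration, not Badger output (the second fork is invented for the example):

```python
from itertools import product

kp_values = [1.0, 0.5, 0.1]   # the fork created in this tutorial
kd_values = [0.0, 0.1]        # hypothetical second fork, for illustration only

print(len(list(product(kp_values))))             # 3 experimental units (this tutorial)
print(len(list(product(kp_values, kd_values))))  # 6 experimental units if KD were also forked
```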
To execute all three experimental units concurrently (if possible, depending on the available hardware resources), we must press the Launch button on the top:
A dialog window will ask where we want to store the experimental units and the results. The experiment is stored as a batch file (.simion.batch) that references all the experimental units, each created in a separate folder inside the folder where the batch file is saved.
Badger will switch the view to the Monitor tab:
On the left, we see the list of available agents (local agents are deselected by default). In this case, we have two Herd Agents in the same subnet: one running on a 4-core 64-bit Linux machine, the other on a 2-core 32-bit Windows machine.
We next press the Play button and see the evolution of the experiment:
This is a very short experiment with only one evaluation episode. This means that:
- The plot on the upper-right will not show anything because there is only one evaluation (it needs at least two evaluations).
- The experiment finishes nearly instantly.
We can see the complete log files of each experimental unit by pressing one of the Flask icons.
We next press the Reports button and Badger will switch to the Reports tab:
On the top left, we see a list of the experiments (only one: Tutorial-1-PID-mountain-car) with the hierarchy of forks (only one: Value, since we didn't give it an alias) and the values given to each forked parameter (three: `1.0`, `0.5` and `0.1`).
Below the experiment list, we can select a subset of the variables. We will select position and reward. In both cases, we will leave Last evaluation selected, since we are only interested in the last (and only) evaluation episode.
Now we will press the Generate reports button and Badger will generate two reports, one for the variable position:
We see that the oscillations are bigger the lower the KP gain is. Below, some statistics are calculated for each experimental unit. Similarly, the other report shows the rewards obtained by the controller during the simulation:
Most interesting is that the look of the plots can be tuned by pressing Settings and then saved as a publication-ready .eps vector image by pressing Save.
This is the end of this tutorial. Next, we will start using RL algorithms and cover more advanced aspects of the framework.
Now, you can go to the next tutorial.