RLSimion
RLSimion is a C++ application for applying Reinforcement Learning algorithms to continuous control problems. It features several built-in simulated environments with their visualizations (here) and controllers/RL agents (here).
Usage: RLSimion.exe <configFile> [-pipe=<pipename>] [-requirements] [-local]
- configFile: The name of the parameter configuration file (the extension should be .simion.exp, that is, an experimental unit with no forks).
- pipe: If we are monitoring the experiment, RLSimion will try to connect with the monitoring process via a pipe named <pipename>.
- requirements: If set, instead of running the experiment, the process will output the requirements of the experiment via the standard output. This is used from Badger before sending experiments to remote machines to determine the required files.
- local: If set, a graphical window will be created to visualize the experiment in real-time.
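As an illustration of the command line described above, the following is a minimal sketch of how such arguments could be split into options. It is not RLSimion's actual parsing code: `ToyOptions`, `parseArgs`, and the file and pipe names used in the usage note are hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical container for the options listed in the usage line.
struct ToyOptions {
    std::string configFile;     // first argument that is not a flag
    std::string pipeName;       // value given after "-pipe="
    bool requirements = false;  // "-requirements" was given
    bool local = false;         // "-local" was given
};

// Hypothetical parser: classifies each argument by its prefix.
ToyOptions parseArgs(const std::vector<std::string>& args) {
    ToyOptions options;
    const std::string pipePrefix = "-pipe=";
    for (const std::string& arg : args) {
        if (arg.compare(0, pipePrefix.size(), pipePrefix) == 0)
            options.pipeName = arg.substr(pipePrefix.size());
        else if (arg == "-requirements")
            options.requirements = true;
        else if (arg == "-local")
            options.local = true;
        else if (options.configFile.empty())
            options.configFile = arg;
    }
    return options;
}
```

With this sketch, `parseArgs({"experiment.simion.exp", "-pipe=monitor", "-local"})` would yield the configuration file name, the pipe name `monitor`, and the `local` flag set.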
The scheme below represents a simplified UML diagram of the app and the classes involved:
Reinforcement Learning (RL) agents learn from interaction with the environment (world), following this basic algorithm:
foreach(episode)
    reset world
    foreach(step)
        s = world.state()
        a = agent.selectAction()
        r = world.executeAction(a)
        s_p = world.state()
        agent.update(s, a, s_p, r)
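The loop above can be written out in C++ roughly as follows. This is only a sketch with a toy world and a toy agent: `ToyWorld`, `ToyAgent`, and `runExperiment` are hypothetical names, not RLSimion classes, and the agent is assumed to receive the current state when selecting an action.

```cpp
#include <cmath>

// Hypothetical toy world: the state is a position on a line and the goal is 0.
struct ToyWorld {
    double position = 0.0;
    void reset() { position = 5.0; }
    double state() const { return position; }
    // Applies the action and returns the reward (negative distance to the goal).
    double executeAction(double a) {
        position += a;
        return -std::abs(position);
    }
};

// Hypothetical agent: steps one unit towards the goal and counts its updates.
struct ToyAgent {
    int numUpdates = 0;
    double selectAction(double s) const { return s > 0.0 ? -1.0 : 1.0; }
    // A real agent would learn from the tuple here; this one only counts calls.
    void update(double /*s*/, double /*a*/, double /*s_p*/, double /*r*/) { ++numUpdates; }
};

// Runs the basic interaction loop and returns the state after the last step.
double runExperiment(ToyWorld& world, ToyAgent& agent, int numEpisodes, int numSteps) {
    for (int episode = 0; episode < numEpisodes; ++episode) {
        world.reset();
        for (int step = 0; step < numSteps; ++step) {
            const double s = world.state();
            const double a = agent.selectAction(s);
            const double r = world.executeAction(a);
            const double s_p = world.state();
            agent.update(s, a, s_p, r);
        }
    }
    return world.state();
}
```

Running 3 episodes of 5 steps each calls `update` 15 times, once per step.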
An experiment consists of a series of episodes, which may be used for training or evaluation. Usually, agents are trained for a number of episodes and then the policy learned so far is evaluated (no learning occurs during evaluation). Once the number of training episodes is set (it may be 0 if we only want to evaluate an agent/controller), we must decide how often evaluations will be done (after how many training episodes) and, finally, how many episodes will be used for each evaluation. Certain worlds may require more complex evaluations: for example, when evaluating a wind-turbine controller we may want to measure performance with different wind speeds. In such cases, the following method must be called from the constructor of the world to override the default number of episodes per evaluation (1):
SimionApp::get()->pEpisode->setNumEpisodesPerEvaluation(nEpisodesPerEvaluation);
Evaluation will always take place at the very beginning of the learning process, at the end (unless the number of training episodes is 0) and every n training episodes.
Consider the following example: the number of training episodes is 5, evaluation is done every 2 training episodes, and each evaluation consists of a single episode. The total number of episodes would be 5 + ((5/2)+1)*1 = 8, where 5/2 is an integer division (5/2 = 2). The values returned by the following CExperiment methods would be:
On the other hand, if we use 2 episodes per evaluation, the number of episodes would be 5 + ((5/2)+1)*2= 11. The following values would be obtained:
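The arithmetic of both examples can be reproduced with a couple of small functions. This is a sketch of the formula as stated in the text, not code taken from CExperiment; the function names are hypothetical.

```cpp
#include <cassert>

// Number of evaluations: one at the beginning plus one every evalFreq
// training episodes (integer division, as in the examples above).
int numEvaluations(int numTrainingEpisodes, int evalFreq) {
    assert(numTrainingEpisodes >= 0 && evalFreq > 0);
    return numTrainingEpisodes / evalFreq + 1;
}

// Total number of episodes: training episodes plus all evaluation episodes.
int totalEpisodes(int numTrainingEpisodes, int evalFreq, int episodesPerEvaluation) {
    assert(episodesPerEvaluation > 0);
    return numTrainingEpisodes
         + numEvaluations(numTrainingEpisodes, evalFreq) * episodesPerEvaluation;
}
```

Here `totalEpisodes(5, 2, 1)` gives 8 and `totalEpisodes(5, 2, 2)` gives 11, matching the two examples.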
Parameters that are to be set from the GUI application (Badger) are embedded within the source code. They get their value from an XML node (ConfigNode) on construction, and they are defined using one of the following classes in parameters.h:
- DOUBLE_PARAM : real number. For example:
m_startOffset = DOUBLE_PARAM(pConfigNode, "Start-Offset", "Normalized time from which the schedule will begin [0...1]", 0.0);
The first parameter is the configuration node passed to the constructor (a node in the xml hierarchy), the second is the name given to the parameter (the one shown in Badger), the third is an explanation of the parameter to be shown as a tooltip, and the last one is the default value (the one given if this parameter is not present in the configuration file).
- INT_PARAM : integer number.
- BOOL_PARAM : boolean value.
- STRING_PARAM : a string.
- DIR_PATH_PARAM : a directory.
- FILE_PATH_PARAM : a file.
- ENUM_PARAM : a value from an enumerated type. The enumerated types (e.g., Interpolation) are declared in parameters.h, and then the variable of the enumerated type is declared in whichever class instantiates it using the template:
m_interpolation = ENUM_PARAM<Interpolation>(pConfigNode, "Interpolation", "Interpolation type", Interpolation::linear);
- STATE_VARIABLE : a reference to a state variable. Using it instead of a string allows us to validate the variable names in Badger.
m_hVariable = STATE_VARIABLE(pConfigNode,"Variable", "The state variable");
- ACTION_VARIABLE : a reference to an action variable.
- CHILD_OBJECT : an instance of any class that can be created using a regular constructor with only one argument of type ConfigNode:
m_pVFunction = CHILD_OBJECT<LinearStateVFA>(pConfigNode, "VFunction", "The Value-function");
- CHILD_OBJECT_FACTORY : an instance of a class that inherits from a base class implementing a getInstance() function of the form:
static std::shared_ptr<T> getInstance(ConfigNode* pConfigNode);
These objects are instantiated as follows:
m_pAlphaV = CHILD_OBJECT_FACTORY <NumericValue>(pConfigNode, "Alpha-v", "Learning gain used by the critic");
- MULTI_VALUE / MULTI_VALUE_FACTORY / MULTI_VALUE_SIMPLE_PARAM : These three types of parameters allow us to instantiate a set of objects of a given class.
- CHOICE : a utility object type meant to simplify the parsing of the getInstance() function required to use ..._FACTORY objects. To be parsed properly, getInstance() must have the form:
std::shared_ptr<Controller> Controller::getInstance(ConfigNode* pConfigNode)
{
    return CHOICE<Controller>(pConfigNode, "Controller", "The specific controller to be used",
    {
        {"PID",CHOICE_ELEMENT_NEW<PIDController>},
        {"LQR",CHOICE_ELEMENT_NEW<LQRController>},
        {"Jonkman",CHOICE_ELEMENT_NEW<WindTurbineJonkmanController>},
        {"Vidal",CHOICE_ELEMENT_NEW<WindTurbineVidalController>},
        {"Boukhezzar",CHOICE_ELEMENT_NEW<WindTurbineBoukhezzarController>},
        {"Extended-Vidal",CHOICE_ELEMENT_NEW<ExtendedWindTurbineVidalController>},
        {"Extended-Boukhezzar",CHOICE_ELEMENT_NEW<ExtendedWindTurbineBoukhezzarController>}
    });
}
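The idea behind a CHOICE, a table that maps a name to a function creating the matching subclass, can be pictured with the standalone sketch below. It does not reproduce the real CHOICE or CHOICE_ELEMENT_NEW implementations: `ToyController`, `ToyPID`, `ToyLQR`, `makeChoice`, and `createController` are hypothetical names, and the choice is passed as a plain string instead of being read from a ConfigNode.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical base class and two subclasses to choose from.
struct ToyController {
    virtual ~ToyController() = default;
    virtual std::string name() const = 0;
};
struct ToyPID : ToyController {
    std::string name() const override { return "PID"; }
};
struct ToyLQR : ToyController {
    std::string name() const override { return "LQR"; }
};

// Hypothetical helper in the spirit of CHOICE_ELEMENT_NEW:
// creates a T and returns it through a pointer to its base class.
template <typename Base, typename T>
std::shared_ptr<Base> makeChoice() {
    return std::make_shared<T>();
}

using ControllerFactory = std::function<std::shared_ptr<ToyController>()>;

// Looks up the requested name in the table and runs the matching factory.
std::shared_ptr<ToyController> createController(const std::string& choice) {
    static const std::map<std::string, ControllerFactory> choices = {
        {"PID", &makeChoice<ToyController, ToyPID>},
        {"LQR", &makeChoice<ToyController, ToyLQR>},
    };
    const auto it = choices.find(choice);
    if (it == choices.end())
        throw std::invalid_argument("Unknown controller: " + choice);
    return it->second();
}
```

Keeping every alternative in one table means the set of valid names is visible in a single place, which is presumably what makes this form easy to parse automatically.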
All these parameters are parsed by SimionSrcParser automatically after compiling the source code in Visual Studio (via a Custom Build Step).
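As a rough picture of how a value parameter such as DOUBLE_PARAM gets its value on construction (taken from the configuration node if present, otherwise the default), consider the sketch below. `ToyConfigNode` and `ToyDoubleParam` are hypothetical stand-ins; the real ConfigNode is an XML node and the real class in parameters.h may work differently.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for a configuration node: parameter name -> text value.
using ToyConfigNode = std::map<std::string, std::string>;

// Hypothetical parameter holder mirroring the four constructor arguments:
// configuration node, name, tooltip comment and default value.
struct ToyDoubleParam {
    std::string name;
    std::string comment;
    double value;

    ToyDoubleParam(const ToyConfigNode& node, const std::string& paramName,
                   const std::string& paramComment, double defaultValue)
        : name(paramName), comment(paramComment), value(defaultValue) {
        // Override the default only if the node defines this parameter.
        const auto it = node.find(paramName);
        if (it != node.end())
            value = std::stod(it->second);
    }
};
```

A node containing `"Start-Offset"` set to `"0.25"` produces a parameter with value 0.25, while an empty node leaves the default in place.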
We can tell the logger to save the value of a given variable (it will be saved each timestep) using:
template <typename T> void Logger::addVarToStats(const char* key, const char* subkey, T& variable);
For example:
SimionApp::get()->pLogger->addVarToStats<double>("TD-error", "TD-error", m_td);
Note that the variables are passed by reference, so they must remain alive for the whole experiment (do not pass local variables).
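The lifetime requirement follows from the logger keeping the address of each variable and reading it again at every step. The sketch below illustrates this with a hypothetical `ToyLogger` that handles only `double` variables and a single key; it is not the real Logger class.

```cpp
#include <map>
#include <string>

// Hypothetical logger: stores the address of each registered variable and
// samples its current value every time logStep() is called.
class ToyLogger {
    std::map<std::string, const double*> m_variables;
    std::map<std::string, double> m_sums;
    int m_numSamples = 0;

public:
    // Keeps a pointer to the variable, so it must outlive the logger's use of it.
    void addVarToStats(const std::string& key, const double& variable) {
        m_variables[key] = &variable;
        m_sums[key] = 0.0;
    }

    // Reads the current value of every registered variable.
    void logStep() {
        for (const auto& entry : m_variables)
            m_sums[entry.first] += *entry.second;
        ++m_numSamples;
    }

    // Average of the values sampled so far (0 if nothing was sampled).
    double average(const std::string& key) const {
        if (m_numSamples == 0) return 0.0;
        return m_sums.at(key) / m_numSamples;
    }
};
```

If the registered variable were a local that went out of scope before `logStep()` ran, the stored pointer would dangle; this is why member variables such as `m_td` are registered.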