Simion log files
Each RLSimion experiment outputs two different files: the XML descriptor and the binary file with the actual logged data.
The descriptor is a very basic XML file that shows which state/action variables, rewards and statistics have been saved in the log file:
<ExperimentLogDescriptor BinaryDataFile="experiment-log.bin">
  <Episode-Type Id="0">Evaluation</Episode-Type>
  <State-variable>v-setpoint</State-variable>
  <State-variable>v</State-variable>
  <State-variable>v-deviation</State-variable>
  <Action-variable>u-thrust</Action-variable>
  <Reward-variable>r/(v-deviation)</Reward-variable>
  <Stat-variable>Features/Actor/E-Traces</Stat-variable>
  <Stat-variable>Features/Critic/E-Traces</Stat-variable>
  <Stat-variable>TD-error/TD-error</Stat-variable>
</ExperimentLogDescriptor>
Except for the Episode-Type node, which states which types of episodes have been saved, every node names a variable stored in the binary log file. The values of these variables are saved at every step.
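Since the order of the variable nodes determines the order of the values in the binary file, it can be handy to extract the names as a list. The following is only a minimal sketch: `listLoggedVariables` is a hypothetical helper that is not part of RLSimion, and it uses a regular expression rather than a real XML parser, so it assumes the descriptor has the simple shape shown above (no attributes on the variable nodes, no nested elements).

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not part of RLSimion): returns the names of the logged
// variables in the order in which they appear in the descriptor text.
std::vector<std::string> listLoggedVariables(const std::string& descriptorXml)
{
    // Matches <State-variable>, <Action-variable>, <Reward-variable> and
    // <Stat-variable> nodes; the Episode-Type node is deliberately skipped.
    static const std::regex variableNode(
        "<(State|Action|Reward|Stat)-variable>([^<]*)</\\1-variable>");

    std::vector<std::string> names;
    for (std::sregex_iterator it(descriptorXml.begin(), descriptorXml.end(), variableNode), end;
         it != end; ++it)
    {
        names.push_back((*it)[2].str()); // capture group 2 holds the variable name
    }
    return names;
}
```

For the descriptor above, the returned list would start with v-setpoint, v and v-deviation, and the position of a name in that list is the index of its value in each logged step.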
The logged data itself is stored in a binary file, which starts with the experiment header:
struct ExperimentHeader
{
    __int64 magicNumber = EXPERIMENT_HEADER;
    __int64 fileVersion = CLogger::BIN_FILE_VERSION;
    __int64 numEpisodes;
    __int64 padding[HEADER_MAX_SIZE - 3]; //extra space
};
Then, for each episode (1..numEpisodes), an episode header is saved:
struct EpisodeHeader
{
    __int64 magicNumber = EPISODE_HEADER;
    __int64 episodeType;
    __int64 episodeIndex;
    __int64 numVariablesLogged;
    __int64 episodeSubIndex;
    __int64 padding[HEADER_MAX_SIZE - 5];
};
The episode type can be 0 (evaluation) or 1 (training). Episode indices are 1-based (1..numEpisodes). The number of variables logged is needed to read each step.
Within each episode, every step starts with the following header:
struct StepHeader
{
    __int64 magicNumber = STEP_HEADER;
    __int64 stepIndex;
    double experimentRealTime;
    double episodeSimTime;
    double episodeRealTime;
    __int64 padding[HEADER_MAX_SIZE - 5];
};
The StepHeader is normally followed by an array double[numVariablesLogged], where each value corresponds to one of the variables listed in the descriptor, in the same order.
If the episode has reached a terminal state, magicNumber is EPISODE_END_HEADER instead, and there is no more data in the log for this episode.
The step index is 1-based: [1..numSteps]
Three different times are logged:
- experimentRealTime: the time in seconds elapsed since the experiment started
- episodeSimTime: the simulated time in seconds elapsed since the episode started
- episodeRealTime: the time in seconds elapsed since the episode started