Running new episodes until the first action is required? #6

jacobtw2 · 2023-12-12T13:13:08Z

jacobtw2
Dec 12, 2023
Collaborator

An issue I have noticed when working with ALPypeRL is that the model needs to be in a position to perform the first action immediately upon startup.

I have a model where I request an action each time a new order reaches a certain point in the flowchart logic, and the action decides what to do with the order. Calling it upon reset would cause an issue because there is no order that needs it yet.

Is there a way to tell the simulation to run until the first action is required before returning to the Python side when reset is called? I have tried calling runFast on startup, but this did not work.

Unfortunately, it is not even possible to manually put the model into a state where there is an order to process, because anything that inserts an agent into a flowchart block, such as inject or take, is an event. Even though it happens in zero time, it still requires the simulation to move forward, which it won't do until after control has gone to the Python side.

Does anyone have any ideas on how to resolve this issue?

MarcEscandell · 2023-12-13T00:44:03Z

MarcEscandell
Dec 13, 2023
Maintainer

Hei Jacob!

I think I get your point, but let me rewrite it here in my own words just in case:

ALPypeRL will always call takeAction at time() == 0 to kickstart the model, and hence getObservation and getReward as well. What you want is for the simulation to first just run (no action executed yet) and wait until requestAction() is called (which may happen at time() > 0, later in the simulation). Please correct me if this is not exactly what you mean.

If so, I think you are right. I have thought about this before. In the simulations/models I have built, the action was required at time() == 0, but I do understand this might not always be the case. I will add this feature request to the backlog.

For now, the only recommendation I can give you to solve your issue is to add an if-statement in your takeAction, getObservation and getReward functions. For example:

takeAction:

if (time() == 0) {
  // Do nothing
} else {
  // The rest of your code here
  // [...]
}

getObservation:

if (time() == 0) {
  // Just return an empty observation
  return <an-empty-observation>;
}
return <the-observation>;

getReward:

if (time() == 0) {
  // No reward
  return 0;
}
return <the-reward>;

The downside of this approach is that the neural network will associate a reward of 0 with any random action at the beginning. Another good option may be to add time() as an entry in your observation space. That way the NN can associate that behaviour with simulation startup only.
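To illustrate the idea of adding time() to the observation, here is a minimal standalone sketch. The class and method names (TimedObservation, withTime) are hypothetical and not part of ALPypeRL or AnyLogic; inside a real model, the value passed as modelTime would presumably come from time().

```java
import java.util.Arrays;

// Hypothetical helper (an assumption, not an ALPypeRL API):
// appends the current model time to an existing observation vector.
final class TimedObservation {

    /** Returns a copy of baseObservation with modelTime appended as the last entry. */
    static double[] withTime(double[] baseObservation, double modelTime) {
        double[] obs = Arrays.copyOf(baseObservation, baseObservation.length + 1);
        obs[obs.length - 1] = modelTime;
        return obs;
    }

    public static void main(String[] args) {
        // At startup the last entry is 0, so the policy can tell this step apart.
        System.out.println(Arrays.toString(withTime(new double[] {0.0, 0.0}, 0.0)));
        // Later in the run the last entry carries the elapsed model time.
        System.out.println(Arrays.toString(withTime(new double[] {3.0, 1.5}, 5.0)));
    }
}
```

Note that the observation space declared on the Python side would need one extra dimension to match.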

As I said, I'll consider adding this feature in future releases. I'll keep you up to date. Happy coding!

1 reply

jacobtw2 Dec 13, 2023
Collaborator Author

Hi Marc.

Thank you for the response. Yes, that is what I meant. For now we have done something similar to your last suggestion and added an extra parameter, a 0/1 boolean flag indicating whether this is the first setup action; after that, the model runs as normal.
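One possible reading of the flag approach described above, as a rough sketch only. The class FirstActionFlag and its methods are made-up names for illustration; how the flag is actually wired into the model and the observation is not stated in this thread.

```java
// Hypothetical helper (names are assumptions): tracks whether the next
// action is the initial "setup" action that happens right after a reset.
final class FirstActionFlag {
    private boolean firstActionPending = true;

    /** Returns 1 on the first call after construction or reset(), 0 afterwards. */
    int consume() {
        int flag = firstActionPending ? 1 : 0;
        firstActionPending = false;
        return flag;
    }

    /** Re-arms the flag; meant to be called when a new episode starts. */
    void reset() {
        firstActionPending = true;
    }

    public static void main(String[] args) {
        FirstActionFlag flag = new FirstActionFlag();
        System.out.println(flag.consume()); // setup action
        System.out.println(flag.consume()); // normal action
    }
}
```

The returned 0/1 value could then be appended to the observation so the policy can distinguish the setup step from regular steps.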

MarcEscandell · 2024-01-15T13:46:08Z

MarcEscandell
Jan 15, 2024
Maintainer

Hei @jacobtw2! Please check the latest alpyperl-1.0.0 version. I think I managed to fix this, but I need your input to validate it.

3 replies

jacobtw2 Jan 31, 2024
Collaborator Author

Hi @MarcEscandell. I did some testing of the new version and added some traceln calls. I could see that the reset function doesn't return until after the first time requestAction is called (5 seconds into the model), which is as expected. However, it still calls getObservation right after resetting and does not call it again when the action is requested.

This means that if the observation state changes between the reset and the first action request, the observation will not be correct.

MarcEscandell Jan 31, 2024
Maintainer

Hei @jacobtw2! You are absolutely right!

getObservation() is still being called early at model launch (when objects in the AnyLogic model may not yet have been created, causing differences in size).
I am not yet sure how to fix this problem. The reason this method is called so early is that it is executed when RLlib is creating the neural network and needs to define the input layer size. For that, it queries AnyLogic during initialization.

For now, to fix this issue, I am just adding an if-statement and hardcoding the size. For example:

// To avoid an exception at model initialization, only read model objects once time has advanced
if (time() > 0) {
    List<Number> observation = new ArrayList<>();
    observation.addAll(car.getVisionReadings());
    observation.add(roundToDecimal(car.getLinearVelocity(), 2));
    observation.add(car.getTrackCompletion());
    observation.add(time());
    return observation.toArray(new Number[observation.size()]);
}
// Otherwise assume time() == 0 and return a fixed-size placeholder observation
// (vision readings plus 3 extra entries: velocity, track completion and time)
int size = (int) Math.sqrt(numSensors);
Integer[] emptyObs = new Integer[size * size + 3];
// Initialize all elements to zero
for (int i = 0; i < emptyObs.length; i++) {
    emptyObs[i] = 0;
}
return emptyObs;
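Since this workaround relies on both branches returning arrays of the same length, a small guard can make a mismatch fail loudly. The following is a standalone sketch under that assumption; FixedSizeObservation, zeros and checked are hypothetical names, not part of ALPypeRL.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (an assumption for illustration): keeps the placeholder
// and the real observation at one agreed size.
final class FixedSizeObservation {

    /** Builds an all-zero placeholder observation of the given size. */
    static Number[] zeros(int size) {
        Number[] obs = new Number[size];
        Arrays.fill(obs, 0); // each slot holds a boxed Integer zero
        return obs;
    }

    /** Converts values to an array, failing if the size differs from expectedSize. */
    static Number[] checked(List<Number> values, int expectedSize) {
        if (values.size() != expectedSize) {
            throw new IllegalStateException(
                "Observation has " + values.size() + " entries, expected " + expectedSize);
        }
        return values.toArray(new Number[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(zeros(4)));
        System.out.println(Arrays.toString(checked(List.<Number>of(1, 2.5, 3), 3)));
    }
}
```

In a getObservation body like the one above, the time() > 0 branch could return checked(observation, expectedSize) and the startup branch zeros(expectedSize), with expectedSize computed in one place.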

I'll be thinking of a better way to fix this!

MarcEscandell Jan 31, 2024
Maintainer

Just realized that I already wrote a similar idea some time ago. Sorry for repeating myself!
