Hello,
I am unable to reproduce the paper's results for the stable transformer on the Pong environment. From the paper, I believe the last-100-episode return should be ~17.62 for this model and environment.
I am running the training program with the arguments specified in the README for the Best Performing Stable Transformer on Pong.
In train.py line 731 I changed ctx = mp.get_context("fork") to ctx = mp.get_context("spawn")
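For readers unfamiliar with the change above: it only selects a different process start method. With "fork" the child starts as a copy of the parent process, while with "spawn" each child starts a fresh Python interpreter. Below is a minimal sketch of selecting a context. It is not the code from train.py, and `get_mp_context` is a hypothetical helper name used only for illustration.

```python
import multiprocessing as mp


def get_mp_context(preferred="spawn"):
    """Return a multiprocessing context for `preferred`, or the platform
    default if that start method is not available on this system.

    Hypothetical helper for illustration; train.py calls mp.get_context directly.
    """
    if preferred in mp.get_all_start_methods():
        return mp.get_context(preferred)
    return mp.get_context()


if __name__ == "__main__":
    ctx = get_mp_context("spawn")
    # Processes, queues, and locks created from ctx all use this start method.
    print(ctx.get_start_method())
```

Under "spawn", anything passed to a child process has to be picklable and the process target has to be importable from the main module, so code written with "fork" in mind may behave differently after the switch.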
The final results I obtained on one run:
[INFO:17181 train:962 2020-12-01 19:35:33,350] Steps 10001513 @ 668.5 SPS. Loss -15.672254. Return per episode: -12.7. Stats:
{'baseline_loss': 11.395485877990723,
'entropy_loss': -18.699639002482098,
'episode_returns': [-20.0, -18.0, -19.0],
'last_100_episode_returns': -19.530000686645508,
'learning_rate': 8.657589688233862e-05,
'len_max_traj': 239,
'max_return_achieved': '-14.0 at step 5366379',
'mean_episode_return': -12.666666666666666,
'num_unpadded_steps': 3346,
'pg_loss': -8.368099212646484,
'total_loss': -15.672253926595053}
[INFO:17181 train:969 2020-12-01 19:35:33,350] Learning finished after 10001513 steps.
Results from another run:
[INFO:15271 train:962 2020-12-04 19:47:48,776] Steps 10001156 @ 661.4 SPS. Loss -9.595014. Return per episode: -19.7. Stats:
{'baseline_loss': 14.119840621948242,
'entropy_loss': -18.633128484090168,
'episode_returns': [-21.0, -19.0, -20.0, -19.0],
'last_100_episode_returns': -19.540000915527344,
'learning_rate': 9.02709105067138e-05,
'len_max_traj': 239,
'max_return_achieved': '-14.0 at step 7824133',
'mean_episode_return': -19.666666666666668,
'num_unpadded_steps': 3309,
'pg_loss': -5.081725597381592,
'total_loss': -9.595013936360678}
[INFO:15271 train:969 2020-12-04 19:47:48,776] Learning finished after 10001156 steps.
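As a quick sanity check on the two log excerpts above, the sketch below recomputes the plain mean of the listed 'episode_returns' and compares it with the logged 'mean_episode_return'. The values are copied from the logs. The `recompute` helper and the run labels are hypothetical, and how train.py actually aggregates these statistics is not something this sketch assumes.

```python
from statistics import fmean

# Values copied verbatim from the two log excerpts; run labels are the log dates.
runs = {
    "2020-12-01": {
        "episode_returns": [-20.0, -18.0, -19.0],
        "mean_episode_return": -12.666666666666666,
    },
    "2020-12-04": {
        "episode_returns": [-21.0, -19.0, -20.0, -19.0],
        "mean_episode_return": -19.666666666666668,
    },
}


def recompute(runs):
    """Map each run to (plain mean of listed returns, logged mean, logged - plain)."""
    out = {}
    for name, stats in runs.items():
        plain = fmean(stats["episode_returns"])
        logged = stats["mean_episode_return"]
        out[name] = (plain, logged, logged - plain)
    return out


if __name__ == "__main__":
    for name, (plain, logged, gap) in recompute(runs).items():
        print(f"{name}: plain mean {plain:.3f}, logged {logged:.3f}, gap {gap:+.3f}")
```

In both runs the two numbers differ, so the logged mean is presumably aggregated in some other way than a plain average of the listed returns. Either way, 'last_100_episode_returns' is the figure to compare against the paper.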
I am on Ubuntu 18.04.4, using CUDA 10.2, cuDNN 7, and torch 1.6.0.
Thanks in advance for any help.
Best,
Sean