PPO is on-policy, but its training used a replay buffer? #3382
Closed
houghtonweihu
started this conversation in
Community | General
PPO is on-policy, but its training in https://github.com/hpcaitech/ColossalAI/blob/main/applications/Chat/coati/trainer/ppo.py uses a replay buffer, which should only be possible for off-policy algorithms.
Replies: 2 comments 1 reply
-
what is the brawlstar
0 replies
-
Hi @houghtonweihu, in PPO we still collect experience (PPO uses importance sampling, which lets each fresh batch be reused for a few optimization epochs). We don't store it in a replay buffer the way off-policy methods do; we just use the replay-buffer class to hold the current batch, then clear the buffer immediately after use. See the sketch after this comment.
1 reply
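To make that concrete, here is a minimal, self-contained sketch of the pattern the reply describes. It is not the actual coati trainer code: the `RolloutBuffer` class, the toy linear policy, and the hard-coded advantages are all hypothetical stand-ins. The point is the lifecycle: collect a fresh batch with the current policy, reuse that one batch for a few clipped-PPO epochs via the importance-sampling ratio, then clear the buffer.

```python
import torch

class RolloutBuffer:
    """Holds exactly one batch of fresh experience; cleared after every update."""
    def __init__(self):
        self.obs, self.actions, self.old_log_probs, self.advantages = [], [], [], []

    def add(self, obs, action, log_prob, advantage):
        self.obs.append(obs)
        self.actions.append(action)
        self.old_log_probs.append(log_prob)
        self.advantages.append(advantage)

    def tensors(self):
        return (torch.stack(self.obs), torch.stack(self.actions),
                torch.stack(self.old_log_probs), torch.stack(self.advantages))

    def clear(self):
        self.__init__()

policy = torch.nn.Linear(4, 2)                     # toy 2-action policy
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
buffer = RolloutBuffer()
clip_eps, ppo_epochs = 0.2, 4

for iteration in range(10):
    # 1) Collect fresh experience with the CURRENT policy.
    with torch.no_grad():
        for _ in range(64):
            obs = torch.randn(4)
            dist = torch.distributions.Categorical(logits=policy(obs))
            action = dist.sample()
            # Toy advantage signal: pretend action 0 is good. A real PPO
            # trainer would compute advantages with a critic (e.g. GAE).
            advantage = torch.tensor(1.0 if action.item() == 0 else -1.0)
            buffer.add(obs, action, dist.log_prob(action), advantage)

    # 2) Reuse the SAME batch for a few epochs. The clipped importance-
    #    sampling ratio pi_new/pi_old is what makes this limited reuse valid.
    obs, actions, old_log_probs, advantages = buffer.tensors()
    for _ in range(ppo_epochs):
        new_log_probs = torch.distributions.Categorical(
            logits=policy(obs)).log_prob(actions)
        ratio = torch.exp(new_log_probs - old_log_probs)
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
        loss = -torch.min(ratio * advantages, clipped * advantages).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # 3) Throw the batch away: old data is never replayed, so this stays
    #    on-policy despite the buffer-shaped container.
    buffer.clear()
```

The contrast with an off-policy replay buffer is step 3: a DQN-style buffer retains transitions across many policy updates and samples from them repeatedly, whereas here the buffer lives for exactly one collect-update cycle.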