Off-policy training (e.g. TD3, SAC) with a SharedModel #347
jpmartin42 started this conversation in General
Replies: 0
I'm working on a project that requires training on time-series data with a custom architecture, and it may be too expensive to train multiple networks to learn the same time-series relationships; instead, the actor and critics might work better as separate heads off a single shared backbone. Because the observation space is large and both sample efficiency and exploration matter, I'm considering off-policy agents like SAC or TD3.
Has anyone used actor and critic heads on the same network with skrl for off-policy algorithms specifically? I haven't been able to find any examples of this online (all the shared-model examples on the website use PPO, which takes the single forward-pass approach rather than multiple forward passes), and figuring out how to handle a shared architecture for these training pipelines with skrl has been confusing.
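For reference, here is the kind of thing I'm imagining, adapted from the PPO shared-model pattern in the docs. This is an untested sketch: the class name, layer sizes, and the role-based head dispatch are placeholders for my actual architecture, and I'm assuming the SAC agent calls the critic roles with both `"states"` and `"taken_actions"` in the inputs dict.

```python
import torch
import torch.nn as nn

from skrl.models.torch import Model, GaussianMixin, DeterministicMixin


class SharedBackboneSAC(GaussianMixin, DeterministicMixin, Model):
    """Shared encoder with a Gaussian policy head and twin Q-heads (illustrative only)."""

    def __init__(self, observation_space, action_space, device,
                 clip_actions=False, clip_log_std=True, min_log_std=-20, max_log_std=2):
        Model.__init__(self, observation_space, action_space, device)
        GaussianMixin.__init__(self, clip_actions, clip_log_std, min_log_std, max_log_std, role="policy")
        DeterministicMixin.__init__(self, clip_actions)

        # shared backbone: the custom time-series encoder would go here
        self.backbone = nn.Sequential(nn.Linear(self.num_observations, 256), nn.ELU(),
                                      nn.Linear(256, 256), nn.ELU())
        # policy head (mean + state-independent log-std)
        self.mean_layer = nn.Linear(256, self.num_actions)
        self.log_std_parameter = nn.Parameter(torch.zeros(self.num_actions))
        # twin critic heads: backbone features concatenated with the evaluated actions
        self.q1_head = nn.Sequential(nn.Linear(256 + self.num_actions, 256), nn.ELU(),
                                     nn.Linear(256, 1))
        self.q2_head = nn.Sequential(nn.Linear(256 + self.num_actions, 256), nn.ELU(),
                                     nn.Linear(256, 1))

    def act(self, inputs, role):
        # dispatch to the mixin that matches the requested role
        if role == "policy":
            return GaussianMixin.act(self, inputs, role)
        return DeterministicMixin.act(self, inputs, role)

    def compute(self, inputs, role):
        features = self.backbone(inputs["states"])
        if role == "policy":
            return self.mean_layer(features), self.log_std_parameter, {}
        # critic roles ("critic_1", "critic_2", ...) also receive the actions to evaluate
        x = torch.cat([features, inputs["taken_actions"]], dim=-1)
        head = self.q1_head if role.endswith("_1") else self.q2_head
        return head(x), {}
```

The parts I'm unsure about are (a) whether the same instance can simply be registered under "policy", "critic_1", and "critic_2" in the models dict while a separate instance is created for "target_critic_1"/"target_critic_2" (skrl soft-updates the targets, so presumably they can't share parameters with the online critics), (b) the fact that the policy and critic optimizers would both be updating the shared backbone, and (c) that each role call runs its own forward pass through the backbone, i.e. the multiple forward-pass case mentioned above. Any pointers or working examples would be appreciated.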