Off-policy training (e.g. TD3, SAC) with a SharedModel #347
jpmartin42 started this conversation in General
Replies: 0
I'm working on a project that requires training on time-series data with a custom architecture, and it may be too expensive to train multiple networks to learn the same time-series relationships; instead, the actor and critics might work better as separate heads off a single shared backbone. Because the observation space is large and both sample efficiency and exploration matter, I'm considering off-policy agents like SAC or TD3.
Has anyone used actor and critic heads on the same network with skrl for off-policy algorithms specifically? I haven't been able to find any examples of this online (all the shared-model examples on the website use PPO, which takes the single forward-pass approach rather than multiple forward passes), and figuring out how to handle a shared architecture for these training pipelines with skrl has been confusing.
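For reference, here is the kind of thing I'm imagining, adapted from the PPO shared-model pattern in the docs. This is an untested sketch: the class name, layer sizes, and the role-based head dispatch are placeholders for my actual architecture, and I'm assuming the SAC agent calls the critic roles with both `"states"` and `"taken_actions"` in the inputs dict.

```python
import torch
import torch.nn as nn

from skrl.models.torch import Model, GaussianMixin, DeterministicMixin


class SharedBackboneSAC(GaussianMixin, DeterministicMixin, Model):
    """Shared encoder with a Gaussian policy head and twin Q-heads (illustrative only)."""

    def __init__(self, observation_space, action_space, device,
                 clip_actions=False, clip_log_std=True, min_log_std=-20, max_log_std=2):
        Model.__init__(self, observation_space, action_space, device)
        GaussianMixin.__init__(self, clip_actions, clip_log_std, min_log_std, max_log_std, role="policy")
        DeterministicMixin.__init__(self, clip_actions)

        # shared backbone: the custom time-series encoder would go here
        self.backbone = nn.Sequential(nn.Linear(self.num_observations, 256), nn.ELU(),
                                      nn.Linear(256, 256), nn.ELU())
        # policy head (mean + state-independent log-std)
        self.mean_layer = nn.Linear(256, self.num_actions)
        self.log_std_parameter = nn.Parameter(torch.zeros(self.num_actions))
        # twin critic heads: backbone features concatenated with the evaluated actions
        self.q1_head = nn.Sequential(nn.Linear(256 + self.num_actions, 256), nn.ELU(),
                                     nn.Linear(256, 1))
        self.q2_head = nn.Sequential(nn.Linear(256 + self.num_actions, 256), nn.ELU(),
                                     nn.Linear(256, 1))

    def act(self, inputs, role):
        # dispatch to the mixin that matches the requested role
        if role == "policy":
            return GaussianMixin.act(self, inputs, role)
        return DeterministicMixin.act(self, inputs, role)

    def compute(self, inputs, role):
        features = self.backbone(inputs["states"])
        if role == "policy":
            return self.mean_layer(features), self.log_std_parameter, {}
        # critic roles ("critic_1", "critic_2", ...) also receive the actions to evaluate
        x = torch.cat([features, inputs["taken_actions"]], dim=-1)
        head = self.q1_head if role.endswith("_1") else self.q2_head
        return head(x), {}
```

The parts I'm unsure about are (a) whether the same instance can simply be registered under "policy", "critic_1", and "critic_2" in the models dict while a separate instance is created for "target_critic_1"/"target_critic_2" (skrl soft-updates the targets, so presumably they can't share parameters with the online critics), (b) the fact that the policy and critic optimizers would both be updating the shared backbone, and (c) that each role call runs its own forward pass through the backbone, i.e. the multiple forward-pass case mentioned above. Any pointers or working examples would be appreciated.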