I'd like to discuss the RL stage that follows the two-stage training. I noticed that you implemented latent generation by modifying the `generate` method in Hugging Face Transformers. I'm wondering how this is achieved in verl — is the rollout backend set to HF Transformers? That approach seems to consume a lot of GPU memory compared to an optimized inference engine like vLLM.
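For context, here is roughly what I mean — a minimal sketch of a verl launch override that selects the HF Transformers rollout worker instead of vLLM (the script name and other parameters here are hypothetical; only the `rollout.name` override is the point):

```shell
# Hypothetical verl PPO launch; the relevant part is rollout.name.
# rollout.name=hf uses the HF Transformers generate() path (easy to patch
# for custom/latent generation, but memory-hungry and slow),
# rollout.name=vllm uses the vLLM engine (fast, but harder to modify
# generate() internals for latent tokens).
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.rollout.name=hf \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
    trainer.n_gpus_per_node=8
```

If the latent-generation logic indeed requires the HF rollout, I'd be curious whether you hit OOM issues at longer sequence lengths, or whether you found a way to port the patched `generate` into vLLM.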