I'd like to discuss the RL stage that follows the two-stage training. I noticed that you implemented latent generation by modifying the `generate` method in Hugging Face Transformers. I'm wondering how this is achieved in verl — is the rollout backend set to HF Transformers? That approach seems to consume a lot of GPU memory compared to an optimized inference engine like vLLM.
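For context, here is roughly what I mean — a minimal sketch of a verl launch override that selects the HF Transformers rollout worker instead of vLLM (the script name and other parameters here are hypothetical; only the `rollout.name` override is the point):

```shell
# Hypothetical verl PPO launch; the relevant part is rollout.name.
# rollout.name=hf uses the HF Transformers generate() path (easy to patch
# for custom/latent generation, but memory-hungry and slow),
# rollout.name=vllm uses the vLLM engine (fast, but harder to modify
# generate() internals for latent tokens).
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.rollout.name=hf \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
    trainer.n_gpus_per_node=8
```

If the latent-generation logic indeed requires the HF rollout, I'd be curious whether you hit OOM issues at longer sequence lengths, or whether you found a way to port the patched `generate` into vLLM.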