Question on the removal of .step() and .generate() methods in Trainer
#3270
Comments
Hi, thanks for the feedback.
Hello, thank you for the answer. I believe I have a similar question. In the earlier version (trl==0.11), there was a PPOTrainer.step(query, response, score) method that was really handy for online/iterative RL scenarios. From what I see in the current implementation, it looks like everything is now wrapped into PPOTrainer.train. What is the recommended way to implement an iterative scenario with the new version?
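For readers unfamiliar with the pattern, here is a minimal sketch of the kind of online loop this comment describes: generate responses, score them externally, then hand the (query, response, score) batch to a per-batch update call. StubStepTrainer, run_online_loop, and reward_fn are hypothetical names used only for illustration; the stub is not the TRL API and performs no real optimisation. It only mirrors the call shape mentioned above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class StubStepTrainer:
    """Hypothetical stand-in for a trainer exposing generate() and step().

    This is NOT the TRL PPOTrainer; it only mimics the call shape
    step(query, response, score) mentioned in the comment above.
    """

    history: List[Dict[str, float]] = field(default_factory=list)

    def generate(self, query: str) -> str:
        # A real trainer would sample a response from the policy model here.
        return query.upper()

    def step(
        self,
        queries: Sequence[str],
        responses: Sequence[str],
        scores: Sequence[float],
    ) -> Dict[str, float]:
        # A real trainer would run one policy-optimisation update on this batch.
        if not queries:
            raise ValueError("empty batch")
        if not (len(queries) == len(responses) == len(scores)):
            raise ValueError("queries, responses and scores must align")
        stats = {
            "batch_size": float(len(queries)),
            "mean_score": sum(scores) / len(scores),
        }
        self.history.append(stats)
        return stats


def run_online_loop(
    trainer: StubStepTrainer,
    batches: Sequence[Sequence[str]],
    reward_fn: Callable[[str, str], float],
) -> List[Dict[str, float]]:
    """Generate, score, and update once per batch; the caller owns the loop."""
    all_stats = []
    for queries in batches:
        responses = [trainer.generate(q) for q in queries]
        scores = [reward_fn(q, r) for q, r in zip(queries, responses)]
        all_stats.append(trainer.step(queries, responses, scores))
    return all_stats
```

The point of the sketch is that the caller controls when generation, scoring, and the update happen, which is what makes the pattern convenient when rewards come from an external environment or a human in the loop.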
Hey, thank you for your answer.
Ah ok, you're talking about
Okay, crystal clear, I'll let you know! Thx for your answers 😊
Hello everyone,
First of all, thank you for the amazing work you're doing—this library is truly a fantastic resource for the community! 😊
I had a more practical question regarding the decision to remove the .step() and .generate() methods from the Trainer classes. Our current implementation relies heavily on these methods, and updating to the new version would require a fair amount of engineering effort.
Before proceeding, I wanted to understand the rationale behind this change. Was there a specific motivation for deprecating these methods, and is there a recommended alternative approach?
Thank you in advance for your help and insights! 🚀