How to generate outputs from the PPOTrainer of chatgpt? #2906
Unanswered · huliangbing asked this question in Community | Q&A
Replies: 2 comments, 1 reply
-
Thanks for your feedback. We have already supported actor inference in our newly updated PR.
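The thread does not show the actor-inference API the PR added, so here is a minimal, hedged sketch of what generating from the trained actor (policy) model typically looks like, assuming the actor is (or wraps) a Hugging Face causal-LM checkpoint. All paths and names below are illustrative, not taken from the repository.

```python
def generate_reply(model, tokenizer, prompt, **gen_kwargs):
    """Tokenize a prompt, run model.generate, and decode the first output.

    Works with any model/tokenizer pair that follows the Hugging Face
    interface (tokenizer(...) -> dict of tensors, model.generate -> ids).
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **gen_kwargs)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Example wiring (assumes the actor was saved in Hugging Face format;
# "path/to/actor" is a placeholder):
# from transformers import AutoTokenizer, AutoModelForCausalLM
# tokenizer = AutoTokenizer.from_pretrained("path/to/actor")
# model = AutoModelForCausalLM.from_pretrained("path/to/actor")
# print(generate_reply(model, tokenizer, "Hello", max_new_tokens=30))
```

If the PR exposes its own inference entry point, prefer that; this helper only shows the generic tokenize → generate → decode loop.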
1 reply
-
How do I run inference with a reward model (RM) checkpoint such as rm_checkpoint.pt?
0 replies
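The reward-model question above went unanswered. The exact layout of rm_checkpoint.pt depends on how the RM was saved, so the following is only a sketch of the common pattern: a base encoder plus a scalar value head, loaded with plain PyTorch. The class name, hidden size, and file path are all assumptions.

```python
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    """Sketch of a typical RLHF reward model: encoder body + scalar head."""

    def __init__(self, body: nn.Module, hidden_size: int):
        super().__init__()
        self.body = body                       # base transformer (or any encoder)
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.body(input_ids)          # (batch, seq, hidden)
        values = self.value_head(hidden)       # (batch, seq, 1)
        return values.mean(dim=1).squeeze(-1)  # one scalar reward per sequence

# Example inference (illustrative only; match the architecture and
# tokenizer used during RM training, or load_state_dict will fail):
# model = RewardModel(body=some_base_model, hidden_size=768)
# model.load_state_dict(torch.load("rm_checkpoint.pt", map_location="cpu"))
# model.eval()
# with torch.no_grad():
#     reward = model(input_ids)
```

The key point is that an RM is scored with a plain forward pass (no `generate`): tokenize the prompt-plus-response text, run it through the model, and read off the scalar reward.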
-
How can I generate outputs from the PPOTrainer of chatgpt? Can we generate outputs from the reward_model or the initial_model?
Can you show me code like this:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, num_beams=5, num_beam_groups=5, max_new_tokens=30)
tokenizer.decode(outputs[0], skip_special_tokens=True)
Thanks!