Hello! When I serve ArmoRM-Llama3-8B-v0.1 with OpenRLHF, the output rewards are almost all negative (around -2.0). I've attached some screenshots of how I serve the reward model. Is the output of this RM naturally around -2.0, or is the way I serve the RM wrong? (The prompt dataset is also from RLHFlow, e.g. "RLHFlow/iterative-prompt-v1-iter7-20K", and the responses are generated with "RLHFlow/LLaMA3-iterative-DPO-final". We also apply the chat template when building the prompt-response dataset.)
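For reference, here is a minimal standalone scoring sketch, independent of OpenRLHF, that I would expect to follow the RLHFlow model card usage (assuming the model is loaded with `AutoModelForSequenceClassification` and `trust_remote_code=True`, and that the custom head exposes the preference score as `output.score`; the prompt and response strings below are placeholders):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

device = "cuda"
path = "RLHFlow/ArmoRM-Llama3-8B-v0.1"

# Load the reward model with its custom scoring head (assumes trust_remote_code is required).
model = AutoModelForSequenceClassification.from_pretrained(
    path, device_map=device, trust_remote_code=True, torch_dtype=torch.bfloat16
)
tokenizer = AutoTokenizer.from_pretrained(path)

# Placeholder prompt/response pair; in my setup these come from
# RLHFlow/iterative-prompt-v1-iter7-20K and RLHFlow/LLaMA3-iterative-DPO-final.
messages = [
    {"role": "user", "content": "What are some synonyms for the word 'beautiful'?"},
    {"role": "assistant", "content": "Gorgeous, stunning, lovely, and elegant."},
]

# Apply the model's chat template, as we do when building the prompt-response dataset.
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)

with torch.no_grad():
    output = model(input_ids)
    # Assumption: the custom ArmoRM head returns a scalar preference score here.
    preference_score = output.score.float().cpu().item()

print(preference_score)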