Different judgment in reward server #6

Hi, thanks for your great work!

My question is about 

https://github.com/Zanette-Labs/efficient-reasoning/blob/d0173998e5edd7ea3672cc504034a7a8f32b7833/reward_server/math_server.py#L65

and

https://github.com/Zanette-Labs/efficient-reasoning/blob/d0173998e5edd7ea3672cc504034a7a8f32b7833/reward_server/math_server.py#L84

In the first line, "response" comes from "all_responses".
But in the second line, "response" comes from "query".

If a response exceeds the maximum generation length, it is truncated.
As a result, none of the "response" entries in "all_responses" contain "<｜end▁of▁sentence｜>".
But the second "response", taken from "query", does contain "<｜end▁of▁sentence｜>".

Because the two judgments are made on different strings, the result of the second judgment can be inconsistent with the first.
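
To illustrate what I mean, here is a minimal sketch. It is not the repository's code: `strip_eos`, `split_query`, and the sample strings are hypothetical, and I am assuming the response can be recovered from "query" by removing the prompt prefix. It only shows how the same answer can differ between the two sources, and how stripping the end-of-sentence token before judging would make them match.

```python
# Hypothetical illustration; none of these helpers exist in the repository.
EOS_TOKEN = "<｜end▁of▁sentence｜>"


def strip_eos(text: str, eos_token: str = EOS_TOKEN) -> str:
    """Remove one trailing EOS marker (and surrounding whitespace) if present."""
    text = text.rstrip()
    if text.endswith(eos_token):
        text = text[: -len(eos_token)].rstrip()
    return text


def split_query(query: str, prompt: str) -> str:
    """Assumed way to recover the response from a prompt + response string."""
    return query[len(prompt):] if query.startswith(prompt) else query


prompt = "What is 1 + 1? "

# Response as it would appear in "all_responses": no EOS marker.
from_all_responses = "The answer is \\boxed{2}"

# Response recovered from "query": the EOS marker is still attached.
from_query = split_query(prompt + "The answer is \\boxed{2}" + EOS_TOKEN, prompt)

# The raw strings differ, so a judgment on them may differ too.
print(from_all_responses == from_query)

# After normalizing both sides, the judged strings are identical.
print(strip_eos(from_all_responses) == strip_eos(from_query))
```

Normalizing both strings in one place before they are judged would keep the two code paths consistent, whichever source the response comes from.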
