Description
First of all, thank you for your great work on this benchmark. I have a few questions about the evaluation metrics in the repository.
Steps to Reproduce
- Review the definition of the RACE metric, which computes `res(target) / (res(target) + res(reference))`; a minimal sketch of this computation follows the list.
- Compare the expected value range with the values shown on:
  - the GitHub homepage
  - the Hugging Face leaderboard
  - the paper
- Check the example results in the repository's `race_result.txt` file.
Expected Behavior
- By definition, the metric result should fall between 0 and 1.
- The file results should be consistent with those reported on the GitHub homepage, Hugging Face leaderboard, and the paper.
Actual Behavior
- The GitHub homepage, the Hugging Face leaderboard, and the paper all show values in the 0–100 range.
- The results in `race_result.txt` cannot be matched to any of these sources.
Questions
- Are the values reported on the homepage/leaderboard/paper simply the raw 0–1 metric values multiplied by 100? (A sketch of this assumption follows the list.)
- Which set of results should be considered the correct reference: the ones in the repository file, the leaderboard, or the paper? Even after reviewing issue #7, I still cannot reach a clear conclusion.