Clarification needed on RACE evaluation metric results #22

@alcholiclg

Description

First of all, thank you for your great work on this benchmark. I have some questions about the evaluation metrics in the repository.

Steps to Reproduce

  1. Review the definition of the RACE metric, which calculates:
    res(target) / ( res(target) + res(reference) )
  2. Compare the expected value range with the values shown on:
    • GitHub homepage
    • Hugging Face leaderboard
    • The paper
  3. Check the example results in:

Expected Behavior

  • The metric result should normally fall between 0 and 1 based on its definition.
  • The results in race_result.txt should be consistent with those reported on the GitHub homepage, Hugging Face leaderboard, and the paper.
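
To make the expected range concrete, here is a minimal sketch of the ratio as I understand it from the definition. The function names `race_score` and `to_percent` are hypothetical and are not taken from the repository, and the sketch assumes the raw scores res(target) and res(reference) are non-negative. The scaling by 100 is only my guess at how the published numbers might relate to the raw values.

```python
def race_score(target: float, reference: float) -> float:
    """Relative score res(target) / (res(target) + res(reference)).

    Hypothetical helper: assumes both raw scores are non-negative,
    which keeps the result inside [0, 1].
    """
    if target < 0 or reference < 0:
        raise ValueError("raw scores are assumed to be non-negative")
    total = target + reference
    if total == 0:
        raise ValueError("at least one raw score must be positive")
    return target / total


def to_percent(score: float) -> float:
    """Rescale a 0-1 score to 0-100 (an assumed reporting convention)."""
    return 100.0 * score


# Equal raw scores give exactly 0.5, which would be reported as 50.0
# if the published values are the raw ratio multiplied by 100.
print(race_score(3.0, 3.0))
print(to_percent(race_score(3.0, 3.0)))
```

Under these assumptions, a target that ties the reference lands at 0.5, and no input can push the raw ratio above 1.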

Actual Behavior

  • GitHub homepage, Hugging Face leaderboard, and the paper all show values ranging from 0 to 100.
  • The results in race_result.txt do not match any of these sources.

Questions

  • Are the reported values on the homepage/leaderboard/paper simply multiplied by 100 compared to the raw 0–1 metric values?
  • Which set of results should be considered the correct reference: the ones in the repository file, the leaderboard, or the paper? Even after reviewing issue #7, I still cannot reach a clear conclusion.
