
Codes for Evaluating Generative Benchmarks #26

@kygguo

Description


Thanks for sharing this awesome repo!

The paper reports results on MMLU, GSM8K, HumanEval and BigBench-Hard, but this repo does not currently seem to contain the code for evaluating on these benchmarks. Could you also share that code? It would be great to follow exactly the same evaluation steps when comparing against other alignment methods.
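For reference, while waiting for the official scripts, here is a minimal sketch of what a GSM8K-style exact-match evaluation could look like, assuming the Hugging Face `datasets`/`transformers` stack. The model name, prompt template, and decoding settings below are placeholders and almost certainly differ from the setup used in the paper, so any numbers it produces are not directly comparable.

```python
# Rough GSM8K evaluation sketch -- NOT the paper's official pipeline.
import re
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "your-aligned-model"  # placeholder: replace with the released checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, torch_dtype=torch.float16, device_map="auto"
)

dataset = load_dataset("gsm8k", "main", split="test")

def extract_final_number(text):
    # GSM8K references end with "#### <answer>"; for generations, fall back to the last number.
    match = re.search(r"####\s*(-?[\d,\.]+)", text)
    if match:
        candidate = match.group(1)
    else:
        numbers = re.findall(r"-?\d[\d,]*\.?\d*", text)
        candidate = numbers[-1] if numbers else None
    return candidate.replace(",", "") if candidate else None

def same_number(a, b):
    try:
        return a is not None and b is not None and float(a) == float(b)
    except ValueError:
        return False

correct = 0
for example in dataset:
    # Placeholder zero-shot prompt; the paper may use few-shot exemplars instead.
    prompt = f"Question: {example['question']}\nAnswer: Let's think step by step."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    generation = tokenizer.decode(
        output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
    )
    if same_number(extract_final_number(generation), extract_final_number(example["answer"])):
        correct += 1

print(f"GSM8K accuracy: {correct / len(dataset):.4f}")
```

Presumably the reported results came from a standard harness or the benchmarks' original prompts, so the exact prompts, number of few-shot examples, and decoding parameters would still need to be confirmed for an apples-to-apples comparison.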
