model benchmarks for various sampling strategies? #8038
Unanswered

okuvshynov asked this question in Q&A:
Let's say you pick a model (e.g. llama3-8b) and run the evaluation benchmarks typically reported by model creators (MMLU, HumanEval, etc.) with different sampling strategies (for example, beam search with widths w1, w2, w3, ...). How different would the evaluation results look? Are there any existing results?

Thank you.

Replies: 1 comment 1 reply

AFAIK benchmarks do not use sampling. I'm not sure what HumanEval is, but MMLU does not involve sampling.
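As context for the reply above: MMLU is multiple-choice, and it is commonly scored by comparing the model's log-likelihood of each candidate answer rather than by generating text, so no sampling is involved. (HumanEval's pass@k metric, by contrast, is typically computed from sampled completions, so sampling settings do matter there.) A minimal sketch of likelihood-based scoring, where `choice_logprob()` is a hypothetical helper and not an API from llama.cpp or this thread:

```python
# Likelihood-based multiple-choice scoring (the usual MMLU setup): no text is
# generated, so the sampler never runs. choice_logprob() is hypothetical.
def pick_answer(question, choices, choice_logprob):
    # Argmax over log P(choice | question); deterministic, sampling-free.
    return max(choices, key=lambda c: choice_logprob(question, c))

# Toy usage with a dummy scorer (illustrative only).
answer = pick_answer(
    "2 + 2 = ?",
    ["A. 3", "B. 4", "C. 5"],
    lambda q, c: {"A. 3": -2.0, "B. 4": -0.1, "C. 5": -3.0}[c],
)
assert answer == "B. 4"
```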
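The experiment in the question is straightforward to frame as a sweep over decoders. Below is a minimal, self-contained sketch, assuming a toy stand-in model, toy tasks, and simple greedy/temperature/beam decoders; none of this comes from the thread or from llama.cpp, and a real run would substitute an actual model's log-probs and a harness such as lm-evaluation-harness:

```python
# Toy illustration: sweeping benchmark evaluation over decoding strategies.
# The model, tasks, and scoring here are all hypothetical stand-ins; this is
# not llama.cpp code and not how MMLU/HumanEval are actually scored.
import math
import random

VOCAB = ["A", "B", "C", "<eos>"]

def logprobs(prefix):
    # Deterministic toy distribution keyed on the prefix contents, so the
    # example is reproducible and different decoding paths can diverge.
    key = sum(ord(ch) for tok in prefix for ch in tok)
    scores = [math.sin(key + 3 * i) for i in range(len(VOCAB))]
    z = math.log(sum(math.exp(s) for s in scores))
    return {tok: s - z for tok, s in zip(VOCAB, scores)}

def greedy(max_len=5):
    out = []
    while len(out) < max_len:
        lp = logprobs(out)
        tok = max(lp, key=lp.get)
        if tok == "<eos>":
            break
        out.append(tok)
    return out

def sample(temperature=1.0, max_len=5, rng=None):
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    out = []
    while len(out) < max_len:
        lp = logprobs(out)
        toks = list(lp)
        weights = [math.exp(lp[t] / temperature) for t in toks]
        tok = rng.choices(toks, weights=weights)[0]
        if tok == "<eos>":
            break
        out.append(tok)
    return out

def beam_search(width=2, max_len=5):
    beams = [([], 0.0)]  # (tokens, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for toks, score in beams:
            if toks and toks[-1] == "<eos>":
                candidates.append((toks, score))  # finished beam, keep as-is
                continue
            for tok, l in logprobs(toks).items():
                candidates.append((toks + [tok], score + l))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:width]
    return [t for t in beams[0][0] if t != "<eos>"]

# A toy "benchmark": exact-match against reference answers. The scores are
# meaningless; the point is the shape of the sweep.
TASKS = [["B", "A"], ["A", "A", "C"]]

def evaluate(decode, **kw):
    hits = sum(decode(**kw) == ref for ref in TASKS)
    return hits / len(TASKS)

if __name__ == "__main__":
    print("greedy         :", evaluate(greedy))
    print("temp=0.7       :", evaluate(sample, temperature=0.7))
    for w in (1, 2, 4):  # beam widths w1, w2, w3 from the question
        print(f"beam width={w}  :", evaluate(beam_search, width=w))
```

The point of the sketch is the shape of the comparison: generative benchmarks can shift with the decoder, while likelihood-scored ones like MMLU (per the reply above) do not involve the sampler at all.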