model benchmarks for various sampling strategies? #8038
Unanswered

okuvshynov asked this question in Q&A:
Let's say you pick a model (e.g. llama3-8b) and run the evaluation benchmarks typically reported by model creators (MMLU, HumanEval, etc.) with different sampling strategies (for example, beam search with widths w1, w2, w3, ...). How different would the evaluation results look? Are there any existing results?

Thank you.

Replies: 1 comment 1 reply

AFAIK benchmarks do not use sampling. I'm not sure what HumanEval is, but MMLU does not involve sampling.
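As context for the reply above: MMLU is multiple-choice, and it is commonly scored by comparing the model's log-likelihood of each candidate answer rather than by generating text, so no sampling is involved. (HumanEval's pass@k metric, by contrast, is typically computed from sampled completions, so sampling settings do matter there.) A minimal sketch of likelihood-based scoring, where `choice_logprob()` is a hypothetical helper and not an API from llama.cpp or this thread:

```python
# Likelihood-based multiple-choice scoring (the usual MMLU setup): no text is
# generated, so the sampler never runs. choice_logprob() is hypothetical.
def pick_answer(question, choices, choice_logprob):
    # Argmax over log P(choice | question); deterministic, sampling-free.
    return max(choices, key=lambda c: choice_logprob(question, c))

# Toy usage with a dummy scorer (illustrative only).
answer = pick_answer(
    "2 + 2 = ?",
    ["A. 3", "B. 4", "C. 5"],
    lambda q, c: {"A. 3": -2.0, "B. 4": -0.1, "C. 5": -3.0}[c],
)
assert answer == "B. 4"
```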
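The experiment in the question is straightforward to frame as a sweep over decoders. Below is a minimal, self-contained sketch, assuming a toy stand-in model, toy tasks, and simple greedy/temperature/beam decoders; none of this comes from the thread or from llama.cpp, and a real run would substitute an actual model's log-probs and a harness such as lm-evaluation-harness:

```python
# Toy illustration: sweeping benchmark evaluation over decoding strategies.
# The model, tasks, and scoring here are all hypothetical stand-ins; this is
# not llama.cpp code and not how MMLU/HumanEval are actually scored.
import math
import random

VOCAB = ["A", "B", "C", "<eos>"]

def logprobs(prefix):
    # Deterministic toy distribution keyed on the prefix contents, so the
    # example is reproducible and different decoding paths can diverge.
    key = sum(ord(ch) for tok in prefix for ch in tok)
    scores = [math.sin(key + 3 * i) for i in range(len(VOCAB))]
    z = math.log(sum(math.exp(s) for s in scores))
    return {tok: s - z for tok, s in zip(VOCAB, scores)}

def greedy(max_len=5):
    out = []
    while len(out) < max_len:
        lp = logprobs(out)
        tok = max(lp, key=lp.get)
        if tok == "<eos>":
            break
        out.append(tok)
    return out

def sample(temperature=1.0, max_len=5, rng=None):
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    out = []
    while len(out) < max_len:
        lp = logprobs(out)
        toks = list(lp)
        weights = [math.exp(lp[t] / temperature) for t in toks]
        tok = rng.choices(toks, weights=weights)[0]
        if tok == "<eos>":
            break
        out.append(tok)
    return out

def beam_search(width=2, max_len=5):
    beams = [([], 0.0)]  # (tokens, cumulative log-prob)
    for _ in range(max_len):
        candidates = []
        for toks, score in beams:
            if toks and toks[-1] == "<eos>":
                candidates.append((toks, score))  # finished beam, keep as-is
                continue
            for tok, l in logprobs(toks).items():
                candidates.append((toks + [tok], score + l))
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:width]
    return [t for t in beams[0][0] if t != "<eos>"]

# A toy "benchmark": exact-match against reference answers. The scores are
# meaningless; the point is the shape of the sweep.
TASKS = [["B", "A"], ["A", "A", "C"]]

def evaluate(decode, **kw):
    hits = sum(decode(**kw) == ref for ref in TASKS)
    return hits / len(TASKS)

if __name__ == "__main__":
    print("greedy         :", evaluate(greedy))
    print("temp=0.7       :", evaluate(sample, temperature=0.7))
    for w in (1, 2, 4):  # beam widths w1, w2, w3 from the question
        print(f"beam width={w}  :", evaluate(beam_search, width=w))
```

The point of the sketch is the shape of the comparison: generative benchmarks can shift with the decoder, while likelihood-scored ones like MMLU (per the reply above) do not involve the sampler at all.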