About standardizing evaluation tests for llama.cpp #13884
daniel-dona started this conversation in Ideas
I was wondering if there is any project or somebody working on a standard evaluation bench for llama.cpp, not in the sense of llama-bench (performance evaluation), but covering regularly used benchmarks such as AIME, LiveCodeBench, IFEval, and so on.

I'm currently running some fine-tuning experiments with models that I later deploy with llama.cpp, and I need to compare the base model with the fine-tunes to check whether they regress on tasks other than the one I'm training for.

Maybe something similar to https://github.com/EleutherAI/lm-evaluation-harness, but easier to deploy and/or more integrated with llama.cpp.
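For comparison runs like this, one stopgap that seems workable is pointing lm-evaluation-harness at a running llama-server instance through its OpenAI-compatible endpoint. A rough sketch follows; the model label, port, endpoint path, tokenizer name, and the exact model_args accepted by the local-completions backend are assumptions and may need adjusting for your lm-eval version.

```python
# Sketch: evaluate a llama.cpp-served model with lm-evaluation-harness,
# assuming llama-server is already running with something like
#   llama-server -m my-finetune.gguf --port 8080
# and exposes an OpenAI-compatible /v1/completions endpoint.
import lm_eval

results = lm_eval.simple_evaluate(
    model="local-completions",
    model_args=(
        "model=my-finetune,"                              # label only (hypothetical)
        "base_url=http://localhost:8080/v1/completions,"  # llama-server endpoint (assumed)
        "num_concurrent=1,"
        "tokenizer_backend=huggingface,"
        "tokenizer=meta-llama/Llama-3.1-8B-Instruct"      # HF tokenizer matching the GGUF (hypothetical)
    ),
    tasks=["ifeval"],  # swap in whichever harness tasks you care about
)

# Per-task metrics end up under results["results"]; comparing the base model
# against a fine-tune is then two runs of this plus a diff of the numbers.
print(results["results"])
```

Running this once against the base model and once against each fine-tune gives comparable numbers, although it still leaves the deployment/integration gap this question is about.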
Replies: 1 comment

I started such a project, but it's still very much WIP.