About standardizing evaluation tests for llama.cpp #13884
daniel-dona started this conversation in Ideas
I was wondering if there is any project or somebody working on a standard evaluation bench for llama.cpp, not in the sense of llama-bench (performance evaluation), but covering regularly used benchmarks such as AIME, LiveCodeBench, IFEval, and so on.

I'm currently running some fine-tuning experiments with models that I later deploy with llama.cpp, and I need to compare the base model with the fine-tunes to check whether they regress on tasks other than the one I'm training for.

Maybe something similar to https://github.com/EleutherAI/lm-evaluation-harness, but easier to deploy and/or more integrated with llama.cpp.
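For comparison runs like this, one stopgap that seems workable is pointing lm-evaluation-harness at a running llama-server instance through its OpenAI-compatible endpoint. A rough sketch follows; the model label, port, endpoint path, tokenizer name, and the exact model_args accepted by the local-completions backend are assumptions and may need adjusting for your lm-eval version.

```python
# Sketch: evaluate a llama.cpp-served model with lm-evaluation-harness,
# assuming llama-server is already running with something like
#   llama-server -m my-finetune.gguf --port 8080
# and exposes an OpenAI-compatible /v1/completions endpoint.
import lm_eval

results = lm_eval.simple_evaluate(
    model="local-completions",
    model_args=(
        "model=my-finetune,"                              # label only (hypothetical)
        "base_url=http://localhost:8080/v1/completions,"  # llama-server endpoint (assumed)
        "num_concurrent=1,"
        "tokenizer_backend=huggingface,"
        "tokenizer=meta-llama/Llama-3.1-8B-Instruct"      # HF tokenizer matching the GGUF (hypothetical)
    ),
    tasks=["ifeval"],  # swap in whichever harness tasks you care about
)

# Per-task metrics end up under results["results"]; comparing the base model
# against a fine-tune is then two runs of this plus a diff of the numbers.
print(results["results"])
```

Running this once against the base model and once against each fine-tune gives comparable numbers, although it still leaves the deployment/integration gap this question is about.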
Replies: 1 comment

I started such a project, but it's still very much WIP.