Feature request: Evals #121

brooksc · 2025-10-19T15:54:41Z

Is there an "evals" set of inputs and expected outputs that can be used to evaluate how well a given model performs?

I'm running local models and am currently using qwen3-vl-8b, which was released recently. I'd love to see how it compares with the recommended Qwen 2.5, or with Gemini.

If an eval suite were available, it could be run whenever a new local vision model is released to decide whether to recommend it.
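
To make the request concrete, here is a rough sketch of what such a suite could look like. It is not based on any existing code in this project. `EvalCase`, `run_eval`, `fake_model`, the file names, and the prompt are all hypothetical, and scoring is a plain normalized exact match, which may well be too strict for free-form vision output.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class EvalCase:
    """One eval item: an input and the answer we expect back (hypothetical schema)."""
    case_id: str
    image_path: str  # hypothetical: path to the image the model is shown
    prompt: str
    expected: str


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(text.lower().split())


def run_eval(model: Callable[[str, str], str], cases: Iterable[EvalCase]) -> dict:
    """Call model(image_path, prompt) for every case and report exact-match accuracy."""
    failures = []
    total = 0
    for case in cases:
        total += 1
        answer = model(case.image_path, case.prompt)
        if normalize(answer) != normalize(case.expected):
            failures.append(case.case_id)
    passed = total - len(failures)
    return {
        "total": total,
        "passed": passed,
        "accuracy": passed / total if total else 0.0,
        "failures": failures,
    }


def fake_model(image_path: str, prompt: str) -> str:
    """Stand-in for a real model call so the sketch runs without a server."""
    canned = {"a.png": "  Terminal ", "b.png": "spreadsheet"}
    return canned.get(image_path, "")


# Illustrative cases only; a real set would use real images and labels.
CASES = [
    EvalCase("c1", "a.png", "Which app is in focus?", "terminal"),
    EvalCase("c2", "b.png", "Which app is in focus?", "browser"),
]

report = run_eval(fake_model, CASES)
print(report)
```

Replacing `fake_model` with a function that calls a local model would let the same cases be rerun for each new release, so two models can be compared on accuracy and on the list of failing case IDs.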

JerryZLiu · 2025-10-20T19:41:40Z

Maintainer

That would be really cool! I do have a private eval set that I use, but I'm reluctant to release it publicly because it contains a ton of personal info that I don't have time to redact. Please let me know if you see any qualitative improvements. I should probably switch to recommending Qwen3VL-4B as the base.

For anyone who's curious, I've been investigating why Qwen2.5VL performs so poorly on LMStudio and Ollama. The base model itself is very powerful and accurate, but it hallucinates a ton when run through LMStudio/Ollama.

lmstudio-ai/lmstudio-bug-tracker#1122 (comment)
