Misc. bug: Reasoning performance has significantly degraded between llama.cpp server versions, despite using the same model, identical parameters, and the same set of questions. #12816
Testing with Different Versions of the llama.cpp Server for the Same Inference Task
Two versions of the llama.cpp server were tested on the same problem:
llama.cpp-b4756
llama.cpp-b4759
Both versions employ identical parameters and models, yet exhibit significant performance differences.
Key observations:
Performance degradation:
b4759 is noticeably less capable than b4756 (in some cases more than twice as bad).
Token consumption for the same task (a measurement sketch follows this list):
b4756: ~3,000 tokens
b4759: ~6,000 tokens
Version comparison:
b4702 (an older version) shows superior performance compared to b4756.
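For anyone trying to reproduce the token measurements, here is a minimal sketch. It assumes each build is launched as a separate server instance with the same model and flags; the ports, the sampling settings, and the `tokens_predicted` response field are my assumptions, so verify them against your build's `/completion` response.

```python
# Minimal sketch for comparing token usage across two llama.cpp server builds.
# Assumptions (not from the original report): b4756 runs on port 8080 and
# b4759 on port 8081, both with the same model and launch flags.
import json
import urllib.request

# The test prompt from the report (also quoted below).
PROMPT = ('Can you help me decrypt this cipher I received?\n'
          '"K nkmg rncakpi hqqvdcnn."')

def run_completion(port: int) -> dict:
    """POST the test prompt to a llama.cpp server's /completion endpoint."""
    payload = json.dumps({
        "prompt": PROMPT,
        "n_predict": 8192,      # generous cap so the run is not truncated
        "temperature": 0.0,     # deterministic sampling for a fair comparison
        "seed": 42,
    }).encode()
    req = urllib.request.Request(
        f"http://127.0.0.1:{port}/completion",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

for label, port in (("b4756", 8080), ("b4759", 8081)):
    result = run_completion(port)
    # tokens_predicted is reported by the server; double-check the field
    # name against the response of your specific build.
    print(f'{label}: {result.get("tokens_predicted", "?")} tokens predicted')
```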
The test problem used:
Can you help me decrypt this cipher I received?
"K nkmg rncakpi hqqvdcnn."
This behavior is reproducible across multiple tests; after extensive testing, b4759 was consistently identified as the version where performance drastically degraded.
If you can reproduce similar findings, please share your test cases!