Significantly different results with different backends #2851

wenhuach21 opened this issue Mar 27, 2025 · 2 comments

Comments

wenhuach21 commented Mar 27, 2025

For the model kaitchup/Qwen2.5-72B-Instruct-AutoRoundGPTQ-8bit, the leaderboard_ifeval results differ significantly between the HF backend and the vLLM backend. Could you provide insights into the possible reasons or help debug the issue? Thanks in advance!
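
For reference, the same comparison can also be driven from the harness's Python API. A minimal sketch, assuming lm-evaluation-harness is installed with vLLM support and the quantized checkpoint sits in the current directory; the `model_args` strings simply mirror the CLI invocations below:

```python
import lm_eval

# Mirrors the two CLI runs below; limit=10 keeps the check quick.
# (In practice it may be cleaner to run each backend in a separate
# process so the 72B weights are not loaded twice on the same GPUs.)
TASKS = ["leaderboard_ifeval"]

hf_results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=./,parallelize=True,dtype=float16",
    tasks=TASKS,
    batch_size=16,
    limit=10,
)

vllm_results = lm_eval.simple_evaluate(
    model="vllm",
    model_args="pretrained=./,tensor_parallel_size=2,dtype=float16",
    tasks=TASKS,
    batch_size="auto",
    limit=10,
)

print(hf_results["results"]["leaderboard_ifeval"])
print(vllm_results["results"]["leaderboard_ifeval"])
```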

HF backend

```bash
CUDA_VISIBLE_DEVICES=0,1 lm-eval --model hf --model_args pretrained=./,parallelize=True,dtype=float16 --tasks leaderboard_ifeval --batch_size 16 --limit 10
```

hf (pretrained=./,parallelize=True,dtype=float16), gen_kwargs: (None), limit: 10.0, num_fewshot: None, batch_size: 16

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| leaderboard_ifeval | 3 | none | 0 | inst_level_loose_acc | 0.8333 | ± N/A |
| | | none | 0 | inst_level_strict_acc | 0.7222 | ± N/A |
| | | none | 0 | prompt_level_loose_acc | 0.7000 | ± 0.1528 |
| | | none | 0 | prompt_level_strict_acc | 0.5000 | ± 0.1667 |

vLLM backend

```bash
CUDA_VISIBLE_DEVICES=0,1 lm-eval --model vllm --model_args pretrained=./,tensor_parallel_size=2,dtype=float16 --tasks leaderboard_ifeval --batch_size auto --limit 10
```

vllm (pretrained=./,tensor_parallel_size=2,dtype=float16), gen_kwargs: (None), limit: 10.0, num_fewshot: None, batch_size: auto

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| leaderboard_ifeval | 3 | none | 0 | inst_level_loose_acc | 0.2222 | ± N/A |
| | | none | 0 | inst_level_strict_acc | 0.2222 | ± N/A |
| | | none | 0 | prompt_level_loose_acc | 0.1000 | ± 0.1000 |
| | | none | 0 | prompt_level_strict_acc | 0.1000 | ± 0.1000 |
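
One way to localize a gap like this is to take the harness out of the loop and compare greedy generations from the two backends on an identical prompt. A minimal sketch, assuming transformers (with a GPTQ-capable backend such as gptqmodel/auto-gptq) and vllm are installed and the checkpoint is at ./; the prompt is only illustrative, and the two halves are best run in separate processes so both engines don't hold the GPUs at once:

```python
# --- transformers (HF) backend: greedy generation on a fixed prompt ---
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

PATH = "./"   # the quantized checkpoint used above
PROMPT = "Write exactly three bullet points about the ocean."  # any fixed prompt works

tok = AutoTokenizer.from_pretrained(PATH)
model = AutoModelForCausalLM.from_pretrained(
    PATH, torch_dtype=torch.float16, device_map="auto"
)
inputs = tok(PROMPT, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

```python
# --- vLLM backend: same prompt, greedy decoding ---
from vllm import LLM, SamplingParams

PATH = "./"
PROMPT = "Write exactly three bullet points about the ocean."

llm = LLM(model=PATH, dtype="float16", tensor_parallel_size=2)
out = llm.generate([PROMPT], SamplingParams(temperature=0.0, max_tokens=128))
print(out[0].outputs[0].text)
```

If the two completions already diverge badly here, the problem is more likely in how the backends load or execute the GPTQ weights than in the leaderboard_ifeval task itself; if they match, the gap more likely comes from generation parameters or prompt handling inside the harness.
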
@kunxiongzhu

For mlc-llm and llama-cpp-python, I have a similar problem.

@For-rest2005

@kunxiongzhu For llama-cpp-python, the problem may come from abetlen/llama-cpp-python#1983.
