Expected speed up over HuggingFace #662
deepakdalakoti announced in Q&A
Replies: 1 comment
-
Hi,
As per the vLLM homepage, the results show a huge speedup over the HuggingFace API. However, my benchmarking results show only a modest improvement over HuggingFace (~15%). I am using Llama-2-7b-hf for testing. I used the code and prompt described here for testing. Is anyone seeing similar results? Are there some settings which should improve the results significantly? I am using the following versions:
CUDA: 11.8
vllm: 0.1.3
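For reference, a single-prompt latency comparison looks roughly like the sketch below. This is a minimal illustration, not the benchmark code linked above: the model name, prompt, and token budget are placeholders, and it assumes a local GPU with the Llama-2-7b-hf weights available.

```python
# Minimal single-prompt latency comparison: HuggingFace generate() vs. vLLM.
# MODEL, PROMPT, and MAX_NEW_TOKENS are illustrative placeholders.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "meta-llama/Llama-2-7b-hf"
PROMPT = "Explain the difference between latency and throughput."
MAX_NEW_TOKENS = 256

# --- HuggingFace baseline (greedy decoding by default) ---
tokenizer = AutoTokenizer.from_pretrained(MODEL)
hf_model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.float16
).to("cuda")
inputs = tokenizer(PROMPT, return_tensors="pt").to("cuda")
start = time.perf_counter()
# Note: HF may stop early at EOS, so the two runs are not guaranteed to
# produce the same number of tokens.
hf_model.generate(**inputs, max_new_tokens=MAX_NEW_TOKENS)
print(f"HF latency:   {time.perf_counter() - start:.2f}s")

# Free GPU memory before vLLM allocates its KV cache.
del hf_model
torch.cuda.empty_cache()

# --- vLLM (temperature=0.0 is greedy; ignore_eos forces the full budget) ---
llm = LLM(model=MODEL)
params = SamplingParams(temperature=0.0, max_tokens=MAX_NEW_TOKENS, ignore_eos=True)
start = time.perf_counter()
llm.generate([PROMPT], params)
print(f"vLLM latency: {time.perf_counter() - start:.2f}s")
```

For a single stream like this, both backends spend most of their time in the same token-by-token decode loop, which is consistent with seeing only a modest gap.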
-
Same here, not seeing much improvement in latency, although batch inferencing is much faster with vLLM compared to HuggingFace.
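For context, vLLM's headline numbers come from throughput under many concurrent requests (continuous batching) rather than single-stream latency. A sketch of the batched case, with placeholder prompts and sampling settings, might look like:

```python
# Batched generation with vLLM: passing the whole prompt list in one
# generate() call lets the scheduler pack requests onto the GPU. Looping
# over prompts one at a time would measure single-stream latency instead.
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf")
params = SamplingParams(temperature=0.8, max_tokens=128)

# 64 illustrative prompts; real benchmarks would use a request trace.
prompts = [f"Write a one-line summary of topic {i}." for i in range(64)]
outputs = llm.generate(prompts, params)
for out in outputs[:3]:
    print(out.outputs[0].text)
```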